# Quantize DeepLab model for faster on-device inference

This page describes the steps required to quantize a DeepLab model and convert
it to TFLite for on-device inference. The main steps are:

1. Quantization-aware training
1. Exporting the model
1. Converting to a TFLite FlatBuffer

We provide details for each step below.
## Quantization-aware training

DeepLab supports two approaches to quantizing your model:

1. **[Recommended]** Train a non-quantized model until convergence, then
   fine-tune the trained float model with quantization enabled, using a small
   learning rate (on PASCAL we use 3e-5). This fine-tuning step usually takes
   2k to 5k steps to converge.
1. Train a DeepLab float model with delayed quantization. Usually we delay
   quantization until the last few thousand training steps.

In the current implementation, quantization is only supported with 1)
`num_clones=1` for training and 2) single-scale inference for evaluation,
visualization, and model export. To get the best performance from the
quantized model, we strongly recommend training the float model with a larger
`num_clones` and then fine-tuning it with a single clone (a sketch of both of
these alternative runs follows the command below).

The command line below quantizes, via fine-tuning, a DeepLab model trained on
the PASCAL VOC dataset:
```
# From tensorflow/models/research/
python deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=3000 \
    --train_split="train" \
    --model_variant="mobilenet_v2" \
    --output_stride=16 \
    --train_crop_size="513,513" \
    --train_batch_size=8 \
    --base_learning_rate=3e-5 \
    --dataset="pascal_voc_seg" \
    --initialize_last_layer \
    --quantize_delay_step=0 \
    --tf_initial_checkpoint=${PATH_TO_TRAINED_FLOAT_MODEL} \
    --train_logdir=${PATH_TO_TRAIN_DIR} \
    --dataset_dir=${PATH_TO_DATASET}
```
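For reference, here is a sketch of the other two training runs mentioned
above: the float-training stage of the recommended flow (with a larger
`num_clones`), and the single-run delayed-quantization alternative. The step
counts, `num_clones=4`, and the `${PATH_TO_IMAGENET_PRETRAINED_MODEL}`
placeholder are illustrative assumptions, not tuned or required settings.

```
# From tensorflow/models/research/
# Recommended flow, stage 1: train the float model to convergence with more
# clones (illustrative values; quantization flags are intentionally omitted).
python deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=30000 \
    --num_clones=4 \
    --train_split="train" \
    --model_variant="mobilenet_v2" \
    --output_stride=16 \
    --train_crop_size="513,513" \
    --train_batch_size=8 \
    --dataset="pascal_voc_seg" \
    --tf_initial_checkpoint=${PATH_TO_IMAGENET_PRETRAINED_MODEL} \
    --train_logdir=${PATH_TO_FLOAT_TRAIN_DIR} \
    --dataset_dir=${PATH_TO_DATASET}

# Approach 2: a single run with delayed quantization. Quantization starts at
# step 27000, i.e. only the last ~3k of 30000 steps are quantized. Note that
# quantized training requires num_clones=1 (the default).
python deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=30000 \
    --quantize_delay_step=27000 \
    --train_split="train" \
    --model_variant="mobilenet_v2" \
    --output_stride=16 \
    --train_crop_size="513,513" \
    --train_batch_size=8 \
    --dataset="pascal_voc_seg" \
    --tf_initial_checkpoint=${PATH_TO_IMAGENET_PRETRAINED_MODEL} \
    --train_logdir=${PATH_TO_TRAIN_DIR} \
    --dataset_dir=${PATH_TO_DATASET}
```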
## Converting to TFLite FlatBuffer

First, use the following command line to export your trained model:

```
# From tensorflow/models/research/
python deeplab/export_model.py \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --quantize_delay_step=0 \
    --export_path=${OUTPUT_DIR}/frozen_inference_graph.pb
```
The command line below shows how to convert the exported GraphDef to a TFLite
model:

```
tflite_convert \
    --graph_def_file=${OUTPUT_DIR}/frozen_inference_graph.pb \
    --output_file=${OUTPUT_DIR}/frozen_inference_graph.tflite \
    --output_format=TFLITE \
    --input_shape=1,513,513,3 \
    --input_arrays="MobilenetV2/MobilenetV2/input" \
    --inference_type=QUANTIZED_UINT8 \
    --inference_input_type=QUANTIZED_UINT8 \
    --std_dev_values=128 \
    --mean_values=128 \
    --change_concat_input_ranges=true \
    --output_arrays="ArgMax"
```
**[Important]** Note that the converted model expects a 513x513 RGB input and
does not include preprocessing (resizing and padding the input image) or
post-processing (cropping the padded region and resizing to the original input
size). These steps can be implemented outside of the TFLite model.
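For illustration, below is a minimal Python sketch of that wrapper code. It
assumes the `tf.lite.Interpreter` API, Pillow for image I/O, and the uint8
input and `ArgMax` output produced by the conversion flags above; the file
names are placeholders.

```
import numpy as np
from PIL import Image
import tensorflow as tf  # assumes a TF build that provides tf.lite.Interpreter

INPUT_SIZE = 513  # matches --input_shape=1,513,513,3 used at conversion time

# Preprocessing: resize so the longer side is 513, then pad to 513x513.
image = Image.open('input.jpg').convert('RGB')
width, height = image.size
scale = INPUT_SIZE / max(width, height)
new_w, new_h = int(width * scale), int(height * scale)
resized = image.resize((new_w, new_h), Image.BILINEAR)
padded = np.zeros((INPUT_SIZE, INPUT_SIZE, 3), dtype=np.uint8)
padded[:new_h, :new_w, :] = np.asarray(resized)

# Run the quantized TFLite model (uint8 input, per the conversion flags).
interpreter = tf.lite.Interpreter(model_path='frozen_inference_graph.tflite')
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]['index']
output_index = interpreter.get_output_details()[0]['index']
interpreter.set_tensor(input_index, padded[np.newaxis, ...])
interpreter.invoke()
labels = interpreter.get_tensor(output_index)[0]  # per-pixel class ids (ArgMax)

# Post-processing: crop the padded region and resize back to the input size.
labels = labels[:new_h, :new_w].astype(np.uint8)
labels = np.asarray(
    Image.fromarray(labels).resize((width, height), Image.NEAREST))
```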
## Quantized model on PASCAL VOC

We provide float and quantized checkpoints that have been pretrained on the
VOC 2012 train_aug set, using a MobileNet-v2 backbone with different depth
multipliers. The quantized models usually show about a 1% drop in mIoU.

For the quantized (8-bit) models, the un-tarred directory includes:

*   a frozen inference graph (frozen_inference_graph.pb)
*   a checkpoint (model.ckpt.data*, model.ckpt.index)
*   a converted TFLite FlatBuffer file (frozen_inference_graph.tflite)
Checkpoint name | Eval OS | Eval scales | Left-right Flip | Multiply-Adds | Quantize | PASCAL mIOU | Folder Size | TFLite File Size
--------------------------------------------------------------------------------------------------------------------------------- | :-----: | :---------: | :-------------: | :-----------: | :------: | :----------: | :---------: | :--------------:
[mobilenetv2_dm05_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_mnv2_dm05_pascal_trainaug_2018_10_01.tar.gz) | 16 | [1.0] | No | 0.88B | No | 70.19% (val) | 7.6MB | N/A
[mobilenetv2_dm05_coco_voc_trainaug_8bit](http://download.tensorflow.org/models/deeplabv3_mnv2_dm05_pascal_train_aug_8bit_2019_04_26.tar.gz) | 16 | [1.0] | No | 0.88B | Yes | 69.65% (val) | 8.2MB | 751.1KB
[mobilenetv2_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz) | 16 | [1.0] | No | 2.75B | No | 75.32% (val) | 23MB | N/A
[mobilenetv2_coco_voc_trainaug_8bit](http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_8bit_2019_04_26.tar.gz) | 16 | [1.0] | No | 2.75B | Yes | 74.26% (val) | 24MB | 2.2MB
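For example, to fetch and unpack one of the quantized checkpoints above (URL
taken from the table; run from any working directory):

```
# Download and un-tar the 8-bit MobileNet-v2 PASCAL checkpoint listed above.
wget http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_8bit_2019_04_26.tar.gz
tar -xzf deeplabv3_mnv2_pascal_train_aug_8bit_2019_04_26.tar.gz
```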
Note that you might need a nightly build of TensorFlow (see
[here](https://www.tensorflow.org/install) for install instructions) to convert
the above quantized models to TFLite.
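For instance, a nightly build can typically be installed with pip (check the
install page above for the current package name):

```
# Install a TensorFlow nightly build, which includes recent converter fixes.
pip install tf-nightly
```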