This project forked from pinto0309/pinto_model_zoo

A repository that shares tuning results of trained models generated by TensorFlow, covering post-training quantization (weight quantization, integer quantization, full integer quantization) and quantization-aware training.

Home Page: https://qiita.com/PINTO

License: MIT License

PINTO_model_zoo

[Note Jan 05, 2020] Currently, the MobileNetV3 backbone model and the Full Integer Quantization model do not return correct results.

[Note Jan 08, 2020] For the best performance on RaspberryPi4/3, install Ubuntu 19.10 aarch64 (64bit) instead of Raspbian armv7l (32bit). The official TensorFlow Lite is performance-tuned for aarch64, and on an aarch64 OS it runs about 4 times faster than on an armv7l OS.
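
Before choosing a TensorFlow Lite build, it can help to confirm which architecture the board reports. The sketch below is only an illustration: `describe_arch` is a hypothetical helper, and it simply classifies the string returned by `platform.machine()`, which reflects the running kernel rather than the installed userland.

```python
import platform


def describe_arch(machine: str) -> str:
    """Classify a machine string such as the one returned by platform.machine()."""
    m = machine.lower()
    if m in ("aarch64", "arm64"):
        return "64-bit ARM (aarch64)"
    if m.startswith("armv7") or m.startswith("armv6"):
        return "32-bit ARM (armv7l or older)"
    return "other ({})".format(machine)


# Report the architecture of the machine this script runs on.
print(describe_arch(platform.machine()))
```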

1. Environment

  • Ubuntu 18.04 x86_64
  • RaspberryPi4 Raspbian Buster 32bit / Raspbian Buster 64bit / Ubuntu 19.10 aarch64
  • Tensorflow-GPU v1.15.0 or Tensorflow v2.1.0 or Tensorflow v2.2.0-dev (tf-nightly)
  • Python 3.6.8
  • PascalVOC Dataset
  • COCO Dataset
  • Cityscapes Dataset
  • Imagenette Dataset
  • CelebA Dataset
  • Audio file (.wav)
  • Google Colaboratory

2. Procedure

2-1. MobileNetV3+DeeplabV3+PascalVOC

2-1-1. Preparation

$ cd ~
$ mkdir deeplab;cd deeplab
$ git clone --depth 1 https://github.com/tensorflow/models.git
$ cd models/research/deeplab/datasets
$ mkdir pascal_voc_seg

$ curl -sc /tmp/cookie \
  "https://drive.google.com/uc?export=download&id=1rATNHizJdVHnaJtt-hW9MOgjxoaajzdh" > /dev/null
$ CODE="$(awk '/_warning_/ {print $NF}' /tmp/cookie)"
$ curl -Lb /tmp/cookie \
  "https://drive.google.com/uc?export=download&confirm=${CODE}&id=1rATNHizJdVHnaJtt-hW9MOgjxoaajzdh" \
  -o pascal_voc_seg/VOCtrainval_11-May-2012.tar

$ sed -i -e "s/python .\/remove_gt_colormap.py/python3 .\/remove_gt_colormap.py/g" \
      -i -e "s/python .\/build_voc2012_data.py/python3 .\/build_voc2012_data.py/g" \
      download_and_convert_voc2012.sh

$ sh download_and_convert_voc2012.sh

$ cd ../..
$ mkdir -p deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train
$ mkdir -p deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/eval
$ mkdir -p deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/vis

$ export PATH_TO_TRAIN_DIR=${HOME}/deeplab/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train
$ export PATH_TO_DATASET=${HOME}/deeplab/models/research/deeplab/datasets/pascal_voc_seg/tfrecord
$ export PYTHONPATH=${HOME}/deeplab/models/research:${HOME}/deeplab/models/research/deeplab:${HOME}/deeplab/models/research/slim:${PYTHONPATH}
# See feature_extractor.networks_map for supported model variants.
# models/research/deeplab/core/feature_extractor.py

networks_map = {
    'mobilenet_v2': _mobilenet_v2,
    'mobilenet_v3_large_seg': mobilenet_v3_large_seg,
    'mobilenet_v3_small_seg': mobilenet_v3_small_seg,
    'resnet_v1_18': resnet_v1_beta.resnet_v1_18,
    'resnet_v1_18_beta': resnet_v1_beta.resnet_v1_18_beta,
    'resnet_v1_50': resnet_v1_beta.resnet_v1_50,
    'resnet_v1_50_beta': resnet_v1_beta.resnet_v1_50_beta,
    'resnet_v1_101': resnet_v1_beta.resnet_v1_101,
    'resnet_v1_101_beta': resnet_v1_beta.resnet_v1_101_beta,
    'xception_41': xception.xception_41,
    'xception_65': xception.xception_65,
    'xception_71': xception.xception_71,
    'nas_pnasnet': nas_network.pnasnet,
    'nas_hnasnet': nas_network.hnasnet,
}

2-1-2. "mobilenet_v3_small_seg" Float32 regular training

$ python3 deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=500000 \
    --train_split="train" \
    --model_variant="mobilenet_v3_small_seg" \
    --decoder_output_stride=16 \
    --train_crop_size="513,513" \
    --train_batch_size=8 \
    --dataset="pascal_voc_seg" \
    --save_interval_secs=300 \
    --save_summaries_secs=300 \
    --save_summaries_images=True \
    --log_steps=100 \
    --train_logdir=${PATH_TO_TRAIN_DIR} \
    --dataset_dir=${PATH_TO_DATASET}

2-1-3. "mobilenet_v3_large_seg" Float32 regular training

$ python3 deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=1000000 \
    --train_split="train" \
    --model_variant="mobilenet_v3_large_seg" \
    --decoder_output_stride=16 \
    --train_crop_size="513,513" \
    --train_batch_size=8 \
    --dataset="pascal_voc_seg" \
    --save_interval_secs=300 \
    --save_summaries_secs=300 \
    --save_summaries_images=True \
    --log_steps=100 \
    --train_logdir=${PATH_TO_TRAIN_DIR} \
    --dataset_dir=${PATH_TO_DATASET}

2-1-4. Visualize training status

$ tensorboard \
  --logdir ${HOME}/deeplab/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train

 
 

2-2. MobileNetV3+DeeplabV3+Cityscapes - Post-training quantization

2-2-1. Preparation

$ cd ~
$ mkdir -p git/deeplab && cd git/deeplab
$ git clone --depth 1 https://github.com/tensorflow/models.git
$ cd models/research/deeplab/datasets
$ mkdir cityscapes && cd cityscapes

# Clone the script to generate Cityscapes Dataset.
$ git clone --depth 1 https://github.com/mcordts/cityscapesScripts.git
$ mv cityscapesScripts cityscapesScripts_ && \
  mv cityscapesScripts_/cityscapesscripts . && \
  rm -rf cityscapesScripts_

# Download Cityscapes Dataset.
# https://www.cityscapes-dataset.com/
# You need to sign up and obtain a user ID and password to download the dataset.
$ wget --keep-session-cookies --save-cookies=cookies.txt \
  --post-data 'username=(userid)&password=(password)&submit=Login' \
  https://www.cityscapes-dataset.com/login/
$ wget --load-cookies cookies.txt \
  --content-disposition https://www.cityscapes-dataset.com/file-handling/?packageID=1
$ wget --load-cookies cookies.txt \
  --content-disposition https://www.cityscapes-dataset.com/file-handling/?packageID=3
$ unzip gtFine_trainvaltest.zip && rm gtFine_trainvaltest.zip
$ rm README && rm license.txt
$ unzip leftImg8bit_trainvaltest.zip && rm leftImg8bit_trainvaltest.zip
$ rm README && rm license.txt

# Convert Cityscapes Dataset to TFRecords format.
$ cd ..
$ sed -i -e "s/python/python3/g" convert_cityscapes.sh
$ export PYTHONPATH=${HOME}/git/deeplab/models/research/deeplab/datasets/cityscapes:${PYTHONPATH}
$ sh convert_cityscapes.sh

# Create checkpoint storage folders for training. Skip this step if you do not
# plan to train.
$ cd ../..
$ mkdir -p deeplab/datasets/cityscapes/exp/train_on_train_set/train && \
  mkdir -p deeplab/datasets/cityscapes/exp/train_on_train_set/eval && \
  mkdir -p deeplab/datasets/cityscapes/exp/train_on_train_set/vis

# Download the DeepLabV3 trained model of the MobileNetV3 backbone.
$ curl -sc /tmp/cookie \
  "https://drive.google.com/uc?export=download&id=1f5ccaJmJBYwBmHvRQ77yGIUcXnqQIRY_" > /dev/null
$ CODE="$(awk '/_warning_/ {print $NF}' /tmp/cookie)"
$ curl -Lb /tmp/cookie \
  "https://drive.google.com/uc?export=download&confirm=${CODE}&id=1f5ccaJmJBYwBmHvRQ77yGIUcXnqQIRY_" \
  -o deeplab_mnv3_small_cityscapes_trainfine_2019_11_15.tar.gz
$ tar -zxvf deeplab_mnv3_small_cityscapes_trainfine_2019_11_15.tar.gz
$ rm deeplab_mnv3_small_cityscapes_trainfine_2019_11_15.tar.gz

$ curl -sc /tmp/cookie \
  "https://drive.google.com/uc?export=download&id=1QxS3G55rUQvuiBF-hztQv5zCkfPfwlVU" > /dev/null
$ CODE="$(awk '/_warning_/ {print $NF}' /tmp/cookie)"
$ curl -Lb /tmp/cookie \
  "https://drive.google.com/uc?export=download&confirm=${CODE}&id=1QxS3G55rUQvuiBF-hztQv5zCkfPfwlVU" \
  -o deeplab_mnv3_large_cityscapes_trainfine_2019_11_15.tar.gz
$ tar -zxvf deeplab_mnv3_large_cityscapes_trainfine_2019_11_15.tar.gz
$ rm deeplab_mnv3_large_cityscapes_trainfine_2019_11_15.tar.gz

$ export PATH_TO_INITIAL_CHECKPOINT=${HOME}/git/deeplab/models/research/deeplab_mnv3_small_cityscapes_trainfine/model.ckpt
$ export PATH_TO_DATASET=${HOME}/git/deeplab/models/research/deeplab/datasets/cityscapes/tfrecord
$ export PYTHONPATH=${HOME}/git/deeplab/models/research:${HOME}/git/deeplab/models/research/deeplab:${HOME}/git/deeplab/models/research/slim:${PYTHONPATH}

# Fix a bug in the data generator.
$ sed -i -e \
  "s/splits_to_sizes={'train_fine': 2975,/splits_to_sizes={'train': 2975,/g" \
  deeplab/datasets/data_generator.py

# Back up the trained model.
$ cd ${HOME}/git/deeplab/models/research
$ cp deeplab/export_model.py deeplab/export_model.py_org
$ cp deeplab_mnv3_small_cityscapes_trainfine/frozen_inference_graph.pb \
  deeplab_mnv3_small_cityscapes_trainfine/frozen_inference_graph_org.pb
$ cp deeplab_mnv3_large_cityscapes_trainfine/frozen_inference_graph.pb \
  deeplab_mnv3_large_cityscapes_trainfine/frozen_inference_graph_org.pb

# Customize "export_model.py" according to the input resolution. The size must be a multiple of 8, plus 1.
#   (example.1) 769 = 8 * 96 + 1
#   (example.2) 513 = 8 * 64 + 1
#   (example.3) 321 = 8 * 40 + 1
# The placeholder type must also be changed from tf.uint8 to tf.float32.
$ sed -i -e \
  "s/tf.placeholder(tf.uint8, \[1, None, None, 3\], name=_INPUT_NAME)/tf.placeholder(tf.float32, \[1, 769, 769, 3\], name=_INPUT_NAME)/g" \
  deeplab/export_model.py
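
The "multiple of 8, plus 1" rule can be checked before editing export_model.py. This is a minimal sketch, assuming a stride of 8 as in the commands of this section; both helper names are hypothetical and are not part of DeepLab.

```python
def is_valid_crop_size(size: int, stride: int = 8) -> bool:
    """True if size == stride * k + 1 for some positive integer k."""
    return size > stride and (size - 1) % stride == 0


def nearest_valid_crop_size(size: int, stride: int = 8) -> int:
    """Round size to the closest value of the form stride * k + 1 (k >= 1)."""
    k = max(1, round((size - 1) / stride))
    return stride * k + 1


# Print each candidate, whether it satisfies the rule, and the nearest valid size.
for s in (769, 513, 321, 512, 320):
    print(s, is_valid_crop_size(s), nearest_valid_crop_size(s))
```

With these helpers, 512 and 320 are rejected and rounded to 513 and 321 respectively.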

2-2-2. Parameter sheet

# crop_size and image_pooling_crop_size must each be a multiple of --decoder_output_stride, plus 1
# 769 = 8 * 96 + 1
# 513 = 8 * 64 + 1
# 321 = 8 * 40 + 1

# --initialize_last_layer=True initializes the final layer with the weights from
# tf_initial_checkpoint (the weights are inherited)

# Named tuple to describe the dataset properties.
# deeplab/datasets/data_generator.py
DatasetDescriptor = collections.namedtuple(
    'DatasetDescriptor',
    [
        'splits_to_sizes',  # Splits of the dataset into training, val and test.
        'num_classes',  # Number of semantic classes, including the
                        # background class (if exists). For example, there
                        # are 20 foreground classes + 1 background class in
                        # the PASCAL VOC 2012 dataset. Thus, we set
                        # num_classes=21.
        'ignore_label',  # Ignore label value.
    ])

_CITYSCAPES_INFORMATION = DatasetDescriptor(
    splits_to_sizes={'train': 2975,
                     'train_coarse': 22973,
                     'trainval_fine': 3475,
                     'trainval_coarse': 23473,
                     'val_fine': 500,
                     'test_fine': 1525},
    num_classes=19,
    ignore_label=255,
)

_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 1464,
        'train_aug': 10582,
        'trainval': 2913,
        'val': 1449,
    },
    num_classes=21,
    ignore_label=255,
)

_ADE20K_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 20210,  # num of samples in images/training
        'val': 2000,  # num of samples in images/validation
    },
    num_classes=151,
    ignore_label=0,
)

_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
}

# A map from network name to network function. model_variant.
# deeplab/core/feature_extractor.py
networks_map = {
    'mobilenet_v2': _mobilenet_v2,
    'mobilenet_v3_large_seg': mobilenet_v3_large_seg,
    'mobilenet_v3_small_seg': mobilenet_v3_small_seg,
    'resnet_v1_18': resnet_v1_beta.resnet_v1_18,
    'resnet_v1_18_beta': resnet_v1_beta.resnet_v1_18_beta,
    'resnet_v1_50': resnet_v1_beta.resnet_v1_50,
    'resnet_v1_50_beta': resnet_v1_beta.resnet_v1_50_beta,
    'resnet_v1_101': resnet_v1_beta.resnet_v1_101,
    'resnet_v1_101_beta': resnet_v1_beta.resnet_v1_101_beta,
    'xception_41': xception.xception_41,
    'xception_65': xception.xception_65,
    'xception_71': xception.xception_71,
    'nas_pnasnet': nas_network.pnasnet,
    'nas_hnasnet': nas_network.hnasnet,
}
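
The dataset table above is where the --num_classes=19 value used for Cityscapes in the export and training commands comes from. Below is a small self-contained sketch of that lookup, with values copied from the listing above. The `describe` helper is hypothetical, and only a subset of the splits is reproduced.

```python
import collections

DatasetDescriptor = collections.namedtuple(
    'DatasetDescriptor', ['splits_to_sizes', 'num_classes', 'ignore_label'])

# Values copied from the data_generator.py listing (subset of splits only).
DATASETS = {
    'cityscapes': DatasetDescriptor({'train': 2975, 'val_fine': 500}, 19, 255),
    'pascal_voc_seg': DatasetDescriptor({'train': 1464, 'val': 1449}, 21, 255),
    'ade20k': DatasetDescriptor({'train': 20210, 'val': 2000}, 151, 0),
}


def describe(dataset: str, split: str):
    """Return (num_samples, num_classes, ignore_label) for a dataset split."""
    info = DATASETS[dataset]
    if split not in info.splits_to_sizes:
        raise ValueError('unknown split {!r} for {!r}'.format(split, dataset))
    return info.splits_to_sizes[split], info.num_classes, info.ignore_label


print(describe('cityscapes', 'train'))  # (2975, 19, 255)
```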

2-2-3. "mobilenet_v3_small_seg" Export Model

Generate a frozen graph (.pb), with the modified input placeholder, from the checkpoint file (.ckpt).

$ python3 deeplab/export_model.py \
    --checkpoint_path=./deeplab_mnv3_small_cityscapes_trainfine/model.ckpt \
    --export_path=./deeplab_mnv3_small_cityscapes_trainfine/frozen_inference_graph.pb \
    --num_classes=19 \
    --crop_size=769 \
    --crop_size=769 \
    --model_variant="mobilenet_v3_small_seg" \
    --image_pooling_crop_size="769,769" \
    --image_pooling_stride=4,5 \
    --aspp_convs_filters=128 \
    --aspp_with_concat_projection=0 \
    --aspp_with_squeeze_and_excitation=1 \
    --decoder_use_sum_merge=1 \
    --decoder_filters=19 \
    --decoder_output_is_logits=1 \
    --image_se_uses_qsigmoid=1 \
    --image_pyramid=1 \
    --decoder_output_stride=8

2-2-4. "mobilenet_v3_large_seg" Export Model

Generate a frozen graph (.pb), with the modified input placeholder, from the checkpoint file (.ckpt).

$ python3 deeplab/export_model.py \
    --checkpoint_path=./deeplab_mnv3_large_cityscapes_trainfine/model.ckpt \
    --export_path=./deeplab_mnv3_large_cityscapes_trainfine/frozen_inference_graph.pb \
    --num_classes=19 \
    --crop_size=769 \
    --crop_size=769 \
    --model_variant="mobilenet_v3_large_seg" \
    --image_pooling_crop_size="769,769" \
    --image_pooling_stride=4,5 \
    --aspp_convs_filters=128 \
    --aspp_with_concat_projection=0 \
    --aspp_with_squeeze_and_excitation=1 \
    --decoder_use_sum_merge=1 \
    --decoder_filters=19 \
    --decoder_output_is_logits=1 \
    --image_se_uses_qsigmoid=1 \
    --image_pyramid=1 \
    --decoder_output_stride=8

If you follow the Google Colaboratory sample procedure, copy the "deeplab_mnv3_small_cityscapes_trainfine" and "deeplab_mnv3_large_cityscapes_trainfine" folders to "My Drive" in your Google Drive. This is not necessary if you run all of the procedures described in the Google Colaboratory notebook on a local PC instead.

2-2-5. Google Colaboratory - Post-training quantization - post_training_integer_quant.ipynb

  • Weight Quantization
  • Integer Quantization
  • Full Integer Quantization

https://colab.research.google.com/drive/1TtCJ-uMNTArpZxrf5DCNbZdn08DsiW8F
 
 

2-3. MobileNetV3+DeeplabV3+Cityscapes - Quantization-aware training

2-3-1. "mobilenet_v3_small_seg" Quantization-aware training

$ cd ${HOME}/git/deeplab/models/research
$ export PATH_TO_TRAINED_FLOAT_MODEL=${HOME}/git/deeplab/models/research/deeplab_mnv3_small_cityscapes_trainfine/model.ckpt
$ export PATH_TO_TRAIN_DIR=${HOME}/git/deeplab/models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train
$ export PATH_TO_DATASET=${HOME}/git/deeplab/models/research/deeplab/datasets/cityscapes/tfrecord

# deeplab_mnv3_small_cityscapes_trainfine
$ python3 deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=5000 \
    --train_split="train" \
    --model_variant="mobilenet_v3_small_seg" \
    --train_crop_size="769,769" \
    --train_batch_size=8 \
    --dataset="cityscapes" \
    --initialize_last_layer=False \
    --base_learning_rate=3e-5 \
    --quantize_delay_step=0 \
    --image_pooling_crop_size="769,769" \
    --image_pooling_stride=4,5 \
    --aspp_convs_filters=128 \
    --aspp_with_concat_projection=0 \
    --aspp_with_squeeze_and_excitation=1 \
    --decoder_use_sum_merge=1 \
    --decoder_filters=19 \
    --decoder_output_is_logits=1 \
    --image_se_uses_qsigmoid=1 \
    --image_pyramid=1 \
    --decoder_output_stride=8 \
    --save_interval_secs=300 \
    --save_summaries_secs=300 \
    --save_summaries_images=True \
    --log_steps=100 \
    --tf_initial_checkpoint=${PATH_TO_TRAINED_FLOAT_MODEL} \
    --train_logdir=${PATH_TO_TRAIN_DIR} \
    --dataset_dir=${PATH_TO_DATASET}

2-3-2. "mobilenet_v3_large_seg" Quantization-aware training

$ cd ${HOME}/git/deeplab/models/research
$ export PATH_TO_TRAINED_FLOAT_MODEL=${HOME}/git/deeplab/models/research/deeplab_mnv3_large_cityscapes_trainfine/model.ckpt
$ export PATH_TO_TRAIN_DIR=${HOME}/git/deeplab/models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train
$ export PATH_TO_DATASET=${HOME}/git/deeplab/models/research/deeplab/datasets/cityscapes/tfrecord

# deeplab_mnv3_large_cityscapes_trainfine
$ python3 deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=4350 \
    --train_split="train" \
    --model_variant="mobilenet_v3_large_seg" \
    --train_crop_size="769,769" \
    --train_batch_size=8 \
    --dataset="cityscapes" \
    --initialize_last_layer=False \
    --base_learning_rate=3e-5 \
    --quantize_delay_step=0 \
    --image_pooling_crop_size="769,769" \
    --image_pooling_stride=4,5 \
    --aspp_convs_filters=128 \
    --aspp_with_concat_projection=0 \
    --aspp_with_squeeze_and_excitation=1 \
    --decoder_use_sum_merge=1 \
    --decoder_filters=19 \
    --decoder_output_is_logits=1 \
    --image_se_uses_qsigmoid=1 \
    --image_pyramid=1 \
    --decoder_output_stride=8 \
    --save_interval_secs=300 \
    --save_summaries_secs=300 \
    --save_summaries_images=True \
    --log_steps=100 \
    --tf_initial_checkpoint=${PATH_TO_TRAINED_FLOAT_MODEL} \
    --train_logdir=${PATH_TO_TRAIN_DIR} \
    --dataset_dir=${PATH_TO_DATASET}

The orange line is "deeplab_mnv3_small_cityscapes_trainfine" loss.
The blue line is "deeplab_mnv3_large_cityscapes_trainfine" loss.
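
For a sense of scale, the step counts in the two commands above can be converted into approximate passes over the Cityscapes "train" split (2975 images, per the dataset table in 2-2-2). This is plain arithmetic under the assumption that every step consumes exactly train_batch_size images; the helper name is hypothetical.

```python
def approx_epochs(steps: int, batch_size: int, num_samples: int) -> float:
    """Approximate number of passes over the data for a given step count."""
    return steps * batch_size / num_samples


small = approx_epochs(5000, 8, 2975)   # mobilenet_v3_small_seg run
large = approx_epochs(4350, 8, 2975)   # mobilenet_v3_large_seg run
print('small: {:.1f} epochs, large: {:.1f} epochs'.format(small, large))
```

Under that assumption, both runs amount to roughly a dozen epochs of fine-tuning on top of the float checkpoint.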
 
 

2-4. MobileNetV2+DeeplabV3+coco/voc - Post-training quantization

2-4-1. Preparation

$ cd ${HOME}/git/deeplab/models/research

$ wget http://download.tensorflow.org/models/deeplabv3_mnv2_dm05_pascal_trainaug_2018_10_01.tar.gz
$ tar -zxvf deeplabv3_mnv2_dm05_pascal_trainaug_2018_10_01.tar.gz
$ rm deeplabv3_mnv2_dm05_pascal_trainaug_2018_10_01.tar.gz

$ wget http://download.tensorflow.org/models/deeplabv3_mnv2_dm05_pascal_trainval_2018_10_01.tar.gz
$ tar -zxvf deeplabv3_mnv2_dm05_pascal_trainval_2018_10_01.tar.gz
$ rm deeplabv3_mnv2_dm05_pascal_trainval_2018_10_01.tar.gz

$ wget http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz
$ tar -zxvf deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz
$ rm deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz

$ sed -i -e \
  "s/tf.placeholder(tf.uint8, \[1, None, None, 3\], name=_INPUT_NAME)/tf.placeholder(tf.float32, \[1, 257, 257, 3\], name=_INPUT_NAME)/g" \
  deeplab/export_model.py

$ export PYTHONPATH=${HOME}/git/deeplab/models/research:${HOME}/git/deeplab/models/research/deeplab:${HOME}/git/deeplab/models/research/slim:${PYTHONPATH}

$ python3 deeplab/export_model.py \
  --checkpoint_path=./deeplabv3_mnv2_dm05_pascal_trainaug/model.ckpt \
  --export_path=./deeplabv3_mnv2_dm05_pascal_trainaug/frozen_inference_graph.pb \
  --model_variant="mobilenet_v2" \
  --crop_size=257 \
  --crop_size=257 \
  --depth_multiplier=0.5

$ python3 deeplab/export_model.py \
  --checkpoint_path=./deeplabv3_mnv2_dm05_pascal_trainval/model.ckpt \
  --export_path=./deeplabv3_mnv2_dm05_pascal_trainval/frozen_inference_graph.pb \
  --model_variant="mobilenet_v2" \
  --crop_size=257 \
  --crop_size=257 \
  --depth_multiplier=0.5

$ python3 deeplab/export_model.py \
  --checkpoint_path=./deeplabv3_mnv2_pascal_train_aug/model.ckpt-30000 \
  --export_path=./deeplabv3_mnv2_pascal_train_aug/frozen_inference_graph.pb \
  --model_variant="mobilenet_v2" \
  --crop_size=257 \
  --crop_size=257

2-5. MobileNetV3-SSD+coco - Post-training quantization

2-5-1. Preparation

$ cd ~
$ sudo pip3 install tensorflow-gpu==1.15.0
$ git clone --depth 1 https://github.com/tensorflow/models.git
$ cd models/research

$ git clone https://github.com/cocodataset/cocoapi.git
$ cd cocoapi/PythonAPI
$ make
$ cp -r pycocotools ../..
$ cd ../..
$ wget -O protobuf.zip https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip
$ unzip protobuf.zip
$ ./bin/protoc object_detection/protos/*.proto --python_out=.

$ sudo apt-get install -y protobuf-compiler python3-pil python3-lxml python3-tk
$ sudo -H pip3 install Cython contextlib2 jupyter matplotlib

$ export PYTHONPATH=${PWD}:${PWD}/object_detection:${PWD}/slim:${PYTHONPATH}

$ mkdir -p ssd_mobilenet_v3_small_coco_2019_08_14 && cd ssd_mobilenet_v3_small_coco_2019_08_14
$ curl -sc /tmp/cookie "https://drive.google.com/uc?export=download&id=1uqaC0Y-yRtzkpu1EuZ3BzOyh9-i_3Qgi" > /dev/null
$ CODE="$(awk '/_warning_/ {print $NF}' /tmp/cookie)"
$ curl -Lb /tmp/cookie "https://drive.google.com/uc?export=download&confirm=${CODE}&id=1uqaC0Y-yRtzkpu1EuZ3BzOyh9-i_3Qgi" -o ssd_mobilenet_v3_small_coco_2019_08_14.tar.gz
$ tar -zxvf ssd_mobilenet_v3_small_coco_2019_08_14.tar.gz
$ rm ssd_mobilenet_v3_small_coco_2019_08_14.tar.gz
$ cd ..

$ mkdir -p ssd_mobilenet_v3_large_coco_2019_08_14 && cd ssd_mobilenet_v3_large_coco_2019_08_14
$ curl -sc /tmp/cookie "https://drive.google.com/uc?export=download&id=1NGLjKRWDQZ_kibQHlLZ7Eetuuz1waC7X" > /dev/null
$ CODE="$(awk '/_warning_/ {print $NF}' /tmp/cookie)"
$ curl -Lb /tmp/cookie "https://drive.google.com/uc?export=download&confirm=${CODE}&id=1NGLjKRWDQZ_kibQHlLZ7Eetuuz1waC7X" -o ssd_mobilenet_v3_large_coco_2019_08_14.tar.gz
$ tar -zxvf ssd_mobilenet_v3_large_coco_2019_08_14.tar.gz
$ rm ssd_mobilenet_v3_large_coco_2019_08_14.tar.gz
$ cd ..

2-5-2. Create a conversion script from checkpoint format to saved_model format

import tensorflow as tf
import os
import shutil
from tensorflow.python.saved_model import tag_constants
from tensorflow.python.tools import freeze_graph
from tensorflow.python import ops
from tensorflow.tools.graph_transforms import TransformGraph

def freeze_model(saved_model_dir, output_node_names, output_filename):
  output_graph_filename = os.path.join(saved_model_dir, output_filename)
  initializer_nodes = ''
  freeze_graph.freeze_graph(
      input_saved_model_dir=saved_model_dir,
      output_graph=output_graph_filename,
      saved_model_tags = tag_constants.SERVING,
      output_node_names=output_node_names,
      initializer_nodes=initializer_nodes,
      input_graph=None,
      input_saver=False,
      input_binary=False,
      input_checkpoint=None,
      restore_op_name=None,
      filename_tensor_name=None,
      clear_devices=True,
      input_meta_graph=False,
  )

def get_graph_def_from_file(graph_filepath):
  tf.reset_default_graph()
  with ops.Graph().as_default():
    with tf.gfile.GFile(graph_filepath, 'rb') as f:
      graph_def = tf.GraphDef()
      graph_def.ParseFromString(f.read())
      return graph_def

def optimize_graph(model_dir, graph_filename, transforms, input_name, output_names, outname='optimized_model.pb'):
  input_names = [input_name] # change this as per how you have saved the model
  graph_def = get_graph_def_from_file(os.path.join(model_dir, graph_filename))
  optimized_graph_def = TransformGraph(
      graph_def,
      input_names,  
      output_names,
      transforms)
  tf.train.write_graph(optimized_graph_def,
                      logdir=model_dir,
                      as_text=False,
                      name=outname)
  print('Graph optimized!')

def convert_graph_def_to_saved_model(export_dir, graph_filepath, input_name, outputs):
  graph_def = get_graph_def_from_file(graph_filepath)
  with tf.Session(graph=tf.Graph()) as session:
    tf.import_graph_def(graph_def, name='')
    tf.compat.v1.saved_model.simple_save(
        session,
        export_dir,# change input_image to node.name if you know the name
        inputs={input_name: session.graph.get_tensor_by_name('{}:0'.format(node.name))
            for node in graph_def.node if node.op=='Placeholder'},
        # Use the op name (the part before ':') as the signature key.
        outputs={t.split(':')[0]: session.graph.get_tensor_by_name(t) for t in outputs}
    )
    print('Optimized graph converted to SavedModel!')

tf.compat.v1.enable_eager_execution()

# Look up the name of the placeholder for the input node
graph_def=get_graph_def_from_file('./ssd_mobilenet_v3_small_coco_2019_08_14/frozen_inference_graph.pb')
input_name_small=""
for node in graph_def.node:
    if node.op=='Placeholder':
        print("##### ssd_mobilenet_v3_small_coco_2019_08_14 - Input Node Name #####", node.name) # this will be the input node
        input_name_small=node.name

# Look up the name of the placeholder for the input node
graph_def=get_graph_def_from_file('./ssd_mobilenet_v3_large_coco_2019_08_14/frozen_inference_graph.pb')
input_name_large=""
for node in graph_def.node:
    if node.op=='Placeholder':
        print("##### ssd_mobilenet_v3_large_coco_2019_08_14 - Input Node Name #####", node.name) # this will be the input node
        input_name_large=node.name

# ssd_mobilenet_v3 output names
output_node_names = ['raw_outputs/class_predictions','raw_outputs/box_encodings']
outputs = ['raw_outputs/class_predictions:0','raw_outputs/box_encodings:0']

# Optimizing the graph via TensorFlow library
transforms = []
optimize_graph('./ssd_mobilenet_v3_small_coco_2019_08_14', 'frozen_inference_graph.pb', transforms, input_name_small, output_node_names, outname='optimized_model_small.pb')
optimize_graph('./ssd_mobilenet_v3_large_coco_2019_08_14', 'frozen_inference_graph.pb', transforms, input_name_large, output_node_names, outname='optimized_model_large.pb')

# Convert this to a TF Serving compatible model - ssd_mobilenet_v3_small_coco_2019_08_14
shutil.rmtree('./ssd_mobilenet_v3_small_coco_2019_08_14/0', ignore_errors=True)
convert_graph_def_to_saved_model('./ssd_mobilenet_v3_small_coco_2019_08_14/0',
                                 './ssd_mobilenet_v3_small_coco_2019_08_14/optimized_model_small.pb', input_name_small, outputs)

# Convert this to a TF Serving compatible model - ssd_mobilenet_v3_large_coco_2019_08_14
shutil.rmtree('./ssd_mobilenet_v3_large_coco_2019_08_14/0', ignore_errors=True)
convert_graph_def_to_saved_model('./ssd_mobilenet_v3_large_coco_2019_08_14/0',
                                 './ssd_mobilenet_v3_large_coco_2019_08_14/optimized_model_large.pb', input_name_large, outputs)
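
The signature keys in convert_graph_def_to_saved_model are derived from tensor names of the form "op_name:output_index". Below is a minimal sketch of that split, independent of TensorFlow. `op_name_of` is a hypothetical helper, shown because str.rstrip removes a set of trailing characters rather than a suffix.

```python
def op_name_of(tensor_name: str) -> str:
    """Return the op name from a tensor name such as 'scope/op:0'."""
    name, sep, index = tensor_name.rpartition(':')
    # Only treat the part after the last colon as an output index if it is numeric.
    return name if sep and index.isdigit() else tensor_name


print(op_name_of('raw_outputs/box_encodings:0'))
print(op_name_of('conv10:0'))        # suffix-aware: keeps the trailing 0 of the op name
print('conv10:0'.rstrip(':0'))       # character-set stripping also removes that 0
```

For the two output names used in this script both approaches give the same keys; the difference only appears for op names that themselves end in "0" or ":".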

2-5-3. Confirm the structure of saved_model 【ssd_mobilenet_v3_small_coco_2019_08_14】

$ saved_model_cli show --dir ./ssd_mobilenet_v3_small_coco_2019_08_14/0 --all

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['normalized_input_image_tensor'] tensor_info:
        dtype: DT_FLOAT
        shape: (1, 320, 320, 3)
        name: normalized_input_image_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['raw_outputs/box_encodings'] tensor_info:
        dtype: DT_FLOAT
        shape: (1, 2034, 4)
        name: raw_outputs/box_encodings:0
    outputs['raw_outputs/class_predictions'] tensor_info:
        dtype: DT_FLOAT
        shape: (1, 2034, 91)
        name: raw_outputs/class_predictions:0
  Method name is: tensorflow/serving/predict

2-5-4. Confirm the structure of saved_model 【ssd_mobilenet_v3_large_coco_2019_08_14】

$ saved_model_cli show --dir ./ssd_mobilenet_v3_large_coco_2019_08_14/0 --all

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['normalized_input_image_tensor'] tensor_info:
        dtype: DT_FLOAT
        shape: (1, 320, 320, 3)
        name: normalized_input_image_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['raw_outputs/box_encodings'] tensor_info:
        dtype: DT_FLOAT
        shape: (1, 2034, 4)
        name: raw_outputs/box_encodings:0
    outputs['raw_outputs/class_predictions'] tensor_info:
        dtype: DT_FLOAT
        shape: (1, 2034, 91)
        name: raw_outputs/class_predictions:0
  Method name is: tensorflow/serving/predict
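
The 2034 in the output shapes above is the number of candidate boxes per image. The source does not explain where it comes from. The sketch below shows one way the figure can be reproduced, assuming a common SSD anchor layout for a 320x320 input: six square feature maps, with 3 anchors per cell on the first and 6 on the rest. Treat the layout as an assumption, not as a description of this particular checkpoint's configuration.

```python
# Assumed layout (not stated in the source): six square feature maps for a
# 320x320 input, 3 anchors per cell on the first map and 6 on the others.
FEATURE_MAP_SIZES = (20, 10, 5, 3, 2, 1)
ANCHORS_PER_CELL = (3, 6, 6, 6, 6, 6)


def total_anchors(sizes=FEATURE_MAP_SIZES, per_cell=ANCHORS_PER_CELL) -> int:
    """Sum of height * width * anchors_per_cell over all feature maps."""
    return sum(s * s * a for s, a in zip(sizes, per_cell))


print(total_anchors())  # 2034 under the assumed layout
```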

2-5-5. Download the calibration test dataset (6GB)

$ curl -sc /tmp/cookie "https://drive.google.com/uc?export=download&id=1Uk9F4Tc-9UgnvARIVkloSoePUynyST6E" > /dev/null
$ CODE="$(awk '/_warning_/ {print $NF}' /tmp/cookie)"
$ curl -Lb /tmp/cookie "https://drive.google.com/uc?export=download&confirm=${CODE}&id=1Uk9F4Tc-9UgnvARIVkloSoePUynyST6E" -o TFDS.tar.gz
$ tar -zxvf TFDS.tar.gz
$ rm TFDS.tar.gz
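
The awk one-liner above pulls the confirmation token out of the cookie jar that curl writes: it prints the last field of any line containing "_warning_". Below is a Python sketch of the same extraction, assuming a Netscape-format cookie file as written by curl -c. The function name and the sample cookie line are made up for illustration.

```python
def extract_confirm_code(cookie_text: str) -> str:
    """Mimic: awk '/_warning_/ {print $NF}' on the contents of a cookie file."""
    codes = []
    for line in cookie_text.splitlines():
        fields = line.split()
        if '_warning_' in line and fields:
            codes.append(fields[-1])  # $NF: last whitespace-separated field
    return '\n'.join(codes)


# Made-up sample line in the tab-separated layout curl uses for its cookie jar.
sample = (
    "# Netscape HTTP Cookie File\n"
    ".drive.google.com\tTRUE\t/uc\tFALSE\t0\tdownload_warning_123\tAbCd\n"
)
print(extract_confirm_code(sample))
```

If no line matches, the function returns an empty string, which corresponds to the shell variable CODE being empty.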

2-5-6. Quantization

2-5-6-1. ssd_mobilenet_v3_small_coco_2019_08_14

import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np

def representative_dataset_gen():
  for data in raw_test_data.take(100):
    image = data['image'].numpy()
    image = tf.image.resize(image, (320, 320))
    image = image[np.newaxis,:,:,:]
    yield [image]

tf.compat.v1.enable_eager_execution()

# Generating a calibration data set
#raw_test_data, info = tfds.load(name="coco/2017", with_info=True, split="test", data_dir="./TFDS")
raw_test_data, info = tfds.load(name="coco/2017", with_info=True, split="test", data_dir="./TFDS", download=False)
print(info)

# Weight Quantization - Input/Output=float32
converter = tf.lite.TFLiteConverter.from_saved_model('./ssd_mobilenet_v3_small_coco_2019_08_14/0')
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()
with open('./ssd_mobilenet_v3_small_coco_2019_08_14/mobilenet_v3_small_weight_quant.tflite', 'wb') as w:
    w.write(tflite_quant_model)
print("Weight Quantization complete! - mobilenet_v3_small_weight_quant.tflite")

# Integer Quantization - Input/Output=float32
converter = tf.lite.TFLiteConverter.from_saved_model('./ssd_mobilenet_v3_small_coco_2019_08_14/0')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
tflite_quant_model = converter.convert()
with open('./ssd_mobilenet_v3_small_coco_2019_08_14/mobilenet_v3_small_integer_quant.tflite', 'wb') as w:
    w.write(tflite_quant_model)
print("Integer Quantization complete! - mobilenet_v3_small_integer_quant.tflite")

# Full Integer Quantization - Input/Output=uint8
converter = tf.lite.TFLiteConverter.from_saved_model('./ssd_mobilenet_v3_small_coco_2019_08_14/0')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_quant_model = converter.convert()
with open('./ssd_mobilenet_v3_small_coco_2019_08_14/mobilenet_v3_small_full_integer_quant.tflite', 'wb') as w:
    w.write(tflite_quant_model)
print("Full Integer Quantization complete! - mobilenet_v3_small_full_integer_quant.tflite")

2-5-6-2. ssd_mobilenet_v3_large_coco_2019_08_14

import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np

def representative_dataset_gen():
  for data in raw_test_data.take(100):
    image = data['image'].numpy()
    image = tf.image.resize(image, (320, 320))
    image = image[np.newaxis,:,:,:]
    yield [image]

tf.compat.v1.enable_eager_execution()

# Generating a calibration data set
#raw_test_data, info = tfds.load(name="coco/2017", with_info=True, split="test", data_dir="./TFDS")
raw_test_data, info = tfds.load(name="coco/2017", with_info=True, split="test", data_dir="./TFDS", download=False)

# Weight Quantization - Input/Output=float32
converter = tf.lite.TFLiteConverter.from_saved_model('./ssd_mobilenet_v3_large_coco_2019_08_14/0')
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()
with open('./ssd_mobilenet_v3_large_coco_2019_08_14/mobilenet_v3_large_weight_quant.tflite', 'wb') as w:
    w.write(tflite_quant_model)
print("Weight Quantization complete! - mobilenet_v3_large_weight_quant.tflite")

# Integer Quantization - Input/Output=float32
converter = tf.lite.TFLiteConverter.from_saved_model('./ssd_mobilenet_v3_large_coco_2019_08_14/0')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
tflite_quant_model = converter.convert()
with open('./ssd_mobilenet_v3_large_coco_2019_08_14/mobilenet_v3_large_integer_quant.tflite', 'wb') as w:
    w.write(tflite_quant_model)
print("Integer Quantization complete! - mobilenet_v3_large_integer_quant.tflite")

# Full Integer Quantization - Input/Output=uint8
converter = tf.lite.TFLiteConverter.from_saved_model('./ssd_mobilenet_v3_large_coco_2019_08_14/0')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_quant_model = converter.convert()
with open('./ssd_mobilenet_v3_large_coco_2019_08_14/mobilenet_v3_large_full_integer_quant.tflite', 'wb') as w:
    w.write(tflite_quant_model)
print("Full Integer Quantization complete! - mobilenet_v3_large_full_integer_quant.tflite")
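
With `inference_input_type` and `inference_output_type` set to `tf.uint8`, the caller has to feed and read 8-bit tensors, which are tied to real values through a scale and a zero point. The following is a minimal sketch of that mapping in plain Python, not part of the original scripts. `SCALE` and `ZERO_POINT` are hypothetical values chosen for illustration; for a converted model they can be read from the `quantization` entry returned by `tf.lite.Interpreter.get_input_details()`.

```python
# Illustrative affine (scale / zero-point) mapping for a uint8 tensor.
# SCALE and ZERO_POINT are hypothetical; read the real values from the
# converted model instead of hard-coding them.
SCALE = 1.0 / 128.0
ZERO_POINT = 128


def quantize(x, scale=SCALE, zero_point=ZERO_POINT):
    """Map a real value to uint8, clamping to the representable range."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))


def dequantize(q, scale=SCALE, zero_point=ZERO_POINT):
    """Map a uint8 value back to its approximate real value."""
    return scale * (q - zero_point)


# Round-trip a few sample values to see the clamping and rounding error.
for x in (-1.0, 0.0, 0.5, 1.0):
    q = quantize(x)
    print(f"{x:+.3f} -> {q:3d} -> {dequantize(q):+.4f}")
```

Values outside the representable range are clamped, so an input of 1.0 with these example parameters comes back slightly below 1.0.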

2-6. MobileNetV2-SSDLite+VOC - Training -> Integer Quantization

2-6-1. Training

Learning with the MobileNetV2-SSDLite Pascal-VOC dataset [Remake of Docker version]

2-6-2. Export model (--add_postprocessing_op=True)

06_mobilenetv2-ssdlite/02_voc/01_float32/00_export_tflite_model.txt

2-6-3. Integer Quantization

06_mobilenetv2-ssdlite/02_voc/01_float32/03_integer_quantization_with_postprocess.py

3. TFLite Model Benchmark

$ sudo apt-get install python-future

## Bazel for Ubuntu18.04 x86_64 install
$ wget https://github.com/bazelbuild/bazel/releases/download/2.0.0/bazel-2.0.0-installer-linux-x86_64.sh
$ sudo chmod +x bazel-2.0.0-installer-linux-x86_64.sh
$ ./bazel-2.0.0-installer-linux-x86_64.sh
$ sudo apt-get install -y openjdk-8-jdk

## Bazel for RaspberryPi3/4 Raspbian/Debian Buster armhf install
$ wget https://github.com/PINTO0309/Bazel_bin/raw/master/2.0.0/Raspbian_Debian_Buster_armhf/openjdk-8-jdk/install.sh
$ ./install.sh
$ curl -sc /tmp/cookie \
  "https://drive.google.com/uc?export=download&id=1LQUSal55R6fmawZS9zZuk6-5ZFOdUqRK" > /dev/null
$ CODE="$(awk '/_warning_/ {print $NF}' /tmp/cookie)"
$ curl -Lb /tmp/cookie \
  "https://drive.google.com/uc?export=download&confirm=${CODE}&id=1LQUSal55R6fmawZS9zZuk6-5ZFOdUqRK" \
  -o adoptopenjdk-8-hotspot_8u222-b10-2_armhf.deb
$ sudo apt-get install -y ./adoptopenjdk-8-hotspot_8u222-b10-2_armhf.deb

## Bazel for RaspberryPi3/4 Raspbian/Debian Buster aarch64 install
$ wget https://github.com/PINTO0309/Bazel_bin/raw/master/2.0.0/Raspbian_Debian_Buster_aarch64/openjdk-8-jdk/install.sh
$ ./install.sh
$ curl -sc /tmp/cookie \
  "https://drive.google.com/uc?export=download&id=1VwLxzT3EOTbhSzwvRF2H4ChTQyTQBt3x" > /dev/null
$ CODE="$(awk '/_warning_/ {print $NF}' /tmp/cookie)"
$ curl -Lb /tmp/cookie \
  "https://drive.google.com/uc?export=download&confirm=${CODE}&id=1VwLxzT3EOTbhSzwvRF2H4ChTQyTQBt3x" \
  -o adoptopenjdk-8-hotspot_8u222-b10-2_arm64.deb
$ sudo apt-get install -y ./adoptopenjdk-8-hotspot_8u222-b10-2_arm64.deb

## Clone Tensorflow v2.1.0+
$ git clone --depth 1 https://github.com/tensorflow/tensorflow.git
$ cd tensorflow

## Build and run TFLite Model Benchmark Tool
$ bazel run -c opt tensorflow/lite/tools/benchmark:benchmark_model -- \
  --graph=${HOME}/Downloads/deeplabv3_257_mv_gpu.tflite \
  --num_threads=4 \
  --warmup_runs=1 \
  --enable_op_profiling=true

$ bazel run -c opt tensorflow/lite/tools/benchmark:benchmark_model -- \
  --graph=${HOME}/Downloads/deeplabv3_257_mv_gpu.tflite \
  --num_threads=4 \
  --warmup_runs=1 \
  --use_xnnpack=true \
  --enable_op_profiling=true

$ bazel run -c opt tensorflow/lite/tools/benchmark:benchmark_model_plus_flex -- \
  --graph=${HOME}/git/tf-monodepth2/deeplabv3_257_mv_gpu.tflite \
  --num_threads=4 \
  --warmup_runs=1 \
  --enable_op_profiling=true

$ bazel run -c opt tensorflow/lite/tools/benchmark:benchmark_model_plus_flex -- \
  --graph=${HOME}/git/tf-monodepth2/deeplabv3_257_mv_gpu.tflite \
  --num_threads=4 \
  --warmup_runs=1 \
  --use_xnnpack=true \
  --enable_op_profiling=true
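
The four invocations above differ only in the Bazel target and the `--use_xnnpack` flag. As a rough sketch, the command line could be assembled in Python so that several models or settings can be benchmarked in a loop. The helper name `benchmark_command` and its parameters are hypothetical; only the target names and flags shown above are used.

```python
import shlex


def benchmark_command(graph, num_threads=4, warmup_runs=1,
                      use_xnnpack=False, plus_flex=False):
    """Build the argv list for one benchmark run (hypothetical helper)."""
    target = "tensorflow/lite/tools/benchmark:benchmark_model"
    if plus_flex:
        target += "_plus_flex"
    cmd = [
        "bazel", "run", "-c", "opt", target, "--",
        f"--graph={graph}",
        f"--num_threads={num_threads}",
        f"--warmup_runs={warmup_runs}",
    ]
    if use_xnnpack:
        cmd.append("--use_xnnpack=true")
    cmd.append("--enable_op_profiling=true")
    return cmd


# Print the commands with and without XNNPACK for one model file.
for xnnpack in (False, True):
    print(shlex.join(benchmark_command("deeplabv3_257_mv_gpu.tflite",
                                       use_xnnpack=xnnpack)))
```

The returned list can be passed to `subprocess.run` from inside the cloned `tensorflow` directory.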

x86_64 deeplab_mnv3_small_weight_quant_769.tflite Benchmark

Number of nodes executed: 171
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	                 CONV_2D	       45	  1251.486	    67.589%	    67.589%	     0.000	        0
	       DEPTHWISE_CONV_2D	       11	   438.764	    23.696%	    91.286%	     0.000	        0
	              HARD_SWISH	       16	    54.855	     2.963%	    94.248%	     0.000	        0
	                 ARG_MAX	        1	    24.850	     1.342%	    95.591%	     0.000	        0
	         RESIZE_BILINEAR	        5	    23.805	     1.286%	    96.876%	     0.000	        0
	                     MUL	       30	    14.914	     0.805%	    97.682%	     0.000	        0
	                     ADD	       18	    10.646	     0.575%	    98.257%	     0.000	        0
	       SPACE_TO_BATCH_ND	        7	     9.567	     0.517%	    98.773%	     0.000	        0
	       BATCH_TO_SPACE_ND	        7	     7.431	     0.401%	    99.175%	     0.000	        0
	                     SUB	        2	     6.131	     0.331%	    99.506%	     0.000	        0
	         AVERAGE_POOL_2D	       10	     5.435	     0.294%	    99.799%	     0.000	        0
	                 RESHAPE	        6	     2.171	     0.117%	    99.916%	     0.000	        0
	                     PAD	        1	     0.660	     0.036%	    99.952%	     0.000	        0
	                    CAST	        2	     0.601	     0.032%	    99.985%	     0.000	        0
	           STRIDED_SLICE	        1	     0.277	     0.015%	   100.000%	     0.000	        0
	        Misc Runtime Ops	        1	     0.008	     0.000%	   100.000%	    33.552	        0
	              DEQUANTIZE	        8	     0.000	     0.000%	   100.000%	     0.000	        0

Timings (microseconds): count=52 first=224 curr=1869070 min=224 max=2089397 avg=1.85169e+06 std=373988
Memory (bytes): count=0
171 nodes observed

x86_64 deeplab_mnv3_large_weight_quant_769.tflite Benchmark

Number of nodes executed: 194
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	                 CONV_2D	       51	  4123.348	    82.616%	    82.616%	     0.000	        0
	       DEPTHWISE_CONV_2D	       15	   628.139	    12.586%	    95.202%	     0.000	        0
	              HARD_SWISH	       15	    90.448	     1.812%	    97.014%	     0.000	        0
	                     MUL	       32	    29.393	     0.589%	    97.603%	     0.000	        0
	                 ARG_MAX	        1	    22.866	     0.458%	    98.061%	     0.000	        0
	                     ADD	       25	    22.860	     0.458%	    98.519%	     0.000	        0
	         RESIZE_BILINEAR	        5	    22.494	     0.451%	    98.970%	     0.000	        0
	       SPACE_TO_BATCH_ND	        8	    18.518	     0.371%	    99.341%	     0.000	        0
	       BATCH_TO_SPACE_ND	        8	    15.522	     0.311%	    99.652%	     0.000	        0
	         AVERAGE_POOL_2D	        9	     7.855	     0.157%	    99.809%	     0.000	        0
	                     SUB	        2	     5.896	     0.118%	    99.928%	     0.000	        0
	                 RESHAPE	        6	     2.133	     0.043%	    99.970%	     0.000	        0
	                     PAD	        1	     0.631	     0.013%	    99.983%	     0.000	        0
	                    CAST	        2	     0.575	     0.012%	    99.994%	     0.000	        0
	           STRIDED_SLICE	        1	     0.260	     0.005%	   100.000%	     0.000	        0
	        Misc Runtime Ops	        1	     0.012	     0.000%	   100.000%	    38.304	        0
	              DEQUANTIZE	       12	     0.003	     0.000%	   100.000%	     0.000	        0

Timings (microseconds): count=31 first=193 curr=5276579 min=193 max=5454605 avg=4.99104e+06 std=1311782
Memory (bytes): count=0
194 nodes observed

Ubuntu 19.10 aarch64 + RaspberryPi4 mobilenet_v3_small_full_integer_quant.tflite Benchmark

Number of nodes executed: 176
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	                 CONV_2D	       61	    10.255	    36.582%	    36.582%	     0.000	       61
	       DEPTHWISE_CONV_2D	       27	     5.058	    18.043%	    54.625%	     0.000	       27
	                     MUL	       26	     5.056	    18.036%	    72.661%	     0.000	       26
	                     ADD	       14	     4.424	    15.781%	    88.442%	     0.000	       14
	                QUANTIZE	       13	     1.633	     5.825%	    94.267%	     0.000	       13
	              HARD_SWISH	       10	     0.918	     3.275%	    97.542%	     0.000	       10
	                LOGISTIC	        1	     0.376	     1.341%	    98.883%	     0.000	        1
	         AVERAGE_POOL_2D	        9	     0.199	     0.710%	    99.593%	     0.000	        9
	           CONCATENATION	        2	     0.084	     0.300%	    99.893%	     0.000	        2
	                 RESHAPE	       13	     0.030	     0.107%	   100.000%	     0.000	       13

Timings (microseconds): count=50 first=28827 curr=28176 min=27916 max=28827 avg=28121.2 std=165
Memory (bytes): count=0
176 nodes observed

Ubuntu 19.10 aarch64 + RaspberryPi4 mobilenet_v3_small_weight_quant.tflite Benchmark

Number of nodes executed: 186
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	                 CONV_2D	       61	    82.600	    79.265%	    79.265%	     0.000	       61
	       DEPTHWISE_CONV_2D	       27	     8.198	     7.867%	    87.132%	     0.000	       27
	                     MUL	       26	     4.866	     4.670%	    91.802%	     0.000	       26
	                     ADD	       14	     4.863	     4.667%	    96.469%	     0.000	       14
	                LOGISTIC	        1	     1.645	     1.579%	    98.047%	     0.000	        1
	         AVERAGE_POOL_2D	        9	     0.761	     0.730%	    98.777%	     0.000	        9
	              HARD_SWISH	       10	     0.683	     0.655%	    99.433%	     0.000	       10
	           CONCATENATION	        2	     0.415	     0.398%	    99.831%	     0.000	        2
	                 RESHAPE	       13	     0.171	     0.164%	    99.995%	     0.000	       13
	              DEQUANTIZE	       23	     0.005	     0.005%	   100.000%	     0.000	       23

Timings (microseconds): count=50 first=103867 curr=103937 min=103708 max=118926 avg=104299 std=2254
Memory (bytes): count=0
186 nodes observed

Ubuntu 19.10 aarch64 + RaspberryPi4 Posenet model-mobilenet_v1_101_257_integer_quant.tflite Benchmark

Number of nodes executed: 38
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	                 CONV_2D	       18	    31.906	    83.360%	    83.360%	     0.000	        0
	       DEPTHWISE_CONV_2D	       13	     5.959	    15.569%	    98.929%	     0.000	        0
	                QUANTIZE	        1	     0.223	     0.583%	    99.511%	     0.000	        0
	        Misc Runtime Ops	        1	     0.148	     0.387%	    99.898%	    96.368	        0
	              DEQUANTIZE	        4	     0.030	     0.078%	    99.976%	     0.000	        0
	                LOGISTIC	        1	     0.009	     0.024%	   100.000%	     0.000	        0

Timings (microseconds): count=70 first=519 curr=53370 min=519 max=53909 avg=38296 std=23892
Memory (bytes): count=0
38 nodes observed

Ubuntu 19.10 aarch64 + RaspberryPi4 MobileNetV2-SSDLite ssdlite_mobilenet_v2_coco_300_integer_quant.tflite Benchmark

Number of nodes executed: 128
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	                 CONV_2D	       55	    27.253	    71.185%	    71.185%	     0.000	        0
	       DEPTHWISE_CONV_2D	       33	     8.024	    20.959%	    92.143%	     0.000	        0
	                     ADD	       10	     1.565	     4.088%	    96.231%	     0.000	        0
	                QUANTIZE	       11	     0.546	     1.426%	    97.657%	     0.000	        0
	        Misc Runtime Ops	        1	     0.368	     0.961%	    98.618%	   250.288	        0
	                LOGISTIC	        1	     0.253	     0.661%	    99.279%	     0.000	        0
	              DEQUANTIZE	        2	     0.168	     0.439%	    99.718%	     0.000	        0
	           CONCATENATION	        2	     0.077	     0.201%	    99.919%	     0.000	        0
	                 RESHAPE	       13	     0.031	     0.081%	   100.000%	     0.000	        0

Timings (microseconds): count=70 first=1289 curr=53049 min=1289 max=53590 avg=38345.2 std=23436
Memory (bytes): count=0
128 nodes observed

Ubuntu 19.10 aarch64 + RaspberryPi4 ml-sound-classifier mobilenetv2_fsd2018_41cls_weight_quant.tflite Benchmark

Number of nodes executed: 111
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	                 MINIMUM	       35	    10.020	    45.282%	    45.282%	     0.000	       35
	                 CONV_2D	       34	     8.376	    37.852%	    83.134%	     0.000	       34
	       DEPTHWISE_CONV_2D	       18	     1.685	     7.615%	    90.749%	     0.000	       18
	                    MEAN	        1	     1.422	     6.426%	    97.176%	     0.000	        1
	         FULLY_CONNECTED	        2	     0.589	     2.662%	    99.837%	     0.000	        2
	                     ADD	       10	     0.031	     0.140%	    99.977%	     0.000	       10
	                 SOFTMAX	        1	     0.005	     0.023%	   100.000%	     0.000	        1
	              DEQUANTIZE	       10	     0.000	     0.000%	   100.000%	     0.000	       10

Timings (microseconds): count=50 first=22417 curr=22188 min=22041 max=22417 avg=22182 std=70
Memory (bytes): count=0
111 nodes observed

Ubuntu 19.10 aarch64 + RaspberryPi4 ml-sound-classifier mobilenetv2_fsd2018_41cls_integer_quant.tflite Benchmark

Number of nodes executed: 173
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	                QUANTIZE	       70	     1.117	    23.281%	    23.281%	     0.000	        0
	                 MINIMUM	       35	     1.104	    23.010%	    46.290%	     0.000	        0
	                 CONV_2D	       34	     0.866	    18.049%	    64.339%	     0.000	        0
	                    MEAN	        1	     0.662	    13.797%	    78.137%	     0.000	        0
	       DEPTHWISE_CONV_2D	       18	     0.476	     9.921%	    88.058%	     0.000	        0
	         FULLY_CONNECTED	        2	     0.251	     5.231%	    93.289%	     0.000	        0
	        Misc Runtime Ops	        1	     0.250	     5.211%	    98.499%	    71.600	        0
	                     ADD	       10	     0.071	     1.480%	    99.979%	     0.000	        0
	                 SOFTMAX	        1	     0.001	     0.021%	   100.000%	     0.000	        0
	              DEQUANTIZE	        1	     0.000	     0.000%	   100.000%	     0.000	        0

Timings (microseconds): count=198 first=477 curr=9759 min=477 max=10847 avg=4876.6 std=4629
Memory (bytes): count=0
173 nodes observed

Raspbian Buster aarch64 + RaspberryPi4 deeplabv3_mnv2_pascal_trainval_257_integer_quant.tflite Benchmark

Number of nodes executed: 82
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	                 CONV_2D	       38	   103.576	    56.077%	    56.077%	     0.000	       38
	       DEPTHWISE_CONV_2D	       17	    33.151	    17.948%	    74.026%	     0.000	       17
	         RESIZE_BILINEAR	        3	    15.143	     8.199%	    82.224%	     0.000	        3
	                     SUB	        2	    10.908	     5.906%	    88.130%	     0.000	        2
	                     ADD	       11	     9.821	     5.317%	    93.447%	     0.000	       11
	                 ARG_MAX	        1	     8.824	     4.777%	    98.225%	     0.000	        1
	                     PAD	        1	     1.024	     0.554%	    98.779%	     0.000	        1
	                QUANTIZE	        2	     0.941	     0.509%	    99.289%	     0.000	        2
	                     MUL	        1	     0.542	     0.293%	    99.582%	     0.000	        1
	           CONCATENATION	        1	     0.365	     0.198%	    99.780%	     0.000	        1
	         AVERAGE_POOL_2D	        1	     0.150	     0.081%	    99.861%	     0.000	        1
	                 RESHAPE	        2	     0.129	     0.070%	    99.931%	     0.000	        2
	             EXPAND_DIMS	        2	     0.128	     0.069%	   100.000%	     0.000	        2

Timings (microseconds): count=50 first=201226 curr=176476 min=176476 max=201226 avg=184741 std=4791
Memory (bytes): count=0
82 nodes observed

Ubuntu 18.04 x86_64 + XNNPACK enabled + 10 Threads deeplabv3_257_mv_gpu.tflite Benchmark

Number of nodes executed: 8
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	                DELEGATE	        3	     6.716	    61.328%	    61.328%	     0.000	        3
	         RESIZE_BILINEAR	        3	     3.965	    36.207%	    97.534%	     0.000	        3
	           CONCATENATION	        1	     0.184	     1.680%	    99.215%	     0.000	        1
	         AVERAGE_POOL_2D	        1	     0.086	     0.785%	   100.000%	     0.000	        1

Timings (microseconds): count=91 first=11051 curr=10745 min=10521 max=12552 avg=10955.4 std=352
Memory (bytes): count=0
8 nodes observed

Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Peak memory footprint (MB): init=3.58203 overall=56.0703

Ubuntu 18.04 x86_64 + XNNPACK disabled + 10 Threads deeplabv3_257_mv_gpu.tflite Benchmark

Number of nodes executed: 70
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	       DEPTHWISE_CONV_2D	       17	    41.704	    68.372%	    68.372%	     0.000	       17
	                 CONV_2D	       38	    15.932	    26.120%	    94.491%	     0.000	       38
	         RESIZE_BILINEAR	        3	     3.060	     5.017%	    99.508%	     0.000	        3
	                     ADD	       10	     0.149	     0.244%	    99.752%	     0.000	       10
	           CONCATENATION	        1	     0.109	     0.179%	    99.931%	     0.000	        1
	         AVERAGE_POOL_2D	        1	     0.042	     0.069%	   100.000%	     0.000	        1

Timings (microseconds): count=50 first=59929 curr=60534 min=59374 max=63695 avg=61031.6 std=1182
Memory (bytes): count=0
70 nodes observed

Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Peak memory footprint (MB): init=0 overall=13.7109

Ubuntu 18.04 x86_64 + XNNPACK enabled + 4 Threads Faster-Grad-CAM weights_weight_quant.tflite Benchmark

Number of nodes executed: 74
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	                 CONV_2D	       31	     4.947	    77.588%	    77.588%	     0.000	       31
	                DELEGATE	       17	     0.689	    10.806%	    88.394%	     0.000	       17
	       DEPTHWISE_CONV_2D	       10	     0.591	     9.269%	    97.663%	     0.000	       10
	                    MEAN	        1	     0.110	     1.725%	    99.388%	     0.000	        1
	                     PAD	        5	     0.039	     0.612%	   100.000%	     0.000	        5
	              DEQUANTIZE	       10	     0.000	     0.000%	   100.000%	     0.000	       10

Timings (microseconds): count=155 first=6415 curr=6443 min=6105 max=6863 avg=6409.22 std=69
Memory (bytes): count=0
74 nodes observed

RaspberryPi4 + Ubuntu 19.10 aarch64 + 4 Threads Faster-Grad-CAM weights_integer_quant.tflite Benchmark

Number of nodes executed: 72
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	                 CONV_2D	       35	     0.753	    34.958%	    34.958%	     0.000	        0
	                     PAD	        5	     0.395	    18.338%	    53.296%	     0.000	        0
	                    MEAN	        1	     0.392	    18.199%	    71.495%	     0.000	        0
	        Misc Runtime Ops	        1	     0.282	    13.092%	    84.587%	    89.232	        0
	       DEPTHWISE_CONV_2D	       17	     0.251	    11.653%	    96.240%	     0.000	        0
	                     ADD	       10	     0.054	     2.507%	    98.747%	     0.000	        0
	                QUANTIZE	        1	     0.024	     1.114%	    99.861%	     0.000	        0
	              DEQUANTIZE	        2	     0.003	     0.139%	   100.000%	     0.000	        0

Timings (microseconds): count=472 first=564 curr=3809 min=564 max=3950 avg=2188.51 std=1625
Memory (bytes): count=0
72 nodes observed

Ubuntu 18.04 x86_64 + XNNPACK enabled + 4 Threads EfficientNet-lite efficientnet-lite0-fp32.tflite Benchmark

Number of nodes executed: 5
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	                DELEGATE	        2	     5.639	    95.706%	    95.706%	     0.000	        2
	         FULLY_CONNECTED	        1	     0.239	     4.056%	    99.762%	     0.000	        1
	         AVERAGE_POOL_2D	        1	     0.014	     0.238%	   100.000%	     0.000	        1
	                 RESHAPE	        1	     0.000	     0.000%	   100.000%	     0.000	        1

Timings (microseconds): count=168 first=5842 curr=5910 min=5749 max=6317 avg=5894.55 std=100
Memory (bytes): count=0
5 nodes observed

Ubuntu 18.04 x86_64 + XNNPACK enabled + 4 Threads EfficientNet-lite efficientnet-lite4-fp32.tflite Benchmark

Number of nodes executed: 5
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	                DELEGATE	        2	    33.720	    99.235%	    99.235%	     0.000	        2
	         FULLY_CONNECTED	        1	     0.231	     0.680%	    99.915%	     0.000	        1
	         AVERAGE_POOL_2D	        1	     0.029	     0.085%	   100.000%	     0.000	        1
	                 RESHAPE	        1	     0.000	     0.000%	   100.000%	     0.000	        1

Timings (microseconds): count=50 first=32459 curr=34867 min=31328 max=35730 avg=33983.5 std=1426
Memory (bytes): count=0
5 nodes observed
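
To compare runs, the `Timings (microseconds)` line of each report can be parsed and the averages divided. The sketch below is an illustration only: `parse_timings` and `speedup` are hypothetical helper names, and the four input lines are copied verbatim from the results above.

```python
import re


def parse_timings(line):
    """Return the key=value fields of a 'Timings (microseconds)' line as floats."""
    return {key: float(value)
            for key, value in re.findall(r"(\w+)=([0-9.eE+-]+)", line)}


def speedup(baseline_line, candidate_line):
    """Ratio of average latencies: values above 1 mean the candidate is faster."""
    return parse_timings(baseline_line)["avg"] / parse_timings(candidate_line)["avg"]


# RaspberryPi4, Ubuntu 19.10 aarch64, mobilenet_v3_small
RPI4_WEIGHT_QUANT = "Timings (microseconds): count=50 first=103867 curr=103937 min=103708 max=118926 avg=104299 std=2254"
RPI4_FULL_INTEGER = "Timings (microseconds): count=50 first=28827 curr=28176 min=27916 max=28827 avg=28121.2 std=165"

# Ubuntu 18.04 x86_64, 10 threads, deeplabv3_257_mv_gpu.tflite
X86_XNNPACK_OFF = "Timings (microseconds): count=50 first=59929 curr=60534 min=59374 max=63695 avg=61031.6 std=1182"
X86_XNNPACK_ON = "Timings (microseconds): count=91 first=11051 curr=10745 min=10521 max=12552 avg=10955.4 std=352"

for label, base, cand in (
    ("full integer vs weight quant", RPI4_WEIGHT_QUANT, RPI4_FULL_INTEGER),
    ("XNNPACK enabled vs disabled", X86_XNNPACK_OFF, X86_XNNPACK_ON),
):
    avg_ms = parse_timings(cand)["avg"] / 1000.0
    print(f"{label}: {avg_ms:.2f} ms avg, {speedup(base, cand):.2f}x faster")
```

Some reports above show a `first` or `min` value far below the average (for example `first=224`), so their `avg` and `std` may be skewed; comparing `curr` or rerunning with more warmup runs may give a fairer picture.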

4. Reference articles

  1. [deeplab] what's the parameters of the mobilenetv3 pretrained model?
  2. When you want to fine-tune DeepLab on other datasets, there are a few cases
  3. [deeplab] Training deeplab model with ADE20K dataset
  4. Running DeepLab on PASCAL VOC 2012 Semantic Segmentation Dataset
  5. Quantize DeepLab model for faster on-device inference
  6. https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md
  7. https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/quantize.md
  8. the quantized form of Shape operation is not yet implemented
  9. Post-training quantization
  10. Converter command line reference
  11. Quantization-aware training
  12. Converting a .pb file to .meta in TF 1.3
  13. Minimal code to load a trained TensorFlow model from a checkpoint and export it with SavedModelBuilder
  14. How to restore Tensorflow model from .pb file in python?
  15. Error with tag-sets when serving model using tensorflow_model_server tool
  16. ValueError: No 'serving_default' in the SavedModel's SignatureDefs. Possible values are 'name_of_my_model'
  17. Steps for deploying a Keras model - how to create a Signature
  18. Loading the graph of a model trained with TensorFlow using tf.train.import_meta_graph
  19. TensorFlow graph operations, Part 1
  20. Configure input_map when importing a tensorflow model from metagraph file
  21. TFLite Model Benchmark Tool
  22. How to install Ubuntu 19.10 aarch64 (64bit) on RaspberryPi4
  23. https://github.com/rwightman/posenet-python.git
