| { |
| "nbformat": 4, |
| "nbformat_minor": 0, |
| "metadata": { |
| "colab": { |
| "name": "post-training--integer-quant.ipynb", |
| "version": "0.3.2", |
| "provenance": [], |
| "private_outputs": true, |
| "collapsed_sections": [], |
| "toc_visible": true |
| }, |
| "kernelspec": { |
| "display_name": "Python 2", |
| "name": "python2" |
| } |
| }, |
| "cells": [ |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "6Y8E0lw5eYWm" |
| }, |
| "source": [ |
| "# Post Training Integer Quantization" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "CIGrZZPTZVeO" |
| }, |
| "source": [ |
| "<table class=\"tfo-notebook-buttons\" align=\"left\">\n", |
| " <td>\n", |
| " <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/tutorials/post_training_integer_quant.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n", |
| " </td>\n", |
| " <td>\n", |
| " <a target=\"_blank\" href=\"https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tutorials/post_training_integer_quant.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n", |
| " </td>\n", |
| "</table>" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "BTC1rDAuei_1" |
| }, |
| "source": [ |
| "## Overview\n", |
| "\n", |
| "[TensorFlow Lite](https://www.tensorflow.org/lite/) now supports\n", |
| "converting an entire model (weights and activations) to 8-bit during model conversion from TensorFlow to TensorFlow Lite's flat buffer format. This results in a 4x reduction in model size and a 3 to 4x performance improvement on CPU performance. In addition, this fully quantized model can be consumed by integer-only hardware accelerators.\n", |
| "\n", |
| "In contrast to [post-training \"on-the-fly\" quantization](https://colab.sandbox.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/tutorials/post_training_quant.ipynb)\n", |
| ", which only stores weights as 8-bit ints, in this technique all weights *and* activations are quantized statically during model conversion.\n", |
| "\n", |
| "In this tutorial, we train an MNIST model from scratch, check its accuracy in TensorFlow, and then convert the saved model into a Tensorflow Lite flatbuffer\n", |
| "with full quantization. We finally check the\n", |
| "accuracy of the converted model and compare it to the original saved model. We\n", |
| "run the training script [mnist.py](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py) from\n", |
| "[Tensorflow official MNIST tutorial](https://github.com/tensorflow/models/tree/master/official/mnist).\n" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "2XsEP17Zelz9" |
| }, |
| "source": [ |
| "## Building an MNIST model" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "dDqqUIZjZjac" |
| }, |
| "source": [ |
| "### Setup" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "gyqAw1M9lyab", |
| "colab": {} |
| }, |
| "source": [ |
| "! pip uninstall -y tensorflow\n", |
| "! pip install -U tf-nightly" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "WsN6s5L1ieNl", |
| "colab": {} |
| }, |
| "source": [ |
| "import tensorflow as tf\n", |
| "tf.enable_eager_execution()" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "00U0taBoe-w7", |
| "colab": {} |
| }, |
| "source": [ |
| "! git clone --depth 1 https://github.com/tensorflow/models" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "4XZPtSh-fUOc", |
| "colab": {} |
| }, |
| "source": [ |
| "import sys\n", |
| "import os\n", |
| "\n", |
| "if sys.version_info.major >= 3:\n", |
| " import pathlib\n", |
| "else:\n", |
| " import pathlib2 as pathlib\n", |
| "\n", |
| "# Add `models` to the python path.\n", |
| "models_path = os.path.join(os.getcwd(), \"models\")\n", |
| "sys.path.append(models_path)" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "eQ6Q0qqKZogR" |
| }, |
| "source": [ |
| "### Train and export the model" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "eMsw_6HujaqM", |
| "colab": {} |
| }, |
| "source": [ |
| "saved_models_root = \"/tmp/mnist_saved_model\"" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "hWSAjQWagIHl", |
| "colab": {} |
| }, |
| "source": [ |
| "# The above path addition is not visible to subprocesses, add the path for the subprocess as well.\n", |
| "# Note: channels_last is required here or the conversion may fail. \n", |
| "!PYTHONPATH={models_path} python models/official/mnist/mnist.py --train_epochs=1 --export_dir {saved_models_root} --data_format=channels_last" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "5NMaNZQCkW9X" |
| }, |
| "source": [ |
| "For the example, we only trained the model for a single epoch, so it only trains to ~96% accuracy.\n", |
| "\n" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "xl8_fzVAZwOh" |
| }, |
| "source": [ |
| "### Convert to a TensorFlow Lite model\n", |
| "\n", |
| "The `savedmodel` directory is named with a timestamp. Select the most recent one: " |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "Xp5oClaZkbtn", |
| "colab": {} |
| }, |
| "source": [ |
| "saved_model_dir = str(sorted(pathlib.Path(saved_models_root).glob(\"*\"))[-1])\n", |
| "saved_model_dir" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "AT8BgkKmljOy" |
| }, |
| "source": [ |
| "Using the [Python `TFLiteConverter`](https://www.tensorflow.org/lite/convert/python_api), the saved model can be converted into a TensorFlow Lite model.\n", |
| "\n", |
| "First load the model using the `TFLiteConverter`:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "_i8B2nDZmAgQ", |
| "colab": {} |
| }, |
| "source": [ |
| "import tensorflow as tf\n", |
| "tf.enable_eager_execution()\n", |
| "tf.logging.set_verbosity(tf.logging.DEBUG)\n", |
| "\n", |
| "converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)\n", |
| "tflite_model = converter.convert()" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "F2o2ZfF0aiCx" |
| }, |
| "source": [ |
| "Write it out to a `.tflite` file:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "vptWZq2xnclo", |
| "colab": {} |
| }, |
| "source": [ |
| "tflite_models_dir = pathlib.Path(\"/tmp/mnist_tflite_models/\")\n", |
| "tflite_models_dir.mkdir(exist_ok=True, parents=True)" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "Ie9pQaQrn5ue", |
| "colab": {} |
| }, |
| "source": [ |
| "tflite_model_file = tflite_models_dir/\"mnist_model.tflite\"\n", |
| "tflite_model_file.write_bytes(tflite_model)" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "7BONhYtYocQY" |
| }, |
| "source": [ |
| "To instead quantize the model on export, first set the `optimizations` flag to optimize for size:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "HEZ6ET1AHAS3", |
| "colab": {} |
| }, |
| "source": [ |
| "tf.logging.set_verbosity(tf.logging.INFO)\n", |
| "converter.optimizations = [tf.lite.Optimize.DEFAULT]" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "id": "rTe8avZJHMDO", |
| "colab_type": "text" |
| }, |
| "source": [ |
| "Now, construct and provide a representative dataset, this is used to get the dynamic range of activations." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "id": "FiwiWU3gHdkW", |
| "colab_type": "code", |
| "colab": {} |
| }, |
| "source": [ |
| "mnist_train, _ = tf.keras.datasets.mnist.load_data()\n", |
| "images = tf.cast(mnist_train[0], tf.float32)/255.0\n", |
| "mnist_ds = tf.data.Dataset.from_tensor_slices((images)).batch(1)\n", |
| "def representative_data_gen():\n", |
| " for input_value in mnist_ds.take(100):\n", |
| " yield [input_value]\n", |
| "\n", |
| "converter.representative_dataset = representative_data_gen" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "id": "xW84iMYjHd9t", |
| "colab_type": "text" |
| }, |
| "source": [ |
| "Finally, convert the model like usual. Note, by default the converted model will still use float input and outputs for invocation convenience." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "id": "yuNfl3CoHNK3", |
| "colab_type": "code", |
| "colab": {} |
| }, |
| "source": [ |
| "tflite_quant_model = converter.convert()\n", |
| "tflite_model_quant_file = tflite_models_dir/\"mnist_model_quant.tflite\"\n", |
| "tflite_model_quant_file.write_bytes(tflite_quant_model)" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
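| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "By default, the quantized model still takes float inputs and produces float outputs. As a quick sanity check (a minimal sketch added here, not part of the original tutorial; the throwaway `check_interpreter` name is ours), we can inspect the tensor types that the interpreter reports:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab": {} |
| }, |
| "source": [ |
| "# Sketch: only the model internals are quantized by default, so the\n", |
| "# input and output tensors should still report float32.\n", |
| "check_interpreter = tf.lite.Interpreter(model_path=str(tflite_model_quant_file))\n", |
| "check_interpreter.allocate_tensors()\n", |
| "print(\"input dtype: %s\" % check_interpreter.get_input_details()[0][\"dtype\"])\n", |
| "print(\"output dtype: %s\" % check_interpreter.get_output_details()[0][\"dtype\"])" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |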
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "PhMmUTl4sbkz" |
| }, |
| "source": [ |
| "Note how the resulting file is approximately `1/4` the size." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "JExfcfLDscu4", |
| "colab": {} |
| }, |
| "source": [ |
| "!ls -lh {tflite_models_dir}" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
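| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "As a cross-check, here is a minimal sketch (added here for illustration) that computes the same size comparison in Python from the files we just wrote:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab": {} |
| }, |
| "source": [ |
| "# Compare the on-disk sizes of the float and quantized models.\n", |
| "float_size = tflite_model_file.stat().st_size\n", |
| "quant_size = tflite_model_quant_file.stat().st_size\n", |
| "print(\"float model:     %.1f KiB\" % (float_size / 1024.0))\n", |
| "print(\"quantized model: %.1f KiB\" % (quant_size / 1024.0))\n", |
| "print(\"size ratio:      %.2fx\" % (float_size / float(quant_size)))" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |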
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "L8lQHMp_asCq" |
| }, |
| "source": [ |
| "## Run the TensorFlow Lite models" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "-5l6-ciItvX6" |
| }, |
| "source": [ |
| "We can run the TensorFlow Lite model using the Python TensorFlow Lite\n", |
| "Interpreter. \n", |
| "\n", |
| "### Load the test data\n", |
| "\n", |
| "First, let's load the MNIST test data to feed to the model:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "eTIuU07NuKFL", |
| "colab": {} |
| }, |
| "source": [ |
| "import numpy as np\n", |
| "_, mnist_test = tf.keras.datasets.mnist.load_data()\n", |
| "images, labels = tf.cast(mnist_test[0], tf.float32)/255.0, mnist_test[1]\n", |
| "\n", |
| "mnist_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(1)" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "Ap_jE7QRvhPf" |
| }, |
| "source": [ |
| "### Load the model into the interpreters" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "Jn16Rc23zTss", |
| "colab": {} |
| }, |
| "source": [ |
| "interpreter = tf.lite.Interpreter(model_path=str(tflite_model_file))\n", |
| "interpreter.allocate_tensors()" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "J8Pztk1mvNVL", |
| "colab": {} |
| }, |
| "source": [ |
| "interpreter_quant = tf.lite.Interpreter(model_path=str(tflite_model_quant_file))\n", |
| "interpreter_quant.allocate_tensors()" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "2opUt_JTdyEu" |
| }, |
| "source": [ |
| "### Test the models on one image" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "AKslvo2kwWac", |
| "colab": {} |
| }, |
| "source": [ |
| "for img, label in mnist_ds:\n", |
| " break\n", |
| "\n", |
| "interpreter.set_tensor(interpreter.get_input_details()[0][\"index\"], img)\n", |
| "interpreter.invoke()\n", |
| "predictions = interpreter.get_tensor(\n", |
| " interpreter.get_output_details()[0][\"index\"])" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "XZClM2vo3_bm", |
| "colab": {} |
| }, |
| "source": [ |
| "import matplotlib.pylab as plt\n", |
| "\n", |
| "plt.imshow(img[0])\n", |
| "template = \"True:{true}, predicted:{predict}\"\n", |
| "_ = plt.title(template.format(true= str(label[0].numpy()),\n", |
| " predict=str(predictions[0])))\n", |
| "plt.grid(False)" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "3gwhv4lKbYZ4", |
| "colab": {} |
| }, |
| "source": [ |
| "interpreter_quant.set_tensor(\n", |
| " interpreter_quant.get_input_details()[0][\"index\"], img)\n", |
| "interpreter_quant.invoke()\n", |
| "predictions = interpreter_quant.get_tensor(\n", |
| " interpreter_quant.get_output_details()[0][\"index\"])" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "CIH7G_MwbY2x", |
| "colab": {} |
| }, |
| "source": [ |
| "plt.imshow(img[0])\n", |
| "template = \"True:{true}, predicted:{predict}\"\n", |
| "_ = plt.title(template.format(true= str(label[0].numpy()),\n", |
| " predict=str(predictions[0])))\n", |
| "plt.grid(False)" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "LwN7uIdCd8Gw" |
| }, |
| "source": [ |
| "### Evaluate the models" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "05aeAuWjvjPx", |
| "colab": {} |
| }, |
| "source": [ |
| "def eval_model(interpreter, mnist_ds):\n", |
| " total_seen = 0\n", |
| " num_correct = 0\n", |
| "\n", |
| " input_index = interpreter.get_input_details()[0][\"index\"]\n", |
| " output_index = interpreter.get_output_details()[0][\"index\"]\n", |
| " for img, label in mnist_ds:\n", |
| " total_seen += 1\n", |
| " interpreter.set_tensor(input_index, img)\n", |
| " interpreter.invoke()\n", |
| " predictions = interpreter.get_tensor(output_index)\n", |
| " if predictions == label.numpy():\n", |
| " num_correct += 1\n", |
| "\n", |
| " if total_seen % 500 == 0:\n", |
| " print(\"Accuracy after %i images: %f\" %\n", |
| " (total_seen, float(num_correct) / float(total_seen)))\n", |
| "\n", |
| " return float(num_correct) / float(total_seen)" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "T5mWkSbMcU5z", |
| "colab": {} |
| }, |
| "source": [ |
| "print(eval_model(interpreter, mnist_ds))" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "Km3cY9ry8ZlG" |
| }, |
| "source": [ |
| "We can repeat the evaluation on the fully quantized model to obtain:\n" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "metadata": { |
| "colab_type": "code", |
| "id": "-9cnwiPp6EGm", |
| "colab": {} |
| }, |
| "source": [ |
| "# NOTE: Colab runs on server CPUs. At the time of writing this, TensorFlow Lite\n", |
| "# doesn't have super optimized server CPU kernels. For this reason this may be\n", |
| "# slower than the above float interpreter. But for mobile CPUs, considerable\n", |
| "# speedup can be observed.\n", |
| "print(eval_model(interpreter_quant, mnist_ds))\n" |
| ], |
| "execution_count": 0, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": { |
| "colab_type": "text", |
| "id": "L7lfxkor8pgv" |
| }, |
| "source": [ |
| "In this example, we have fully quantized a model with no difference in the accuracy." |
| ] |
| } |
| ] |
| } |