{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "post-training--integer-quant.ipynb",
"version": "0.3.2",
"provenance": [],
"private_outputs": true,
"collapsed_sections": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 2",
"name": "python2"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "6Y8E0lw5eYWm"
},
"source": [
"# Post Training Integer Quantization"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "CIGrZZPTZVeO"
},
"source": [
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/tutorials/post_training_integer_quant.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tutorials/post_training_integer_quant.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" </td>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "BTC1rDAuei_1"
},
"source": [
"## Overview\n",
"\n",
"[TensorFlow Lite](https://www.tensorflow.org/lite/) now supports\n",
"converting an entire model (weights and activations) to 8-bit integers during model conversion from TensorFlow to TensorFlow Lite's FlatBuffer format. This results in a 4x reduction in model size and a 3 to 4x improvement in CPU inference speed. In addition, the fully quantized model can be consumed by integer-only hardware accelerators.\n",
"\n",
"In contrast to [post-training \"on-the-fly\" quantization](https://colab.sandbox.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/tutorials/post_training_quant.ipynb)\n",
", which stores only the weights as 8-bit integers, this technique statically quantizes all weights *and* activations during model conversion.\n",
"\n",
"In this tutorial, we train an MNIST model from scratch, check its accuracy in TensorFlow, and then convert the saved model into a TensorFlow Lite FlatBuffer\n",
"with full quantization. Finally, we check the\n",
"accuracy of the converted model and compare it to the original saved model. We\n",
"run the training script [mnist.py](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py) from\n",
"the [TensorFlow official MNIST tutorial](https://github.com/tensorflow/models/tree/master/official/mnist).\n"
]
},
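{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"As a rough sketch of what 8-bit quantization means here: TensorFlow Lite represents each real-valued tensor with an affine mapping from 8-bit integers,\n",
"\n",
"$$r = S \\times (q - Z)$$\n",
"\n",
"where $r$ is the real value, $q$ is the stored 8-bit integer, $S$ is a per-tensor float scale, and $Z$ is an integer zero-point. For weights, these parameters are computed from the stored values; for activations, they are estimated from the representative dataset provided later in this tutorial."
]
},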
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "2XsEP17Zelz9"
},
"source": [
"## Building an MNIST model"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "dDqqUIZjZjac"
},
"source": [
"### Setup"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "gyqAw1M9lyab",
"colab": {}
},
"source": [
"! pip uninstall -y tensorflow\n",
"! pip install -U tf-nightly"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "WsN6s5L1ieNl",
"colab": {}
},
"source": [
"import tensorflow as tf\n",
"tf.enable_eager_execution()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "00U0taBoe-w7",
"colab": {}
},
"source": [
"! git clone --depth 1 https://github.com/tensorflow/models"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "4XZPtSh-fUOc",
"colab": {}
},
"source": [
"import sys\n",
"import os\n",
"\n",
"if sys.version_info.major >= 3:\n",
" import pathlib\n",
"else:\n",
" import pathlib2 as pathlib\n",
"\n",
"# Add `models` to the python path.\n",
"models_path = os.path.join(os.getcwd(), \"models\")\n",
"sys.path.append(models_path)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "eQ6Q0qqKZogR"
},
"source": [
"### Train and export the model"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "eMsw_6HujaqM",
"colab": {}
},
"source": [
"saved_models_root = \"/tmp/mnist_saved_model\""
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "hWSAjQWagIHl",
"colab": {}
},
"source": [
"# The above path addition is not visible to subprocesses; add the path for the subprocess as well.\n",
"# Note: channels_last is required here, or the conversion may fail.\n",
"!PYTHONPATH={models_path} python models/official/mnist/mnist.py --train_epochs=1 --export_dir {saved_models_root} --data_format=channels_last"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "5NMaNZQCkW9X"
},
"source": [
"For this example, we trained the model for only a single epoch, so it reaches only ~96% accuracy.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "xl8_fzVAZwOh"
},
"source": [
"### Convert to a TensorFlow Lite model\n",
"\n",
"The exported `SavedModel` directory is named with a timestamp. Select the most recent one:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "Xp5oClaZkbtn",
"colab": {}
},
"source": [
"saved_model_dir = str(sorted(pathlib.Path(saved_models_root).glob(\"*\"))[-1])\n",
"saved_model_dir"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "AT8BgkKmljOy"
},
"source": [
"Using the [Python `TFLiteConverter`](https://www.tensorflow.org/lite/convert/python_api), the saved model can be converted into a TensorFlow Lite model.\n",
"\n",
"First load the model using the `TFLiteConverter`:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "_i8B2nDZmAgQ",
"colab": {}
},
"source": [
"import tensorflow as tf\n",
"tf.enable_eager_execution()\n",
"tf.logging.set_verbosity(tf.logging.DEBUG)\n",
"\n",
"converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)\n",
"tflite_model = converter.convert()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "F2o2ZfF0aiCx"
},
"source": [
"Write it out to a `.tflite` file:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "vptWZq2xnclo",
"colab": {}
},
"source": [
"tflite_models_dir = pathlib.Path(\"/tmp/mnist_tflite_models/\")\n",
"tflite_models_dir.mkdir(exist_ok=True, parents=True)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "Ie9pQaQrn5ue",
"colab": {}
},
"source": [
"tflite_model_file = tflite_models_dir/\"mnist_model.tflite\"\n",
"tflite_model_file.write_bytes(tflite_model)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "7BONhYtYocQY"
},
"source": [
"To instead quantize the model on export, first set the `optimizations` flag to optimize for size:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "HEZ6ET1AHAS3",
"colab": {}
},
"source": [
"tf.logging.set_verbosity(tf.logging.INFO)\n",
"converter.optimizations = [tf.lite.Optimize.DEFAULT]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "rTe8avZJHMDO",
"colab_type": "text"
},
"source": [
"Now, construct and provide a representative dataset. The converter uses it to estimate the dynamic range of the activations."
]
},
{
"cell_type": "code",
"metadata": {
"id": "FiwiWU3gHdkW",
"colab_type": "code",
"colab": {}
},
"source": [
"mnist_train, _ = tf.keras.datasets.mnist.load_data()\n",
"images = tf.cast(mnist_train[0], tf.float32)/255.0\n",
"mnist_ds = tf.data.Dataset.from_tensor_slices((images)).batch(1)\n",
"def representative_data_gen():\n",
" for input_value in mnist_ds.take(100):\n",
" yield [input_value]\n",
"\n",
"converter.representative_dataset = representative_data_gen"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "xW84iMYjHd9t",
"colab_type": "text"
},
"source": [
"Finally, convert the model as usual. Note that, by default, the converted model still uses float inputs and outputs for invocation convenience."
]
},
{
"cell_type": "code",
"metadata": {
"id": "yuNfl3CoHNK3",
"colab_type": "code",
"colab": {}
},
"source": [
"tflite_quant_model = converter.convert()\n",
"tflite_model_quant_file = tflite_models_dir/\"mnist_model_quant.tflite\"\n",
"tflite_model_quant_file.write_bytes(tflite_quant_model)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "PhMmUTl4sbkz"
},
"source": [
"Note how the resulting file is approximately `1/4` the size."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "JExfcfLDscu4",
"colab": {}
},
"source": [
"!ls -lh {tflite_models_dir}"
],
"execution_count": 0,
"outputs": []
},
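{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"As a quick sanity check, the size ratio can also be computed directly in Python. This is a minimal sketch using only the standard library; the two `.tflite` paths are the ones written above:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"colab": {}
},
"source": [
"import os\n",
"\n",
"def size_ratio(path_a, path_b):\n",
"  # Ratio of on-disk file sizes; a value of roughly 4 is expected for\n",
"  # the float model vs. the fully quantized model.\n",
"  return os.path.getsize(path_a) / float(os.path.getsize(path_b))\n",
"\n",
"print(size_ratio(str(tflite_model_file), str(tflite_model_quant_file)))"
],
"execution_count": 0,
"outputs": []
},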
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "L8lQHMp_asCq"
},
"source": [
"## Run the TensorFlow Lite models"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "-5l6-ciItvX6"
},
"source": [
"We can run the TensorFlow Lite model using the Python TensorFlow Lite\n",
"Interpreter.\n",
"\n",
"### Load the test data\n",
"\n",
"First, let's load the MNIST test data to feed to the model:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "eTIuU07NuKFL",
"colab": {}
},
"source": [
"import numpy as np\n",
"_, mnist_test = tf.keras.datasets.mnist.load_data()\n",
"images, labels = tf.cast(mnist_test[0], tf.float32)/255.0, mnist_test[1]\n",
"\n",
"mnist_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(1)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Ap_jE7QRvhPf"
},
"source": [
"### Load the model into the interpreters"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "Jn16Rc23zTss",
"colab": {}
},
"source": [
"interpreter = tf.lite.Interpreter(model_path=str(tflite_model_file))\n",
"interpreter.allocate_tensors()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "J8Pztk1mvNVL",
"colab": {}
},
"source": [
"interpreter_quant = tf.lite.Interpreter(model_path=str(tflite_model_quant_file))\n",
"interpreter_quant.allocate_tensors()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "2opUt_JTdyEu"
},
"source": [
"### Test the models on one image"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "AKslvo2kwWac",
"colab": {}
},
"source": [
"for img, label in mnist_ds:\n",
" break\n",
"\n",
"interpreter.set_tensor(interpreter.get_input_details()[0][\"index\"], img)\n",
"interpreter.invoke()\n",
"predictions = interpreter.get_tensor(\n",
" interpreter.get_output_details()[0][\"index\"])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "XZClM2vo3_bm",
"colab": {}
},
"source": [
"import matplotlib.pylab as plt\n",
"\n",
"plt.imshow(img[0])\n",
"template = \"True:{true}, predicted:{predict}\"\n",
"_ = plt.title(template.format(true=str(label[0].numpy()),\n",
" predict=str(predictions[0])))\n",
"plt.grid(False)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "3gwhv4lKbYZ4",
"colab": {}
},
"source": [
"interpreter_quant.set_tensor(\n",
" interpreter_quant.get_input_details()[0][\"index\"], img)\n",
"interpreter_quant.invoke()\n",
"predictions = interpreter_quant.get_tensor(\n",
" interpreter_quant.get_output_details()[0][\"index\"])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "CIH7G_MwbY2x",
"colab": {}
},
"source": [
"plt.imshow(img[0])\n",
"template = \"True:{true}, predicted:{predict}\"\n",
"_ = plt.title(template.format(true=str(label[0].numpy()),\n",
" predict=str(predictions[0])))\n",
"plt.grid(False)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "LwN7uIdCd8Gw"
},
"source": [
"### Evaluate the models"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "05aeAuWjvjPx",
"colab": {}
},
"source": [
"def eval_model(interpreter, mnist_ds):\n",
" total_seen = 0\n",
" num_correct = 0\n",
"\n",
" input_index = interpreter.get_input_details()[0][\"index\"]\n",
" output_index = interpreter.get_output_details()[0][\"index\"]\n",
" for img, label in mnist_ds:\n",
" total_seen += 1\n",
" interpreter.set_tensor(input_index, img)\n",
" interpreter.invoke()\n",
" predictions = interpreter.get_tensor(output_index)\n",
" if predictions == label.numpy():\n",
" num_correct += 1\n",
"\n",
" if total_seen % 500 == 0:\n",
" print(\"Accuracy after %i images: %f\" %\n",
" (total_seen, float(num_correct) / float(total_seen)))\n",
"\n",
" return float(num_correct) / float(total_seen)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "T5mWkSbMcU5z",
"colab": {}
},
"source": [
"print(eval_model(interpreter, mnist_ds))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Km3cY9ry8ZlG"
},
"source": [
"We can repeat the evaluation on the fully quantized model to obtain:\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "-9cnwiPp6EGm",
"colab": {}
},
"source": [
"# NOTE: Colab runs on server CPUs. At the time of writing, TensorFlow Lite\n",
"# doesn't have highly optimized server CPU kernels, so the quantized model may\n",
"# run slower here than the float interpreter above. On mobile CPUs, however,\n",
"# a considerable speedup can be observed.\n",
"print(eval_model(interpreter_quant, mnist_ds))\n"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "L7lfxkor8pgv"
},
"source": [
"In this example, we have fully quantized the model with no loss of accuracy."
]
}
]
}