{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "_DDaAex5Q7u-"
},
"source": [
"##### Copyright 2019 The TensorFlow Authors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "W1dWWdNHQ9L0"
},
"outputs": [],
"source": [
"#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6Y8E0lw5eYWm"
},
"source": [
"# Post-training integer quantization"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CIGrZZPTZVeO"
},
"source": [
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://www.tensorflow.org/lite/performance/post_training_integer_quant\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/performance/post_training_integer_quant.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/performance/post_training_integer_quant.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" </td>\n",
" <td>\n",
" <a href=\"https://storage.googleapis.com/tensorflow_docs/tensorflow/lite/g3doc/performance/post_training_integer_quant.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\" />Download notebook</a>\n",
" </td>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BTC1rDAuei_1"
},
"source": [
"## Overview\n",
"\n",
"Integer quantization is an optimization strategy that converts 32-bit floating-point numbers (such as weights and activation outputs) to the nearest 8-bit fixed-point numbers. This results in a smaller model and increased inferencing speed, which is valuable for low-power devices such as [microcontrollers](https://www.tensorflow.org/lite/microcontrollers). This data format is also required by integer-only accelerators such as the [Edge TPU](https://coral.ai/).\n",
"\n",
"In this tutorial, you'll train an MNIST model from scratch, convert it into a Tensorflow Lite file, and quantize it using [post-training quantization](https://www.tensorflow.org/lite/performance/post_training_quantization). Finally, you'll check the accuracy of the converted model and compare it to the original float model.\n",
"\n",
"You actually have several options as to how much you want to quantize a model. In this tutorial, you'll perform \"full integer quantization,\" which converts all weights and activation outputs into 8-bit integer data—whereas other strategies may leave some amount of data in floating-point.\n",
"\n",
"To learn more about the various quantization strategies, read about [TensorFlow Lite model optimization](https://www.tensorflow.org/lite/performance/model_optimization).\n"
]
},
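{
"cell_type": "markdown",
"metadata": {},
"source": [
"As background, TensorFlow Lite's 8-bit quantization uses an affine mapping between real values and integers: `real_value ~ (int_value - zero_point) * scale`. The cell below is a minimal numeric sketch of that mapping; the `scale` and `zero_point` values are made up for illustration (the converter computes the real ones from the model and representative data)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal illustration of the affine 8-bit quantization mapping used by\n",
"# TensorFlow Lite: real_value ~= (int_value - zero_point) * scale.\n",
"# NOTE: scale and zero_point are made-up example values; the converter\n",
"# computes the real ones from the weights / representative dataset.\n",
"import numpy as np\n",
"\n",
"scale, zero_point = 0.02, 128  # hypothetical quantization parameters\n",
"real_values = np.array([-1.5, 0.0, 0.7, 2.0], dtype=np.float32)\n",
"\n",
"# Quantize: round to the nearest representable uint8 value.\n",
"quantized = np.clip(np.round(real_values / scale) + zero_point, 0, 255).astype(np.uint8)\n",
"# Dequantize: map back to approximate real values.\n",
"dequantized = (quantized.astype(np.float32) - zero_point) * scale\n",
"\n",
"print('quantized:  ', quantized)\n",
"print('dequantized:', dequantized)"
]
},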
{
"cell_type": "markdown",
"metadata": {
"id": "dDqqUIZjZjac"
},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "I0nR5AMEWq0H"
},
"source": [
"In order to quantize both the input and output tensors, we need to use APIs added in TensorFlow r2.3:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "WsN6s5L1ieNl"
},
"outputs": [],
"source": [
"import logging\n",
"logging.getLogger(\"tensorflow\").setLevel(logging.DEBUG)\n",
"\n",
"import tensorflow as tf\n",
"import numpy as np\n",
"assert float(tf.__version__[:3]) >= 2.3"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2XsEP17Zelz9"
},
"source": [
"## Generate a TensorFlow Model"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5NMaNZQCkW9X"
},
"source": [
"We'll build a simple model to classify numbers from the [MNIST dataset](https://www.tensorflow.org/datasets/catalog/mnist).\n",
"\n",
"This training won't take long because you're training the model for just a 5 epochs, which trains to about ~98% accuracy."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "eMsw_6HujaqM"
},
"outputs": [],
"source": [
"# Load MNIST dataset\n",
"mnist = tf.keras.datasets.mnist\n",
"(train_images, train_labels), (test_images, test_labels) = mnist.load_data()\n",
"\n",
"# Normalize the input image so that each pixel value is between 0 to 1.\n",
"train_images = train_images.astype(np.float32) / 255.0\n",
"test_images = test_images.astype(np.float32) / 255.0\n",
"\n",
"# Define the model architecture\n",
"model = tf.keras.Sequential([\n",
" tf.keras.layers.InputLayer(input_shape=(28, 28)),\n",
" tf.keras.layers.Reshape(target_shape=(28, 28, 1)),\n",
" tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),\n",
" tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),\n",
" tf.keras.layers.Flatten(),\n",
" tf.keras.layers.Dense(10)\n",
"])\n",
"\n",
"# Train the digit classification model\n",
"model.compile(optimizer='adam',\n",
" loss=tf.keras.losses.SparseCategoricalCrossentropy(\n",
" from_logits=True),\n",
" metrics=['accuracy'])\n",
"model.fit(\n",
" train_images,\n",
" train_labels,\n",
" epochs=5,\n",
" validation_data=(test_images, test_labels)\n",
")"
]
},
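{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, you can double-check the float model's accuracy on the test set (this just repeats the validation accuracy that `fit` already reported):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check: evaluate the trained float Keras model on the test set.\n",
"baseline_loss, baseline_accuracy = model.evaluate(test_images, test_labels, verbose=0)\n",
"print('Baseline float Keras accuracy: %.4f' % baseline_accuracy)"
]
},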
{
"cell_type": "markdown",
"metadata": {
"id": "KuTEoGFYd8aM"
},
"source": [
"## Convert to a TensorFlow Lite model"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xl8_fzVAZwOh"
},
"source": [
"Now you can convert the trained model to TensorFlow Lite format using the [`TFLiteConverter`](https://www.tensorflow.org/lite/convert/python_api) API, and apply varying degrees of quantization.\n",
"\n",
"Beware that some versions of quantization leave some of the data in float format. So the following sections show each option with increasing amounts of quantization, until we get a model that's entirely int8 or uint8 data. (Notice we duplicate some code in each section so you can see all the quantization steps for each option.)\n",
"\n",
"First, here's a converted model with no quantization:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_i8B2nDZmAgQ"
},
"outputs": [],
"source": [
"converter = tf.lite.TFLiteConverter.from_keras_model(model)\n",
"\n",
"tflite_model = converter.convert()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7BONhYtYocQY"
},
"source": [
"It's now a TensorFlow Lite model, but it's still using 32-bit float values for all parameter data."
]
},
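{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can confirm this by checking the input/output types and the in-memory size of the converted model (a quick check using the `tflite_model` bytes from the previous cell):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The unquantized TFLite model still uses float32 for its input and output tensors.\n",
"interpreter = tf.lite.Interpreter(model_content=tflite_model)\n",
"print('input: ', interpreter.get_input_details()[0]['dtype'])\n",
"print('output:', interpreter.get_output_details()[0]['dtype'])\n",
"print('size:   %d bytes' % len(tflite_model))"
]
},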
{
"cell_type": "markdown",
"metadata": {
"id": "jPYZwgZTwJMT"
},
"source": [
"### Convert using dynamic range quantization\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Hjvq1vpJd4U_"
},
"source": [
"Now let's enable the default `optimizations` flag to quantize all fixed parameters (such as weights):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "HEZ6ET1AHAS3"
},
"outputs": [],
"source": [
"converter = tf.lite.TFLiteConverter.from_keras_model(model)\n",
"converter.optimizations = [tf.lite.Optimize.DEFAULT]\n",
"\n",
"tflite_model_quant = converter.convert()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "o5wuE-RcdX_3"
},
"source": [
"The model is now a bit smaller with quantized weights, but other variable data is still in float format."
]
},
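{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can compare the serialized sizes to see the reduction from dynamic range quantization (exact sizes vary slightly by TensorFlow version):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compare the in-memory sizes of the float and dynamic-range quantized models.\n",
"print('Float model:         %d bytes' % len(tflite_model))\n",
"print('Dynamic range quant: %d bytes' % len(tflite_model_quant))"
]
},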
{
"cell_type": "markdown",
"metadata": {
"id": "UgKDdnHQEhpb"
},
"source": [
"### Convert using float fallback quantization"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rTe8avZJHMDO"
},
"source": [
"To quantize the variable data (such as model input/output and intermediates between layers), you need to provide a [`RepresentativeDataset`](https://www.tensorflow.org/api_docs/python/tf/lite/RepresentativeDataset). This is a generator function that provides a set of input data that's large enough to represent typical values. It allows the converter to estimate a dynamic range for all the variable data. (The dataset does not need to be unique compared to the training or evaluation dataset.)\n",
"To support multiple inputs, each representative data point is a list and elements in the list are fed to the model according to their indices.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "FiwiWU3gHdkW"
},
"outputs": [],
"source": [
"def representative_data_gen():\n",
" for input_value in tf.data.Dataset.from_tensor_slices(train_images).batch(1).take(100):\n",
" # Model has only one input so each data point has one element.\n",
" yield [input_value]\n",
"\n",
"converter = tf.lite.TFLiteConverter.from_keras_model(model)\n",
"converter.optimizations = [tf.lite.Optimize.DEFAULT]\n",
"converter.representative_dataset = representative_data_gen\n",
"\n",
"tflite_model_quant = converter.convert()"
]
},
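{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, if a model had multiple inputs, each yielded data point would be a list with one array per input, ordered to match the model's inputs. Below is a hypothetical sketch only; the second input and the shapes are made up, and this MNIST model has a single input:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch for a multi-input model (not used by this MNIST model).\n",
"# Each yielded data point is a list; element i feeds the model's i-th input.\n",
"def multi_input_representative_data_gen():\n",
"  for _ in range(100):\n",
"    image_input = np.random.rand(1, 28, 28).astype(np.float32)  # made-up shape\n",
"    extra_input = np.random.rand(1, 10).astype(np.float32)      # made-up second input\n",
"    yield [image_input, extra_input]"
]
},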
{
"cell_type": "markdown",
"metadata": {
"id": "_GC3HFlptf7x"
},
"source": [
"Now all weights and variable data are quantized, and the model is significantly smaller compared to the original TensorFlow Lite model.\n",
"\n",
"However, to maintain compatibility with applications that traditionally use float model input and output tensors, the TensorFlow Lite Converter leaves the model input and output tensors in float:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "id1OEKFELQwp"
},
"outputs": [],
"source": [
"interpreter = tf.lite.Interpreter(model_content=tflite_model_quant)\n",
"input_type = interpreter.get_input_details()[0]['dtype']\n",
"print('input: ', input_type)\n",
"output_type = interpreter.get_output_details()[0]['dtype']\n",
"print('output: ', output_type)"
]
},
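{
"cell_type": "markdown",
"metadata": {},
"source": [
"Beyond the input and output types, you can list every tensor's dtype to see how much of the model was quantized (a minimal sketch using `get_tensor_details()`):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# List tensors left in float32; with float fallback, at least the model's\n",
"# input and output tensors remain float.\n",
"interpreter = tf.lite.Interpreter(model_content=tflite_model_quant)\n",
"interpreter.allocate_tensors()\n",
"float_tensors = [t['name'] for t in interpreter.get_tensor_details()\n",
"                 if t['dtype'] == np.float32]\n",
"print('float32 tensors:', float_tensors)"
]
},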
{
"cell_type": "markdown",
"metadata": {
"id": "RACBJuj2XO8x"
},
"source": [
"That's usually good for compatibility, but it won't be compatible with devices that perform only integer-based operations, such as the Edge TPU.\n",
"\n",
"Additionally, the above process may leave an operation in float format if TensorFlow Lite doesn't include a quantized implementation for that operation. This strategy allows conversion to complete so you have a smaller and more efficient model, but again, it won't be compatible with integer-only hardware. (All ops in this MNIST model have a quantized implementation.)\n",
"\n",
"So to ensure an end-to-end integer-only model, you need a couple more parameters..."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FQgTqbvPvxGJ"
},
"source": [
"### Convert using integer-only quantization"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mwR9keYAwArA"
},
"source": [
"To quantize the input and output tensors, and make the converter throw an error if it encounters an operation it cannot quantize, convert the model again with some additional parameters:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "kzjEjcDs3BHa"
},
"outputs": [],
"source": [
"def representative_data_gen():\n",
" for input_value in tf.data.Dataset.from_tensor_slices(train_images).batch(1).take(100):\n",
" yield [input_value]\n",
"\n",
"converter = tf.lite.TFLiteConverter.from_keras_model(model)\n",
"converter.optimizations = [tf.lite.Optimize.DEFAULT]\n",
"converter.representative_dataset = representative_data_gen\n",
"# Ensure that if any ops can't be quantized, the converter throws an error\n",
"converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]\n",
"# Set the input and output tensors to uint8 (APIs added in r2.3)\n",
"converter.inference_input_type = tf.uint8\n",
"converter.inference_output_type = tf.uint8\n",
"\n",
"tflite_model_quant = converter.convert()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wYd6NxD03yjB"
},
"source": [
"The internal quantization remains the same as above, but you can see the input and output tensors are now integer format:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "PaNkOS-twz4k"
},
"outputs": [],
"source": [
"interpreter = tf.lite.Interpreter(model_content=tflite_model_quant)\n",
"input_type = interpreter.get_input_details()[0]['dtype']\n",
"print('input: ', input_type)\n",
"output_type = interpreter.get_output_details()[0]['dtype']\n",
"print('output: ', output_type)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TO17AP84wzBb"
},
"source": [
"Now you have an integer quantized model that uses integer data for the model's input and output tensors, so it's compatible with integer-only hardware such as the [Edge TPU](https://coral.ai)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "sse224YJ4KMm"
},
"source": [
"### Save the models as files"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4_9nZ4nv4b9P"
},
"source": [
"You'll need a `.tflite` file to deploy your model on other devices. So let's save the converted models to files and then load them when we run inferences below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "BEY59dC14uRv"
},
"outputs": [],
"source": [
"import pathlib\n",
"\n",
"tflite_models_dir = pathlib.Path(\"/tmp/mnist_tflite_models/\")\n",
"tflite_models_dir.mkdir(exist_ok=True, parents=True)\n",
"\n",
"# Save the unquantized/float model:\n",
"tflite_model_file = tflite_models_dir/\"mnist_model.tflite\"\n",
"tflite_model_file.write_bytes(tflite_model)\n",
"# Save the quantized model:\n",
"tflite_model_quant_file = tflite_models_dir/\"mnist_model_quant.tflite\"\n",
"tflite_model_quant_file.write_bytes(tflite_model_quant)"
]
},
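{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can check the files on disk to compare the sizes of the float and fully quantized models:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Compare the on-disk sizes of the float and integer quantized models.\n",
"print('Float model size:     %d bytes' % os.path.getsize(tflite_model_file))\n",
"print('Quantized model size: %d bytes' % os.path.getsize(tflite_model_quant_file))"
]
},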
{
"cell_type": "markdown",
"metadata": {
"id": "9t9yaTeF9fyM"
},
"source": [
"## Run the TensorFlow Lite models"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "L8lQHMp_asCq"
},
"source": [
"Now we'll run inferences using the TensorFlow Lite [`Interpreter`](https://www.tensorflow.org/api_docs/python/tf/lite/Interpreter) to compare the model accuracies.\n",
"\n",
"First, we need a function that runs inference with a given model and images, and then returns the predictions:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "X092SbeWfd1A"
},
"outputs": [],
"source": [
"# Helper function to run inference on a TFLite model\n",
"def run_tflite_model(tflite_file, test_image_indices):\n",
" global test_images\n",
"\n",
" # Initialize the interpreter\n",
" interpreter = tf.lite.Interpreter(model_path=str(tflite_file))\n",
" interpreter.allocate_tensors()\n",
"\n",
" input_details = interpreter.get_input_details()[0]\n",
" output_details = interpreter.get_output_details()[0]\n",
"\n",
" predictions = np.zeros((len(test_image_indices),), dtype=int)\n",
" for i, test_image_index in enumerate(test_image_indices):\n",
" test_image = test_images[test_image_index]\n",
" test_label = test_labels[test_image_index]\n",
"\n",
" # Check if the input type is quantized, then rescale input data to uint8\n",
" if input_details['dtype'] == np.uint8:\n",
" input_scale, input_zero_point = input_details[\"quantization\"]\n",
" test_image = test_image / input_scale + input_zero_point\n",
"\n",
" test_image = np.expand_dims(test_image, axis=0).astype(input_details[\"dtype\"])\n",
" interpreter.set_tensor(input_details[\"index\"], test_image)\n",
" interpreter.invoke()\n",
" output = interpreter.get_tensor(output_details[\"index\"])[0]\n",
"\n",
" predictions[i] = output.argmax()\n",
"\n",
" return predictions\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2opUt_JTdyEu"
},
"source": [
"### Test the models on one image\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QpPpFPaz7eEM"
},
"source": [
"Now we'll compare the performance of the float model and quantized model:\n",
"+ `tflite_model_file` is the original TensorFlow Lite model with floating-point data.\n",
"+ `tflite_model_quant_file` is the last model we converted using integer-only quantization (it uses uint8 data for input and output).\n",
"\n",
"Let's create another function to print our predictions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "zR2cHRUcUZ6e"
},
"outputs": [],
"source": [
"import matplotlib.pylab as plt\n",
"\n",
"# Change this to test a different image\n",
"test_image_index = 1\n",
"\n",
"## Helper function to test the models on one image\n",
"def test_model(tflite_file, test_image_index, model_type):\n",
" global test_labels\n",
"\n",
" predictions = run_tflite_model(tflite_file, [test_image_index])\n",
"\n",
" plt.imshow(test_images[test_image_index])\n",
" template = model_type + \" Model \\n True:{true}, Predicted:{predict}\"\n",
" _ = plt.title(template.format(true= str(test_labels[test_image_index]), predict=str(predictions[0])))\n",
" plt.grid(False)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "A5OTJ_6Vcslt"
},
"source": [
"Now test the float model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "iTK0x980coto"
},
"outputs": [],
"source": [
"test_model(tflite_model_file, test_image_index, model_type=\"Float\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "o3N6-UGl1dfE"
},
"source": [
"And test the quantized model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "rc1i9umMcp0t"
},
"outputs": [],
"source": [
"test_model(tflite_model_quant_file, test_image_index, model_type=\"Quantized\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LwN7uIdCd8Gw"
},
"source": [
"### Evaluate the models on all images"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RFKOD4DG8XmU"
},
"source": [
"Now let's run both models using all the test images we loaded at the beginning of this tutorial:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "05aeAuWjvjPx"
},
"outputs": [],
"source": [
"# Helper function to evaluate a TFLite model on all images\n",
"def evaluate_model(tflite_file, model_type):\n",
" global test_images\n",
" global test_labels\n",
"\n",
" test_image_indices = range(test_images.shape[0])\n",
" predictions = run_tflite_model(tflite_file, test_image_indices)\n",
"\n",
" accuracy = (np.sum(test_labels== predictions) * 100) / len(test_images)\n",
"\n",
" print('%s model accuracy is %.4f%% (Number of test samples=%d)' % (\n",
" model_type, accuracy, len(test_images)))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xnFilQpBuMh5"
},
"source": [
"Evaluate the float model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "T5mWkSbMcU5z"
},
"outputs": [],
"source": [
"evaluate_model(tflite_model_file, model_type=\"Float\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Km3cY9ry8ZlG"
},
"source": [
"Evaluate the quantized model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "-9cnwiPp6EGm"
},
"outputs": [],
"source": [
"evaluate_model(tflite_model_quant_file, model_type=\"Quantized\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "L7lfxkor8pgv"
},
"source": [
"So you now have an integer quantized a model with almost no difference in the accuracy, compared to the float model.\n",
"\n",
"To learn more about other quantization strategies, read about [TensorFlow Lite model optimization](https://www.tensorflow.org/lite/performance/model_optimization)."
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "post_training_integer_quant.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}