TensorFlow Lite provides all the tools you need to convert and run TensorFlow models on mobile, embedded, and IoT devices. The following guide walks through each step of the developer workflow and provides links to further instructions.
A TensorFlow model is a data structure that contains the logic and knowledge of a machine learning network trained to solve a particular problem. There are many ways to obtain a TensorFlow model, from using pre-trained models to training your own.
To use a model with TensorFlow Lite, you must convert a full TensorFlow model into the TensorFlow Lite format; you cannot create or train a model using TensorFlow Lite. You must therefore start with a regular TensorFlow model and then convert it.
Note: TensorFlow Lite supports a limited subset of TensorFlow operations, so not all models can be converted. For details, read about the TensorFlow Lite operator compatibility.
The TensorFlow Lite team provides a set of pre-trained models that solve a variety of machine learning problems. These models have been converted to work with TensorFlow Lite and are ready to use in your applications.
The pre-trained models cover common tasks such as image classification and object detection. See our full list of pre-trained models in Models.
There are many other places you can obtain pre-trained TensorFlow models, including TensorFlow Hub. In most cases, these models will not be provided in the TensorFlow Lite format, and you'll have to convert them before use.
Transfer learning allows you to take a trained model and re-train it to perform another task. For example, an image classification model could be retrained to recognize new categories of images. Re-training takes less time and requires less data than training a model from scratch.
You can use transfer learning to customize pre-trained models to your application. Learn how to perform transfer learning in the Recognize flowers with TensorFlow codelab.
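As an illustration of the idea (this is a rough sketch, not the codelab itself), the following Python code retrains a pre-trained MobileNetV2 base from tf.keras on a new classification task. The image size, number of classes, and datasets are placeholder assumptions:

import tensorflow as tf

# Assumptions for illustration: 160x160 RGB inputs and 5 target classes.
IMG_SHAPE = (160, 160, 3)
NUM_CLASSES = 5

# Load a MobileNetV2 feature extractor pre-trained on ImageNet, without its classifier head.
base_model = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SHAPE, include_top=False, weights="imagenet")
base_model.trainable = False  # Freeze the base; only the new head is trained.

# Add a new classification head for the target categories.
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(train_dataset, validation_data=val_dataset, epochs=5)  # Supply your own data.

Because only the new head is trained, a few epochs on a small dataset are usually enough to get a working classifier, which can then be converted to TensorFlow Lite as described below.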
If you have designed and trained your own TensorFlow model, or you have trained a model obtained from another source, you must convert it to the TensorFlow Lite format.
TensorFlow Lite is designed to execute models efficiently on mobile and other embedded devices with limited compute and memory resources. Some of this efficiency comes from the use of a special format for storing models. TensorFlow models must be converted into this format before they can be used by TensorFlow Lite.
Converting models reduces their file size and introduces optimizations that do not affect accuracy. The TensorFlow Lite converter provides options that allow you to further reduce file size and increase speed of execution, with some trade-offs.
The TensorFlow Lite converter is a tool available as a Python API that converts trained TensorFlow models into the TensorFlow Lite format. It can also introduce optimizations, which are covered in section 4, Optimize your model.
The following example shows a TensorFlow SavedModel being converted into the TensorFlow Lite format:
import tensorflow as tf

# Create a converter from a SavedModel directory and convert the model.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

# Write the converted model to disk.
open("converted_model.tflite", "wb").write(tflite_model)
You can convert TensorFlow 2.0 models in a similar way.
The converter can also be used from the command line, but the Python API is recommended.
The converter can convert from a variety of input types. When converting TensorFlow 1.x models, these are SavedModel directories, frozen GraphDefs, Keras HDF5 models, and models taken from a tf.Session. When converting TensorFlow 2.x models, these are SavedModel directories, tf.keras models, and concrete functions.
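For TensorFlow 2.x, each of these input types has a corresponding constructor on tf.lite.TFLiteConverter. A minimal sketch, where the model path and objects are placeholders:

import tensorflow as tf

# From a SavedModel directory (recommended).
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")

# From an in-memory tf.keras model.
keras_model = tf.keras.applications.MobileNetV2(weights=None)
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)

# From one or more concrete functions.
run_model = tf.function(lambda x: keras_model(x))
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec([1, 224, 224, 3], tf.float32))
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])

tflite_model = converter.convert()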
The converter can be configured to apply various optimizations that can improve performance or reduce file size. This is covered in section 4, Optimize your model.
TensorFlow Lite currently supports a limited subset of TensorFlow operations. The long term goal is for all TensorFlow operations to be supported.
If the model you wish to convert contains unsupported operations, you can use TensorFlow Select to include operations from TensorFlow. This will result in a larger binary being deployed to devices.
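A minimal sketch of how this can be enabled at conversion time, assuming a SavedModel at a placeholder saved_model_dir:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)

# Allow both the built-in TensorFlow Lite ops and selected TensorFlow ops.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # Standard TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS,    # Fall back to TensorFlow ops when needed.
]

tflite_model = converter.convert()

The resulting model also requires an interpreter build that includes the Select TensorFlow ops library at runtime, as described in section 3.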
Inference is the process of running data through a model to obtain predictions. It requires a model, an interpreter, and input data.
The TensorFlow Lite interpreter is a library that takes a model file, executes the operations it defines on input data, and provides access to the output.
The interpreter works across multiple platforms and provides a simple API for running TensorFlow Lite models from Java, Swift, Objective-C, C++, and Python.
The following code shows the interpreter being invoked from Java:
// Load the model and run inference on the given input, writing results to output.
try (Interpreter interpreter = new Interpreter(tensorflow_lite_model_file)) {
  interpreter.run(input, output);
}
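The Python API follows the same load-run-read pattern. A minimal sketch, assuming a converted model file named converted_model.tflite with a single floating point input tensor:

import numpy as np
import tensorflow as tf

# Load the model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a randomly generated input with the expected shape and type.
input_data = np.random.random_sample(input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], input_data)

# Run inference and read the result.
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]["index"])
print(output_data)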
Some devices provide hardware acceleration for machine learning operations. For example, most mobile phones have GPUs, which can perform floating point matrix operations faster than a CPU.
The speed-up can be substantial. For example, a MobileNet v1 image classification model runs 5.5x faster on a Pixel 3 phone when GPU acceleration is used.
The TensorFlow Lite interpreter can be configured with Delegates to make use of hardware acceleration on different devices. The GPU Delegate allows the interpreter to run appropriate operations on the device's GPU.
The following code shows the GPU Delegate being used from Java:
// Create a GPU delegate and pass it to the interpreter via its options.
GpuDelegate delegate = new GpuDelegate();
Interpreter.Options options = (new Interpreter.Options()).addDelegate(delegate);
Interpreter interpreter = new Interpreter(tensorflow_lite_model_file, options);

try {
  interpreter.run(input, output);
} finally {
  // Release interpreter and delegate resources when done.
  interpreter.close();
  delegate.close();
}
To add support for new hardware accelerators you can define your own delegate.
The TensorFlow Lite interpreter is easy to use from both major mobile platforms. To get started, explore the Android quickstart and iOS quickstart guides. Example applications are available for both platforms.
To obtain the required libraries, Android developers should use the TensorFlow Lite AAR. iOS developers should use the CocoaPods for Swift or Objective-C.
Embedded Linux is an important platform for deploying machine learning. To get started using Python to perform inference with your TensorFlow Lite models, follow the Python quickstart.
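On devices where installing full TensorFlow is impractical, the smaller tflite_runtime package provides just the interpreter, and only the import changes. A minimal sketch, assuming the package is installed and a converted model file is available:

from tflite_runtime.interpreter import Interpreter

# The interpreter is used exactly as in the tf.lite.Interpreter example above.
interpreter = Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()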
To instead install the C++ library, see the build instructions for Raspberry Pi or Arm64-based boards (for boards such as Odroid C2, Pine64, and NanoPi).
TensorFlow Lite for Microcontrollers is an experimental port of TensorFlow Lite aimed at microcontrollers and other devices with only kilobytes of memory.
If your model requires TensorFlow operations that are not yet implemented in TensorFlow Lite, you can use TensorFlow Select to use them in your model. You'll need to build a custom version of the interpreter that includes the TensorFlow operations.
You can use Custom operators to write your own operations, or port new operations into TensorFlow Lite.
Operator versions allow you to add new functionality and parameters to existing operations.
TensorFlow Lite provides tools to optimize the size and performance of your models, often with minimal impact on accuracy. Optimized models may require slightly more complex training, conversion, or integration.
Machine learning optimization is an evolving field, and TensorFlow Lite's Model Optimization Toolkit is continually growing as new techniques are developed.
The goal of model optimization is to reach the ideal balance of performance, model size, and accuracy on a given device. Performance best practices can help guide you through this process.
By reducing the precision of values and operations within a model, quantization can reduce both the size of a model and the time required for inference. For many models, there is only a minimal loss of accuracy.
The TensorFlow Lite converter makes it easy to quantize TensorFlow models. The following Python code quantizes a SavedModel and saves it to disk:
import tensorflow as tf

# Create a converter and enable the default optimizations, which include quantization.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()

# Write the quantized model to disk.
open("converted_model.tflite", "wb").write(tflite_quant_model)
TensorFlow Lite supports reducing precision of values from full floating point to half-precision floats (float16) or 8-bit integers. There are trade-offs in model size and accuracy for each choice, and some operations have optimized implementations for these reduced precision types.
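For example, to target half-precision floats, the converter can be pointed at float16 in addition to the default optimizations. A minimal sketch, again assuming a SavedModel at a placeholder saved_model_dir:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Store weights as float16 rather than full 32-bit floats.
converter.target_spec.supported_types = [tf.float16]

tflite_fp16_model = converter.convert()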
To learn more about quantization, see Post-training quantization.
The Model Optimization Toolkit is a set of tools and techniques designed to make it easy for developers to optimize their models. Many of the techniques can be applied to all TensorFlow models and are not specific to TensorFlow Lite, but they are especially valuable when running inference on devices with limited resources.
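One such technique is weight pruning, which the toolkit exposes as a wrapper around a tf.keras model so that low-magnitude weights are progressively zeroed during training. A rough sketch, assuming the tensorflow-model-optimization package is installed and using a small placeholder model:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model for illustration; substitute your own tf.keras model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Wrap the model with pruning; weights below a magnitude threshold are
# gradually set to zero during (re)training.
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model)

pruned_model.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])

# Pruning requires this callback to advance the pruning schedule during training.
callbacks = [tfmot.sparsity.keras.UpdatePruningStep()]
# pruned_model.fit(train_dataset, epochs=2, callbacks=callbacks)  # Supply your own data.

# Strip the pruning wrappers before converting the model to TensorFlow Lite.
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)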
Now that you're familiar with TensorFlow Lite, explore some of the following resources: