TensorFlow Lite provides all the tools you need to convert and run TensorFlow models on mobile, embedded, and IoT devices. The following guide walks through each step of the developer workflow and provides links to further instructions.
A TensorFlow model is a data structure that contains the logic and knowledge of a machine learning network trained to solve a particular problem. There are many ways to obtain a TensorFlow model, from using pre-trained models to training your own.
To use a model with TensorFlow Lite, you must convert a full TensorFlow model into the TensorFlow Lite format; you cannot create or train a model using TensorFlow Lite. You must therefore start with a regular TensorFlow model and then convert it.
Note: TensorFlow Lite supports a limited subset of TensorFlow operations, so not all models can be converted. For details, read about the TensorFlow Lite operator compatibility.
The TensorFlow Lite team provides a set of pre-trained models that solve a variety of machine learning problems. These models have been converted to work with TensorFlow Lite and are ready to use in your applications.
The pre-trained models cover common tasks such as image classification and object detection. See our full list of pre-trained models in Models.
There are many other places you can obtain pre-trained TensorFlow models, including TensorFlow Hub. In most cases, these models will not be provided in the TensorFlow Lite format, and you'll have to convert them before use.
Transfer learning allows you to take a trained model and re-train it to perform another task. For example, an image classification model could be retrained to recognize new categories of images. Re-training takes less time and requires less data than training a model from scratch.
You can use transfer learning to customize pre-trained models to your application. Learn how to perform transfer learning in the Recognize flowers with TensorFlow codelab.
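As an illustration of the idea (this is a rough sketch, not the codelab itself), the following Python code retrains a pre-trained MobileNetV2 base from tf.keras on a new classification task. The image size, number of classes, and datasets are placeholder assumptions:

import tensorflow as tf

# Assumptions for illustration: 160x160 RGB inputs and 5 target classes.
IMG_SHAPE = (160, 160, 3)
NUM_CLASSES = 5

# Load a MobileNetV2 feature extractor pre-trained on ImageNet, without its classifier head.
base_model = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SHAPE, include_top=False, weights="imagenet")
base_model.trainable = False  # Freeze the base; only the new head is trained.

# Add a new classification head for the target categories.
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(train_dataset, validation_data=val_dataset, epochs=5)  # Supply your own data.

Because only the new head is trained, a few epochs on a small dataset are usually enough to get a working classifier, which can then be converted to TensorFlow Lite as described below.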
If you have designed and trained your own TensorFlow model, or you have trained a model obtained from another source, you must convert it to the TensorFlow Lite format.
TensorFlow Lite is designed to execute models efficiently on mobile and other embedded devices with limited compute and memory resources. Some of this efficiency comes from the use of a special format for storing models. TensorFlow models must be converted into this format before they can be used by TensorFlow Lite.
Converting models reduces their file size and introduces optimizations that do not affect accuracy. The TensorFlow Lite converter provides options that allow you to further reduce file size and increase speed of execution, with some trade-offs.
The TensorFlow Lite converter is a tool available as a Python API that converts trained TensorFlow models into the TensorFlow Lite format. It can also introduce optimizations, which are covered in section 4, Optimize your model.
The following example shows a TensorFlow SavedModel being converted into the TensorFlow Lite format:
import tensorflow as tf

# Create a converter from a SavedModel directory and convert the model.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

# Write the converted model to disk.
open("converted_model.tflite", "wb").write(tflite_model)
You can convert TensorFlow 2.0 models in a similar way.
The converter can also be used from the command line, but the Python API is recommended.
The converter can convert from a variety of input types. When converting TensorFlow 1.x models, these are SavedModel directories, frozen GraphDefs, Keras HDF5 models, and models taken from a tf.Session. When converting TensorFlow 2.x models, these are SavedModel directories, tf.keras models, and concrete functions.
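For TensorFlow 2.x, each of these input types has a corresponding constructor on tf.lite.TFLiteConverter. A minimal sketch, where the model path and objects are placeholders:

import tensorflow as tf

# From a SavedModel directory (recommended).
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")

# From an in-memory tf.keras model.
keras_model = tf.keras.applications.MobileNetV2(weights=None)
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)

# From one or more concrete functions.
run_model = tf.function(lambda x: keras_model(x))
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec([1, 224, 224, 3], tf.float32))
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])

tflite_model = converter.convert()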
The converter can be configured to apply various optimizations that can improve performance or reduce file size. This is covered in section 4, Optimize your model.
TensorFlow Lite currently supports a limited subset of TensorFlow operations. The long term goal is for all TensorFlow operations to be supported.
If the model you wish to convert contains unsupported operations, you can use TensorFlow Select to include operations from TensorFlow. This will result in a larger binary being deployed to devices.
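A minimal sketch of how this can be enabled at conversion time, assuming a SavedModel at a placeholder saved_model_dir:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)

# Allow both the built-in TensorFlow Lite ops and selected TensorFlow ops.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # Standard TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS,    # Fall back to TensorFlow ops when needed.
]

tflite_model = converter.convert()

The resulting model also requires an interpreter build that includes the Select TensorFlow ops library at runtime, as described in section 3.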
Inference is the process of running data through a model to obtain predictions. It requires a model, an interpreter, and input data.
The TensorFlow Lite interpreter is a library that takes a model file, executes the operations it defines on input data, and provides access to the output.
The interpreter works across multiple platforms and provides a simple API for running TensorFlow Lite models from Java, Swift, Objective-C, C++, and Python.
The following code shows the interpreter being invoked from Java:
// Load the model and run inference on the given input, writing results to output.
try (Interpreter interpreter = new Interpreter(tensorflow_lite_model_file)) {
  interpreter.run(input, output);
}
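The Python API follows the same load-run-read pattern. A minimal sketch, assuming a converted model file named converted_model.tflite with a single floating point input tensor:

import numpy as np
import tensorflow as tf

# Load the model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a randomly generated input with the expected shape and type.
input_data = np.random.random_sample(input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], input_data)

# Run inference and read the result.
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]["index"])
print(output_data)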
Some devices provide hardware acceleration for machine learning operations. For example, most mobile phones have GPUs, which can perform floating point matrix operations faster than a CPU.
The speed-up can be substantial. For example, a MobileNet v1 image classification model runs 5.5x faster on a Pixel 3 phone when GPU acceleration is used.
The TensorFlow Lite interpreter can be configured with Delegates to make use of hardware acceleration on different devices. The GPU Delegate allows the interpreter to run appropriate operations on the device's GPU.
The following code shows the GPU Delegate being used from Java:
// Create a GPU delegate and pass it to the interpreter via its options.
GpuDelegate delegate = new GpuDelegate();
Interpreter.Options options = (new Interpreter.Options()).addDelegate(delegate);
Interpreter interpreter = new Interpreter(tensorflow_lite_model_file, options);

try {
  interpreter.run(input, output);
} finally {
  // Release interpreter and delegate resources when done.
  interpreter.close();
  delegate.close();
}
To add support for new hardware accelerators you can define your own delegate.
The TensorFlow Lite interpreter is easy to use from both major mobile platforms. To get started, explore the Android quickstart and iOS quickstart guides. Example applications are available for both platforms.
To obtain the required libraries, Android developers should use the TensorFlow Lite AAR. iOS developers should use the CocoaPods for Swift or Objective-C.
Embedded Linux is an important platform for deploying machine learning. To get started using Python to perform inference with your TensorFlow Lite models, follow the Python quickstart.
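On devices where installing full TensorFlow is impractical, the smaller tflite_runtime package provides just the interpreter, and only the import changes. A minimal sketch, assuming the package is installed and a converted model file is available:

from tflite_runtime.interpreter import Interpreter

# The interpreter is used exactly as in the tf.lite.Interpreter example above.
interpreter = Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()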
To instead install the C++ library, see the build instructions for Raspberry Pi or Arm64-based boards (for boards such as Odroid C2, Pine64, and NanoPi).
TensorFlow Lite for Microcontrollers is an experimental port of TensorFlow Lite aimed at microcontrollers and other devices with only kilobytes of memory.
If your model requires TensorFlow operations that are not yet implemented in TensorFlow Lite, you can use TensorFlow Select to use them in your model. You'll need to build a custom version of the interpreter that includes the TensorFlow operations.
You can use Custom operators to write your own operations, or port new operations into TensorFlow Lite.
Operator versions allow you to add new functionality and parameters to existing operations.
TensorFlow Lite provides tools to optimize the size and performance of your models, often with minimal impact on accuracy. Optimized models may require slightly more complex training, conversion, or integration.
Machine learning optimization is an evolving field, and TensorFlow Lite's Model Optimization Toolkit is continually growing as new techniques are developed.
The goal of model optimization is to reach the ideal balance of performance, model size, and accuracy on a given device. Performance best practices can help guide you through this process.
By reducing the precision of values and operations within a model, quantization can reduce both the size of a model and the time required for inference. For many models, there is only a minimal loss of accuracy.
The TensorFlow Lite converter makes it easy to quantize TensorFlow models. The following Python code quantizes a SavedModel and saves it to disk:
import tensorflow as tf

# Create a converter and enable the default optimizations, which include quantization.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()

# Write the quantized model to disk.
open("converted_model.tflite", "wb").write(tflite_quant_model)
TensorFlow Lite supports reducing precision of values from full floating point to half-precision floats (float16) or 8-bit integers. There are trade-offs in model size and accuracy for each choice, and some operations have optimized implementations for these reduced precision types.
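For example, to target half-precision floats, the converter can be pointed at float16 in addition to the default optimizations. A minimal sketch, again assuming a SavedModel at a placeholder saved_model_dir:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Store weights as float16 rather than full 32-bit floats.
converter.target_spec.supported_types = [tf.float16]

tflite_fp16_model = converter.convert()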
To learn more about quantization, see Post-training quantization.
The Model Optimization Toolkit is a set of tools and techniques designed to make it easy for developers to optimize their models. Many of the techniques can be applied to all TensorFlow models and are not specific to TensorFlow Lite, but they are especially valuable when running inference on devices with limited resources.
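One such technique is weight pruning, which the toolkit exposes as a wrapper around a tf.keras model so that low-magnitude weights are progressively zeroed during training. A rough sketch, assuming the tensorflow-model-optimization package is installed and using a small placeholder model:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model for illustration; substitute your own tf.keras model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Wrap the model with pruning; weights below a magnitude threshold are
# gradually set to zero during (re)training.
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model)

pruned_model.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])

# Pruning requires this callback to advance the pruning schedule during training.
callbacks = [tfmot.sparsity.keras.UpdatePruningStep()]
# pruned_model.fit(train_dataset, epochs=2, callbacks=callbacks)  # Supply your own data.

# Strip the pruning wrappers before converting the model to TensorFlow Lite.
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)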
Now that you're familiar with TensorFlow Lite, explore some of the following resources: