TF Lite Interpreter (Part 1)

Learn to apply TF Lite Interpreter to make inferences on mobile devices.

To run an ML/DL model on a mobile device, we first define, compile, and train a TF model. We then use the TF Lite converter to convert this model to the FlatBuffers format, which is suitable for mobile devices. Optionally, we use some test data to verify that the converted model works correctly. Finally, we deploy the converted model to an Android or iOS device.
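
For reference, here is a minimal sketch of the first step: defining, compiling, and training a small Keras model and exporting it as a SavedModel. The toy architecture, the random training data, and the saved_model directory name are illustrative assumptions; the conversion code later in this lesson reads from that same directory.

import tensorflow as tf
import numpy as np

# A small Keras model used only for illustration
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(3, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Train briefly on random data so there is a working model to convert
x_train = np.random.rand(100, 4).astype(np.float32)
y_train = np.random.randint(0, 3, size=(100,))
model.fit(x_train, y_train, epochs=1, verbose=0)

# Export the trained model in the SavedModel format
tf.saved_model.save(model, 'saved_model')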

To make inferences on mobile devices, we need an interpreter that can execute TF Lite models on a variety of platforms and devices. Let’s explore the functionality of the TF Lite Interpreter and its input/output details by first converting a TF model to the TF Lite format, initializing the TF Lite Interpreter, checking input/output details, and invoking the interpreter to perform inferences.

Model conversion

Convert a TF model to the TF Lite format; for instance, the following code converts a SavedModel:

import tensorflow as tf
# Path to the directory that has the SavedModel
saved_model_dir = 'saved_model'
# Creating an instance of the TFLiteConverter class and converting the model
my_converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
# Apply the default optimizations, such as weight quantization (optional)
my_converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_default_quant_model = my_converter.convert()
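
The convert() call returns the TF Lite model as a bytes object. To bundle the model with a mobile app, it can be written to a .tflite file on disk; the filename model_quant.tflite below is just an example:

# Write the converted model to a .tflite file for deployment
with open('model_quant.tflite', 'wb') as f:
    f.write(tflite_default_quant_model)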

Initialization

Initialize the interpreter with a TF Lite model as follows:

import tensorflow as tf
# Creating an instance of the Interpreter and allocating tensors
interpreter = tf.lite.Interpreter(model_content=tflite_default_quant_model)
interpreter.allocate_tensors()

The code above initializes the interpreter with an already available TF Lite model: tflite_default_quant_model. When creating an instance of the Interpreter class in TF Lite, it’s necessary to allocate memory for the tensors to feed data into the model and obtain the outputs. The interpreter.allocate_tensors() method allocates memory for the input and output tensors of the model.
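
The interpreter can also be initialized from a model file on disk instead of an in-memory bytes object. A brief sketch, assuming a model_quant.tflite file such as the one written in the earlier example:

import tensorflow as tf
# Creating the Interpreter from a .tflite file path and allocating tensors
interpreter = tf.lite.Interpreter(model_path='model_quant.tflite')
interpreter.allocate_tensors()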

Input/output details

Before invoking the model on the test data, let’s analyze the input/output details of the model. We can access these using the get_input_details() and get_output_details() methods of the tf.lite.Interpreter class. These methods return a list of dictionaries, where each dictionary contains information about one input or output tensor of the TF Lite model. Each dictionary contains the following keys:

  • index: This is the ...