TF Lite Interpreter (Part 1)
Learn to apply the TF Lite Interpreter to make inferences on mobile devices.
To run an ML/DL model on a mobile device, we first define, compile, and train a TF model. We then use the TF Lite converter to convert this model to the FlatBuffers format, which is suitable for mobile devices. Optionally, we use some test data to verify that the converted model works correctly. Finally, we deploy the converted model to a mobile device: Android or iOS.
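As a rough sketch, and assuming a toy classifier trained on dummy data purely for illustration, producing such a SavedModel might look like this (the architecture, data, and directory name are placeholders):

import tensorflow as tf

# Hypothetical toy model; the layers and data below are placeholders
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Dummy training data, just to make the example runnable
x = tf.random.normal((32, 4))
y = tf.random.uniform((32,), maxval=3, dtype=tf.int32)
model.fit(x, y, epochs=1, verbose=0)

# In TF 2.x, saving to a directory path exports the SavedModel format
# (newer Keras versions use model.export('saved_model') instead)
model.save('saved_model')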
To make inferences on mobile devices, we need an interpreter that can execute TF Lite models on a variety of platforms and devices. Let's explore the functionality of the TF Lite Interpreter and its input/output details: we'll convert a TF model to the TF Lite format, initialize the interpreter, inspect its input/output details, and invoke it to perform inferences.
Model conversion
Convert a TF model to the TF Lite format; for instance, the following code converts a SavedModel:
import tensorflow as tf

# Path to the directory that has the SavedModel
saved_model_dir = 'saved_model'

# Creating an instance of the TFLiteConverter class and converting the model
my_converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)

# Perform model optimizations (optional)
my_converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_default_quant_model = my_converter.convert()
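If needed, the converted model can also be written to disk as a .tflite file for later use or deployment; the file name below is just an example:

# Write the converted FlatBuffers model to disk (file name is illustrative)
with open('model_default_quant.tflite', 'wb') as f:
    f.write(tflite_default_quant_model)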
Initialization
Initialize the interpreter with a TF Lite model as follows:
import tensorflow as tf

# Creating an instance of the Interpreter and allocating tensors
interpreter = tf.lite.Interpreter(model_content=tflite_default_quant_model)
interpreter.allocate_tensors()
The code above initializes the interpreter with an already available TF Lite model, tflite_default_quant_model. When creating an instance of the Interpreter class in TF Lite, it's necessary to allocate memory for the tensors in order to feed data into the model and obtain the outputs. The interpreter.allocate_tensors() method allocates memory for the input and output tensors of the model.
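As a side note, if the converted model was written to a .tflite file rather than kept in memory, the interpreter can be initialized from that file via the model_path argument; the file name below matches the earlier illustrative example:

import tensorflow as tf

# Initializing the interpreter from a .tflite file on disk
interpreter = tf.lite.Interpreter(model_path='model_default_quant.tflite')
interpreter.allocate_tensors()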
Input/output details
Before invoking the model on the test data, let's analyze the input/output details of the model. We can access information about the input and output tensors using the get_input_details() and get_output_details() methods of the tf.lite.Interpreter class. These methods return a list of dictionaries, where each dictionary describes one of the inputs or outputs of the TF Lite model.
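As a minimal sketch, assuming the interpreter initialized above, these details can be retrieved and printed as follows:

# Retrieve metadata about the model's input and output tensors
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

print("Input details:", input_details)
print("Output details:", output_details)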
Each dictionary contains the following keys:
index: This is the ...