The TensorFlow Lite framework consists of five high-level components, all optimized to run on mobile platforms, as shown in the architecture diagram below:
Here are the core units of the TensorFlow Lite architecture:
- Model and converter: the first step is to convert your existing trained model into a TensorFlow Lite-compatible model (.tflite) using the TensorFlow Lite Converter, which writes the converted model to disk (see the first sketch after this list). You can also use a pre-trained model directly in your mobile or embedded application.
- Java/C++ API: the API loads the .tflite model and invokes the interpreter (see the second sketch after this list). The C++ API is available on all supported platforms; the Java API is a wrapper written on top of the C++ API and is available only on Android.
- Interpreter and kernels: the interpreter module runs the model with the help of operation kernels, which it loads selectively. Without any ops linked in, the core interpreter is only 75 KB, a significant reduction from the 1.1 MB required by TensorFlow Mobile; with all supported ops linked in, it comes to about 400 KB. Developers can selectively choose which ops to include and thereby keep the binary footprint small.
- H/W accelerated delegates: on select Android devices, the interpreter uses the Android Neural Networks API (NNAPI) for hardware acceleration, falling back to CPU execution when no accelerator is available (see the third sketch after this list).
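As a concrete illustration of the conversion step, here is a minimal sketch using the Python converter API from TensorFlow 2.x; the SavedModel directory and output filename are placeholder paths, not values from this chapter:

```python
import tensorflow as tf

# Load a trained SavedModel and convert it to the TensorFlow Lite format.
# "saved_model_dir" is a placeholder path to an existing SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
tflite_model = converter.convert()

# Write the .tflite flatbuffer to disk so a mobile app can bundle it.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```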
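Loading the model and invoking the interpreter looks like this in the Python API, which mirrors what the Java/C++ APIs do on device; the model path and the zero-filled input are stand-ins for real data:

```python
import numpy as np
import tensorflow as tf

# Load the converted model and allocate its input/output tensors.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed placeholder data shaped like the model's first input, then run inference.
dummy_input = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy_input)
interpreter.invoke()

result = interpreter.get_tensor(output_details[0]["index"])
```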
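On Android itself, NNAPI is typically enabled through the Java Interpreter options; the Python API exposes the same delegate mechanism, sketched below. The shared-library name here is hypothetical and platform-specific:

```python
import tensorflow as tf

# "libexample_delegate.so" is a hypothetical delegate library name;
# the actual file depends on the platform and accelerator in use.
delegate = tf.lite.experimental.load_delegate("libexample_delegate.so")

# Pass the delegate to the interpreter; ops it supports run on the
# accelerator, and everything else falls back to the CPU kernels.
interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[delegate],
)
interpreter.allocate_tensors()
```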
You can also implement custom kernels using the C++ API and register them with the interpreter.