The TensorFlow Lite framework consists of five high-level components, all optimized to run on mobile platforms, as shown in the architecture diagram below:
Here are the core units of the TensorFlow Lite architecture:
- Model and converter: the first step is to convert your existing trained model into a TensorFlow Lite-compatible model (.tflite) using the TensorFlow Lite Converter, which writes the converted model to disk (see the first sketch after this list). You can also use a pre-trained model directly in your mobile or embedded application.
- Java/C++ API: the API loads the .tflite model and invokes the interpreter (see the second sketch after this list). The C++ API is available on all supported platforms; the Java API is a wrapper written on top of the C++ API and is available only on Android.
- Interpreter and kernels: the interpreter module runs the model with the help of operation kernels, which it loads selectively. Without any ops linked in, the core interpreter is only 75 KB, a significant reduction from the 1.1 MB required by TensorFlow Mobile; with all supported ops linked in, it comes to about 400 KB. Developers can selectively choose which ops to include and thereby keep the binary footprint small.
- H/W accelerated delegates: on select Android devices, the interpreter uses the Android Neural Networks API (NNAPI) for hardware acceleration, falling back to CPU execution when no accelerator is available (see the third sketch after this list).
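As a concrete illustration of the conversion step, here is a minimal sketch using the Python converter API from TensorFlow 2.x; the SavedModel directory and output filename are placeholder paths, not values from this chapter:

```python
import tensorflow as tf

# Load a trained SavedModel and convert it to the TensorFlow Lite format.
# "saved_model_dir" is a placeholder path to an existing SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
tflite_model = converter.convert()

# Write the .tflite flatbuffer to disk so a mobile app can bundle it.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```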
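Loading the model and invoking the interpreter looks like this in the Python API, which mirrors what the Java/C++ APIs do on device; the model path and the zero-filled input are stand-ins for real data:

```python
import numpy as np
import tensorflow as tf

# Load the converted model and allocate its input/output tensors.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed placeholder data shaped like the model's first input, then run inference.
dummy_input = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy_input)
interpreter.invoke()

result = interpreter.get_tensor(output_details[0]["index"])
```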
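On Android itself, NNAPI is typically enabled through the Java Interpreter options; the Python API exposes the same delegate mechanism, sketched below. The shared-library name here is hypothetical and platform-specific:

```python
import tensorflow as tf

# "libexample_delegate.so" is a hypothetical delegate library name;
# the actual file depends on the platform and accelerator in use.
delegate = tf.lite.experimental.load_delegate("libexample_delegate.so")

# Pass the delegate to the interpreter; ops it supports run on the
# accelerator, and everything else falls back to the CPU kernels.
interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[delegate],
)
interpreter.allocate_tensors()
```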
You can also implement custom kernels using the C++ API and register them with the interpreter.