We will now start building the mobile application that generates captions for the objects the camera is pointed at. It will consist of a camera preview to capture images and a text view to display the captions returned by the model.
The application can be broadly divided into two parts, as follows:
- Building the camera preview
- Integrating the model to fetch the captions
In the following section, we will talk about building a basic camera preview.