Why use machine learning on mobile devices?

Machine learning is needed to extract meaningful and actionable information from huge amounts of data. Analyzing that much data and arriving at an inference requires a significant amount of computation, which makes this processing well suited to a cloud environment. However, if we could carry out machine learning on a mobile device, we would gain the following advantages:

Machine learning could be performed offline, as there would be no need to send all the data that the mobile has to the network and wait for results back from the server.
The network bandwidth cost incurred, if any, due to the transmission of mobile data to the server is avoided.
Latency can be reduced by processing data locally. Mobile machine learning is highly responsive because we don't have to wait for a connection to the server and for its response. A server round trip might take 1-2 seconds, whereas on-device inference can return a result almost instantly.
Privacy is another advantage of mobile machine learning. There is no need to send the user data outside the mobile device, which enables better privacy.

Machine learning started on computers, but the emerging trend suggests that mobile app development with machine learning implemented on the device is the next big thing. Modern mobile devices have enough processing capacity to perform many tasks that traditionally required a computer. There are also signals from global corporations and the wider industry that support this view:

Google launched TensorFlow for Mobile, and there is significant interest in it from the developer community.
Apple has launched Siri SDK and Core ML, and now all developers can incorporate these features into their apps.
Lenovo is working on a new smartphone that performs indoor geolocation and augmented reality without an internet connection.
Most of the mobile chip makers, including Apple, Qualcomm, Samsung, and even Google itself, are conducting significant research into hardware dedicated to speeding up machine learning on mobile devices.
There are many innovations happening in the hardware layer to enable hardware acceleration, which would make machine learning on mobile devices easier.
Many mobile-optimized models, such as MobileNets, SqueezeNet, and so on, have been open sourced.
The availability of IoT devices and smart hardware appliances is increasing, which will aid in innovation.
There are more use cases that people are interested in for offline scenarios.
There is more and more focus on user data privacy and users' desire for their personal data not to leave their mobile devices at all.

Some classic examples of machine learning on mobile devices are as follows:

Speech recognition
Computer vision and image classification
Gesture recognition
Translation from one language into another
Interactive on-device detection of text
Autonomous vehicles, drone navigation, and robotics
Patient-monitoring systems and mobile applications interacting with medical devices

Ways to implement machine learning in mobile applications

Now, we clearly understand what machine learning is and what the key tasks to be performed in a learning problem are. The four main activities to be performed for any machine learning problem are as follows:

Define the machine learning problem
Gather the data required
Use that data to build/train a model
Use the model to make predictions

Training the model is the most difficult part of the whole process. Once we have trained the model and have the model ready, using it to infer or predict for a new dataset is very easy.
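As a rough illustration of the four activities, the following sketch walks through them with a toy nearest-centroid classifier written in plain Python. The problem, the data values, and the function names (train and predict) are invented for this example; a real project would use a proper machine learning library and far more data.

```python
from statistics import mean

# 1. Define the problem: classify a fruit as "apple" or "melon"
#    from its weight in grams (an invented toy problem).

# 2. Gather the data: (weight, label) pairs. These values are made up.
training_data = [
    (150.0, "apple"), (170.0, "apple"), (160.0, "apple"),
    (1200.0, "melon"), (1500.0, "melon"), (1350.0, "melon"),
]


def train(samples):
    """3. Build the model: here, the mean weight (centroid) of each label."""
    weights_by_label = {}
    for weight, label in samples:
        weights_by_label.setdefault(label, []).append(weight)
    return {label: mean(weights) for label, weights in weights_by_label.items()}


def predict(model, weight):
    """4. Use the model: return the label whose centroid is closest."""
    return min(model, key=lambda label: abs(model[label] - weight))


model = train(training_data)
print(predict(model, 180.0))   # a light fruit
print(predict(model, 1100.0))  # a heavy fruit
```

Even in this tiny example, all the learning happens in train; predict is only a lookup and a comparison, which is why inference is so much cheaper than training.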

For each of the four steps listed above, we need to decide where it will be carried out: on a device or in the cloud.

The main things we need to decide are as follows:

First of all, are we going to train and create a custom model or use a prebuilt model?
If we want to train our own model, do we do this training on our desktop machine or in the cloud? Is there a possibility to train the model on a mobile device?
Once the model is available, are we going to put it in a local device and do the inference on the device or are we going to deploy the model in the cloud and do the inference from there?
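One way to keep these decisions explicit is to record them as a small data structure. The sketch below is only an illustration; the names ModelSource, Place, and MLPlan are hypothetical and do not come from any SDK.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ModelSource(Enum):
    PREBUILT = "prebuilt"   # use a provider's ready-made model
    CUSTOM = "custom"       # train our own model


class Place(Enum):
    DEVICE = "device"
    DESKTOP = "desktop"
    CLOUD = "cloud"


@dataclass(frozen=True)
class MLPlan:
    """Answers to the three questions above (hypothetical helper type)."""
    model_source: ModelSource
    training_place: Optional[Place]   # None when a prebuilt model is used
    inference_place: Place

    def needs_network_for_inference(self) -> bool:
        # Only on-device inference can work without a connection.
        return self.inference_place is not Place.DEVICE


cloud_service = MLPlan(ModelSource.PREBUILT, None, Place.CLOUD)
on_device = MLPlan(ModelSource.CUSTOM, Place.CLOUD, Place.DEVICE)
print(cloud_service.needs_network_for_inference())
print(on_device.needs_network_for_inference())
```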

The following are the broad possibilities to implement machine learning in mobile applications. We will get into the details of it in the upcoming sections:

Utilizing machine learning service providers for a machine learning model

There are many service providers offering machine learning as a service. We can just utilize them.

Some examples of such providers are listed below. The list is growing all the time:

Clarifai
Google Cloud Vision
Microsoft Azure Cognitive Services
IBM Watson
Amazon Web Services

If we go with this approach, the training is already done, the model is built, and the model's features are exposed as web services. All we have to do from the mobile application is invoke the model service with the required dataset, get the results from the cloud provider, and display them in the mobile application as required.
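A minimal sketch of such a service call, using only the Python standard library, might look like the following. The endpoint URL, the API key, and the request and response shapes are all assumptions made for illustration; each real provider defines its own URL, authentication scheme, and JSON format, so their documentation should be consulted.

```python
import base64
import json
import urllib.request

ENDPOINT = "https://ml.example.com/v1/classify"  # hypothetical URL
API_KEY = "YOUR_API_KEY"                         # placeholder credential


def build_request(image_bytes: bytes) -> urllib.request.Request:
    """Package the input data as an HTTP POST for the provider's web service."""
    payload = json.dumps(
        {"image": base64.b64encode(image_bytes).decode("ascii")}
    )
    return urllib.request.Request(
        ENDPOINT,
        data=payload.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def parse_response(body: bytes) -> str:
    """Return the top label from an assumed response shape:
    {"labels": [{"name": "...", "score": 0.0}, ...]}
    """
    labels = json.loads(body)["labels"]
    return max(labels, key=lambda item: item["score"])["name"]


def classify(image_bytes: bytes) -> str:
    """Send the request and return the top label (requires network access)."""
    with urllib.request.urlopen(build_request(image_bytes), timeout=10) as response:
        return parse_response(response.read())
```

The mobile application never sees the model itself: it only packages the input, waits for the network round trip, and displays whatever parse_response extracts.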

Some of the providers provide an SDK that makes the integration work very simple.

We may need to pay the cloud service provider a fee for using their machine learning web services. The fee can be based on various pricing models, for example, the number of invocations, the type of model, and so on.

So, this is a very simple way to use machine learning services, without actually having to do anything about the model. On top of this, the machine learning service provider keeps the model updated by constant retraining, including new datasets whenever required, and so on. So, the maintenance and improvement of the model is automatically taken care of on a routine basis.

This approach therefore suits people who are experts in mobile development and want to build an ML-enabled app, but who don't know much about ML.

So the obvious benefits of such a cloud-based machine learning service are as follows:

It is easy to use.
No knowledge of machine learning is required and the tough part of the training is done by the service provider.
Retraining, model updates, support, and maintenance of the model are done by the provider.
Charges are paid only as per usage. There is no overhead to maintain the model, the data for training, and so on.

Some of the flip sides of this approach are as follows:

The prediction will be done in the cloud. So, the dataset for which the prediction or inference is to be done has to be sent to the cloud, and it should be kept as small as practical.
Since data moves over the network, there may be some performance issues experienced in the app, since the whole thing now becomes network-dependent.
The mobile application won't work in offline mode; it works only as a fully online application.
Mostly, charges are to be paid per request. So, if the number of users of the application increases exponentially, the cost for the machine learning service increases with it.
The training and retraining are under the control of the cloud service provider, who has probably trained the model on common datasets. If our mobile application is going to use something really unique, chances are that the predictions will not be accurate.

To get started with ML-enabled mobile applications, this approach is the right fit with respect to both cost and technical feasibility, and it is absolutely fine for a machine learning newbie.

Ways to train the machine learning model

There are various ways to go about training our own machine learning model. Before getting into ways to train our model, why would we go for training our own model?

Mostly, if our data is special or unique in some way and very much specific to our requirements and when the existing solutions cannot be used to solve our problem, we may decide to train our own model.

To train our own model, a good dataset is required: one that is of high quality and is also large enough.

Training our model can be done in multiple ways/places based on our requirements and the amount of data:

On a desktop (training in the cloud):
- Generic cloud computing
- Hosted machine learning
- Private cloud/simple server machine
On a device: This is not very feasible. We can only deploy the trained model on a mobile device and invoke it from a mobile device. So far, the training process itself is generally not practical on a mobile device.

On a desktop (training in the cloud)

If we have decided to carry out the training process on a desktop, we have to do it in the cloud or on our humble local server, based on our needs.

If we decide to use the cloud, again we have the following two options:

Generic cloud computing
Hosted machine learning

With generic cloud computing, we use a cloud service provider's infrastructure to carry out our own work, which here is machine learning training. Whatever is required for this, such as hardware and storage, is obtained from the provider, and we can do whatever we need with these resources. We place our training dataset there, run the training logic/algorithms, build the model, and so on.

Once the training is done and the model is created, the model can be taken anywhere for use. We pay the cloud provider only for the hardware and storage we use.
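The train-then-take-away workflow can be sketched as follows. This is a toy example under stated assumptions: the model is a one-variable least-squares line rather than a real neural network, the data values are invented, and JSON is used as the export format purely for illustration (real toolchains export formats such as TensorFlow Lite or Core ML model files).

```python
import json
import os
import tempfile


def train_linear(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def export_model(model, path):
    """Write the trained parameters to a file that can be copied anywhere."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f)


# Training runs on the rented machine; the data follows y = 2x + 1.
model = train_linear([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])

# The exported file is all we keep once the compute resources are released.
model_path = os.path.join(tempfile.mkdtemp(), "linear_model.json")
export_model(model, model_path)
print(model)
```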

Amazon Web Services (AWS) and Azure are examples of cloud-computing vendors.

The benefits of using this approach are as follows:

The hardware/storage can be procured and used on demand. There is no need to worry about running out of storage and so on when the amount of training data increases. Capacity can be increased when needed by paying the charges.
Once the training is done and the model is created, we can release the computing resources. Costs incurred on computing resources are only for the training period and hence if we are able to finish the training quickly, we save a lot.
We are free to download the trained model and use it anywhere.

What we need to be careful about when we go for this approach is the following:

We need to take care of the entire training work and the model creation. We are only going to use the compute resources required to carry out this work.
So, we need to know how to train and build the model.

Several companies, such as Amazon, Microsoft, and Google, now offer machine learning as a service on top of their existing cloud services. In the hosted machine learning model, we need to worry about neither the compute resources nor the machine learning models. We upload the data for our problem set and choose the model that we want to train from the available list of models, and that's all. The machine learning service takes care of training the model and providing the trained model to us for use.

This approach works really well when we are not well-versed enough to write and train our own custom model, but also do not want to rely completely on a machine learning provider's prebuilt service. It sits in between: we can choose between the models, upload our unique dataset, and then train the model for our requirements.

In this type of approach, the provider usually ties us to their platform. We may not be able to download the model and deploy it anywhere else, so our app may need to keep using the provider's platform in order to use the trained model.

One more thing to note is that if at a later point in time, we decide to move to another provider, the trained model cannot be exported and imported to the other provider. We may need to carry out the training process again on the new provider platform.

In this approach, we might need to pay for the compute resources (hardware/storage). In addition, after the training, we may need to pay to use the trained model on an ongoing, on-demand basis; that is, whenever we use it, we pay for what we use.
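As a back-of-the-envelope illustration of the difference between the two charging models, the sketch below compares them. All rates and volumes are invented for the example; actual pricing varies by provider and changes over time.

```python
def generic_cloud_cost(training_hours: float, hourly_rate: float) -> float:
    """Generic cloud: we pay for compute/storage only while training runs."""
    return training_hours * hourly_rate


def hosted_ml_cost(
    training_hours: float,
    hourly_rate: float,
    predictions: int,
    price_per_thousand: float,
) -> float:
    """Hosted ML: training compute plus an ongoing per-use charge."""
    usage_charge = predictions / 1000 * price_per_thousand
    return training_hours * hourly_rate + usage_charge


# Hypothetical figures: 10 training hours at 2.0 per hour,
# then 50,000 predictions at 1.0 per thousand predictions.
print(generic_cloud_cost(10, 2.0))
print(hosted_ml_cost(10, 2.0, 50_000, 1.0))
```

With the generic cloud, the bill stops when training ends; with hosted machine learning, the usage term keeps growing for as long as the app makes predictions.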

The benefits of using this approach are as follows:

There is no need to worry about the compute resources/storage required for training the data.
There is no need to worry about understanding the details of machine learning models to build and train custom models.
Just upload the data, choose the model to use for training, and that's it: the trained model is ready for use.
There is no need to worry about deploying the model anywhere for consumption from the mobile application.

What we need to be careful about when we go for this approach is as follows:

Mostly, we may get tied to their platform after the training process in order to use the model obtained after training. However, there are a few exceptions, such as Google's Cloud platform.
We can choose only from the list of models offered by the provider.
A trained model from one platform cannot be moved to another platform. So, if we decide to change the platform later, we may need to retrain on the new platform.
We may need to pay for compute resources and also pay on an ongoing basis for usage of the model.

Using our private cloud/simple server is similar to training on the generic cloud, except that we need to manage the compute resources/storage ourselves. What we miss out on is the flexibility offered by generic cloud solution providers, such as increasing or decreasing the compute and storage resources on demand, and we take on the overhead of maintaining and managing these resources.

The major advantage of this approach is the security of our data. If we think our data is really unique and needs to be kept completely secure, this is a good approach to use. Here, everything is done in-house using our own resources.

The benefits of using this approach are as follows:

Absolutely everything is in our control, including the compute resources, training data, model, and so on
It is more secure

What we need to be careful about when we go for this approach is the following:

Everything needs to be managed by us
We should be clear with the machine learning concepts, data, model, and training process
Continuous availability of compute resources/hardware is to be managed by us
If our dataset is going to be huge, this might not be very effective, as we may need to scale the compute resources and storage as per the increasing dataset size

On a device

On-device training has still not picked up. It may be feasible for a very small dataset. However, because training requires substantial compute resources and storing the training data requires substantial storage, mobile is generally not the preferred platform for the training process.

The retraining phase also becomes complicated if we use mobile as a platform for the training process.

Ways to carry out the inference – making predictions

Once the model is created, we need to use the model for a new dataset in order to infer or make the predictions. Similar to how we had various ways in which we could carry out the training process, we can have multiple approaches to carry out the inference process also:

On a server:
- Generic cloud computing
- Hosted machine learning
- Private cloud/simple server machine
On a device

Inference on a server requires a network request, so the application needs to be online to use this approach. Inference on the device, on the other hand, means the application can work completely offline. It also avoids the overheads of an online app, such as network latency, so speed and performance are generally better.

However, if inference requires more compute resources (processing power or memory) than the device can provide, it cannot be done on the device.
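The trade-off described above can be expressed as a simple decision rule. The sketch below is a deliberately simplified assumption: it uses model size against a memory budget as the only resource check, and the function name and thresholds are hypothetical. A real app would also consider processing power, battery, and latency requirements.

```python
def choose_inference_target(
    is_online: bool, model_size_mb: float, device_budget_mb: float
) -> str:
    """Pick where to run inference (hypothetical helper)."""
    if model_size_mb <= device_budget_mb:
        return "device"       # fits locally: works offline, no round trip
    if is_online:
        return "server"       # too heavy for the device, but reachable
    return "unavailable"      # too heavy and no connection


print(choose_inference_target(False, 20, 50))
print(choose_inference_target(True, 200, 50))
print(choose_inference_target(False, 200, 50))
```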

Inference on a server

In this approach, once the model is trained, we host the model on a server to utilize it from the application.

The model can be hosted either in a cloud machine or on a local server, or it can be that of a hosted machine learning provider. The server is going to publish the endpoint URL, which needs to be accessed to utilize it to make the required predictions. The required dataset is to be passed as input to the service.
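A minimal sketch of such an endpoint, built on Python's standard http.server module, is shown below. The /predict path, the JSON request and response shapes, and the tiny linear model are assumptions made for illustration; a production service would use a proper web framework, authentication, and a real trained model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a trained model loaded on the server (values are invented).
MODEL = {"slope": 2.0, "intercept": 1.0}


def handle_predict(body: bytes) -> bytes:
    """Turn a JSON request {"x": number} into {"prediction": number}."""
    x = float(json.loads(body)["x"])
    prediction = MODEL["slope"] * x + MODEL["intercept"]
    return json.dumps({"prediction": prediction}).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":          # hypothetical endpoint path
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        reply = handle_predict(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)


def serve(port: int = 8000) -> None:
    """Start the server; this call blocks until interrupted."""
    HTTPServer(("", port), PredictHandler).serve_forever()
```

Because the model lives only in MODEL on the server, replacing it changes the predictions for every client at once, without touching the mobile application.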

Doing the inference on a server makes the mobile application simple. The model can be improved periodically, without having to redeploy the mobile client application. New features can be added into the model easily. There is no requirement to upgrade the mobile application for any model changes.

The benefits of using this approach are as follows:

The mobile application becomes relatively simple.
The model can be updated at any time without the redeployment of the client application.
It is easy to support multiple OS platforms without writing complex inference logic for each OS platform. Everything is done in the backend.

What we need to be careful about when we go for this approach is the following:

The application can work only in online mode. The application has to connect to backend components in order to carry out the inference logic.
The server hardware and software have to be maintained and kept up and running, and the server needs to scale with the number of users. Scaling incurs additional cost to manage multiple servers and ensure they are always up and running.
Users need to transmit the data to the backend for inference. If the data is huge, they might experience performance issues and might also have to pay for transmitting the data.

Inference on a device

In this approach, the machine learning model is loaded into the client mobile application. To make a prediction, the mobile application runs all the inference computations locally on the device, on its own CPU or GPU. It need not communicate with the server for anything related to machine learning.

Speed is the major reason for doing inference directly on a device. We need not send a request to the server and wait for the reply. Things happen almost instantaneously.
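The idea can be sketched in a few lines: the app loads a model file from its own bundle once and then predicts with no network involved. This is an illustration under stated assumptions, not how any particular SDK works: the JSON model format, the file name, and the linear model are invented, and a temporary directory stands in for the app bundle. On a real device, this role is played by a framework such as TensorFlow Lite or Core ML.

```python
import json
import os
import tempfile


def load_model(path):
    """Read model parameters from a file shipped with the app."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def predict(model, x):
    """Run inference locally; no request leaves the device."""
    return model["slope"] * x + model["intercept"]


# Simulate a model file inside the app bundle (path is hypothetical).
bundle_dir = tempfile.mkdtemp()
bundled_path = os.path.join(bundle_dir, "model.json")
with open(bundled_path, "w", encoding="utf-8") as f:
    json.dump({"slope": 2.0, "intercept": 1.0}, f)

model = load_model(bundled_path)   # done once, for example at app start
print(predict(model, 3.0))
```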

Since the model is bundled along with the mobile application, it is not very easy to upgrade the model in one place and reuse it. The mobile application upgrade has to be done. The upgrade push has to be provided to all active users. All this is a big overhead and will consume a lot of effort and time.

Even for small changes, retraining the model with very few additional parameters will involve a complex process of an application upgrade, pushing the upgrade to live users, and maintaining the required infrastructure for the same.

The benefits of using this approach are as follows:

Users can use the mobile application in offline mode. Availability of the network is not essential to operate the mobile application.
The prediction and inference can happen very quickly since the model is bundled with the application itself.
The data required to predict need not be sent over the network and hence no bandwidth cost is involved for users.
There is no overhead to run and maintain server infrastructure, and no need to manage multiple servers for user scalability.

What we need to be careful about when we go for this approach is the following:

Since the model is included along with the application, it is difficult to make changes to the model. The changes can be done, but to make the changes reach all client applications is a costly process that consumes effort and time.
The model file, if huge, can increase the size of the application significantly.
The prediction logic should be written for each OS platform the application supports, say iOS or Android.
The model has to be properly encrypted or obfuscated to make sure it cannot be extracted or tampered with by others.

In this book, we are going to look into the details of utilizing the SDKs and tools available to perform tasks related to machine learning locally on a mobile device itself.

Popular mobile machine learning tools and SDKs

The following are the key machine learning SDKs we are going to explore in this book:

TensorFlow Lite from Google
Core ML from Apple
Caffe2Go from Facebook
ML Kit from Google
Fritz.ai

We will go over the details of the SDKs and also sample mobile machine learning applications built using these SDKs, leveraging different types of machine learning algorithms.

Skills needed to implement on-device machine learning

Implementing machine learning on a mobile device does not require deep knowledge of machine learning algorithms, the entire process, or how to build the machine learning model. Mobile application developers who know how to create applications with the iOS or Android SDK already invoke backend APIs to run backend business logic. In the same way, they need to know how to import the machine learning model into the mobile resources folder and how to invoke the various features of the model from their application to make predictions.

To summarize, the following diagram shows the steps for a mobile developer to implement machine learning on a device:

Machine learning implementation on mobile devices can be considered similar to backend API integration. You build the API separately and then integrate it where required. Similarly, you build the model separately outside the device, import it into the mobile application, and integrate it where required.
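The analogy can be made concrete with a small sketch in which the app code depends on one interface, whether the prediction comes from a backend call or from a bundled model. All names here (Predictor, LocalPredictor, RemotePredictor, show_result) are hypothetical, and the send function stands in for whatever networking code a real app would use.

```python
from typing import Callable, Protocol


class Predictor(Protocol):
    """What the app needs from any model, local or remote."""
    def predict(self, x: float) -> float: ...


class LocalPredictor:
    """Runs a model that was built elsewhere and bundled with the app."""
    def __init__(self, slope: float, intercept: float) -> None:
        self.slope, self.intercept = slope, intercept

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


class RemotePredictor:
    """Delegates to a backend; `send` is a hypothetical transport function."""
    def __init__(self, send: Callable[[float], float]) -> None:
        self.send = send

    def predict(self, x: float) -> float:
        return self.send(x)


def show_result(predictor: Predictor, x: float) -> str:
    """App code: identical for backend API integration and on-device ML."""
    return f"prediction: {predictor.predict(x)}"


print(show_result(LocalPredictor(2.0, 1.0), 3.0))
print(show_result(RemotePredictor(lambda x: x * 10), 1.5))
```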