Android 9 Pie’s Smart Linkify: How Android’s new machine learning-based feature works

Last week, Google launched Android 9 Pie, the successor to Android Oreo and a release that leans heavily on machine learning. One of its features, Smart Linkify, is a new version of the existing Android Linkify API. It adds clickable links when it identifies entities such as dates, flights, and addresses in text, and it is available through the TextClassifier API.

[Image: Smart Linkify]

The models behind Smart Linkify are small feedforward neural networks trained with TensorFlow. This lets the feature figure out whether a series of numbers or words is a phone number or an address, just like Android Oreo’s Smart Text Selection feature. What’s different is that, instead of just making it easier to highlight and copy the associated text manually, the new feature adds a relevant actionable link so that users can take action with a single click.

How does Smart Linkify work?

Smart linkify follows three basic steps:

Finding entities in an input text

Processing the input text

Training the network

Let’s have a quick look at each of the above-mentioned steps.

Finding entities in an input text

Detecting entities within text is not easy. People write addresses and phone numbers in many different ways, and the type of an entity can be ambiguous. For instance, “Confirmation number: 857-555-3556” looks like a phone number even though it is not one.

To address this, the Android team designed an inference algorithm built on two small feedforward neural networks. The networks look at the context surrounding the words and can perform entity chunking for many kinds of entities, not just addresses and phone numbers.

First, the input text is split into words, and all possible word sequences, called “candidates”, are analyzed. The first network assigns each candidate a score that indicates how likely it is to be a valid entity. Overlapping candidates are then removed, favoring the ones with the higher score. After this, the second neural network takes over and assigns each remaining candidate a type, such as phone number or address, or in some cases marks it as a non-entity.
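The candidate selection step can be sketched in a few lines of Python. This is only an illustration: the names `generate_candidates`, `select_non_overlapping` and `make_toy_scorer`, the maximum candidate length, the threshold and the scoring rule are all assumptions made for this example, and the toy scorer merely stands in for the first neural network. The second (classification) network is not modeled here.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) word indices, end exclusive


def generate_candidates(words: List[str], max_len: int = 4) -> List[Span]:
    """Enumerate every contiguous word span of at most max_len words."""
    spans = []
    for start in range(len(words)):
        for end in range(start + 1, min(start + max_len, len(words)) + 1):
            spans.append((start, end))
    return spans


def select_non_overlapping(
    spans: List[Span], score: Callable[[Span], float], threshold: float = 0.5
) -> List[Span]:
    """Greedily keep the highest-scoring spans that do not overlap."""
    kept: List[Span] = []
    for span in sorted(spans, key=score, reverse=True):
        if score(span) < threshold:
            break  # every remaining span scores even lower
        if all(span[1] <= k[0] or span[0] >= k[1] for k in kept):
            kept.append(span)
    return sorted(kept)


def make_toy_scorer(words: List[str], max_len: int = 4) -> Callable[[Span], float]:
    """Hypothetical stand-in for the scoring network: longer all-digit spans score higher."""
    def score(span: Span) -> float:
        start, end = span
        if all(w.isdigit() for w in words[start:end]):
            return (end - start) / max_len
        return 0.0
    return score


words = "Call 857 555 3556 tomorrow".split()
entities = select_non_overlapping(generate_candidates(words), make_toy_scorer(words))
print([" ".join(words[s:e]) for s, e in entities])  # prints ['857 555 3556']
```

The greedy pass mirrors the description above: shorter, lower-scoring spans such as "857 555" overlap the winning span and are dropped.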

[Image: Smart Linkify finding entities in a string of text]

Processing the input text

Once candidates have been located in the text, they need to be processed. Given an entity candidate in the input text, the networks must decide whether the candidate is valid and, using the context surrounding it, classify it. To do this, the input text is split into several parts, and each part is fed to the network separately.
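As a rough sketch of what splitting the text around a candidate could look like, the Python snippet below cuts the word list into a left context, the candidate itself, and a right context. The exact parts Google uses are not spelled out in this article, so this three-way split, the function name `split_around_candidate` and the `context_size` parameter are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def split_around_candidate(
    words: List[str], span: Tuple[int, int], context_size: int = 3
) -> Dict[str, List[str]]:
    """Split the text into the candidate and a window of words on each side."""
    start, end = span  # word indices, end exclusive
    return {
        "left_context": words[max(0, start - context_size):start],
        "entity": words[start:end],
        "right_context": words[end:end + context_size],
    }


words = "Your confirmation number is 857-555-3556 thanks".split()
parts = split_around_candidate(words, (4, 5))
print(parts["left_context"])  # prints ['confirmation', 'number', 'is']
```

Each part would then be turned into features and passed to the network separately, so that words such as "confirmation" in the left context can influence the decision.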

[Image: Smart Linkify processing the input text]

Google uses character n-grams and a binary capitalization feature to “represent the individual words as real vectors suitable as an input of the neural network”.

Character n-grams represent a word as the set of all its contiguous character sequences of a given length. Google used lengths 1 to 5.
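To make this concrete, here is a minimal Python sketch that extracts the character n-grams of a word for lengths 1 to 5. The function name `char_ngrams` is chosen for this example and is not part of any Android API.

```python
from typing import Set


def char_ngrams(word: str, min_n: int = 1, max_n: int = 5) -> Set[str]:
    """Return all contiguous character sequences of length min_n..max_n."""
    return {
        word[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(word) - n + 1)
    }


print(sorted(char_ngrams("Main")))
# prints ['M', 'Ma', 'Mai', 'Main', 'a', 'ai', 'ain', 'i', 'in', 'n']
```

A four-letter word has no 5-grams, so only lengths 1 to 4 contribute in this case.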

The binary feature indicates whether the word starts with a capital letter. This matters because capitalization in postal addresses is quite distinctive, which helps the networks tell addresses apart from other text.
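One simple way to turn these two features into a numeric vector is sketched below in Python. Hashing the n-grams into a fixed number of buckets is an assumption made for this example; the article does not describe how Google maps n-grams to vectors, and the name `word_features` and the `num_buckets` parameter are hypothetical.

```python
import zlib
from typing import List


def word_features(word: str, num_buckets: int = 64, max_n: int = 5) -> List[float]:
    """Build a vector of hashed n-gram counts plus one capitalization flag."""
    vector = [0.0] * (num_buckets + 1)
    ngrams = {
        word[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(word) - n + 1)
    }
    for gram in ngrams:
        # crc32 is used because it is deterministic across runs,
        # unlike Python's built-in hash() for strings.
        bucket = zlib.crc32(gram.encode("utf-8")) % num_buckets
        vector[bucket] += 1.0
    # Last slot: 1.0 if the word starts with a capital letter, else 0.0.
    vector[-1] = 1.0 if word[:1].isupper() else 0.0
    return vector


print(word_features("Main")[-1], word_features("main")[-1])  # prints 1.0 0.0
```

In a real model the hashed n-grams would typically index into a learned embedding matrix rather than being used as raw counts.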

Training the network

Google generates its training data synthetically. It collects lists of addresses, phone numbers, and named entities (such as product, place, and business names), which are then used to synthesize training data for the neural networks.

“We take the entities as they are and generate random textual contexts around them (from the list of random words on Web). Additionally, we add phrases like “Confirmation number:” or “ID:” to the negative training data for phone numbers, to teach the network to suppress phone number matches in these contexts”, says the Google team.
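The idea in the quote can be sketched as follows in Python. Everything here is illustrative: the function `synthesize_example`, the tiny vocabulary, the label names and the context sizes are assumptions for this example, not Google's actual data pipeline.

```python
import random
from typing import List, Optional, Tuple

Example = Tuple[str, Tuple[int, int], str]  # (text, entity character span, label)


def synthesize_example(
    entity: str,
    label: str,
    vocabulary: List[str],
    rng: random.Random,
    max_context: int = 3,
    prefix: Optional[str] = None,
) -> Example:
    """Wrap an entity in random words; an optional prefix goes right before it."""
    left = rng.choices(vocabulary, k=rng.randint(0, max_context))
    right = rng.choices(vocabulary, k=rng.randint(0, max_context))
    if prefix is not None:
        left.append(prefix)
    # The entity starts after the left context plus one separating space.
    start = len(" ".join(left)) + 1 if left else 0
    text = " ".join(left + [entity] + right)
    return text, (start, start + len(entity)), label


rng = random.Random(0)
vocabulary = ["please", "call", "order", "today", "office", "reply"]

# Positive example: the number is labeled as a phone number.
positive = synthesize_example("857-555-3556", "phone", vocabulary, rng)

# Negative example: the same digits after "Confirmation number:" are not a phone number.
negative = synthesize_example(
    "857-555-3556", "other", vocabulary, rng, prefix="Confirmation number:"
)
print(positive)
print(negative)
```

Because the amount of context on each side is drawn at random, the same entity list yields many different training sentences.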

Google used several other techniques when training the networks, such as:

Quantizing the embedding matrix to 8-bit integers

Sharing embedding matrices between the selection and classification networks

Varying the size of the context before/after the entities

Creating artificial negative examples out of the positive ones for the classification network
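To illustrate the first technique in the list, the Python sketch below quantizes a small matrix of floats to 8-bit integers and restores approximate values afterwards. It uses a simple linear (min/max) scheme, which is an assumption: the article does not say which quantization scheme Google applied, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_uint8(matrix: Matrix) -> Tuple[List[List[int]], float, float]:
    """Map floats linearly onto the integers 0..255; return (quantized, scale, offset)."""
    flat = [value for row in matrix for value in row]
    low, high = min(flat), max(flat)
    scale = (high - low) / 255
    if scale == 0:
        scale = 1.0  # constant matrix: avoid dividing by zero
    quantized = [[round((value - low) / scale) for value in row] for row in matrix]
    return quantized, scale, low


def dequantize(quantized: List[List[int]], scale: float, offset: float) -> Matrix:
    """Recover approximate float values from the 8-bit representation."""
    return [[q * scale + offset for q in row] for row in quantized]


embedding = [[-1.0, 0.0], [0.5, 1.0]]
quantized, scale, offset = quantize_uint8(embedding)
restored = dequantize(quantized, scale, offset)
print(quantized)
print(restored)
```

Storing one byte per weight instead of a 32-bit float cuts the size of the embedding matrix to roughly a quarter, at the cost of a rounding error of at most half a quantization step per value.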

Smart Linkify currently supports 16 languages, and Google plans to add more in the future.

For flight numbers, dates, times, IBANs, and similar entities, Google still relies on traditional techniques based on standard regular expressions, but it plans to add ML models for these in the future.
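For comparison, a regular-expression approach to such entities might look like the Python sketch below. The patterns are deliberately simplified and were written for this example; they are not the expressions Android uses, and real flight numbers, times and IBANs have more variants than these patterns cover.

```python
import re
from typing import List, Tuple

# Simplified, illustrative patterns (assumptions, not Android's actual rules).
PATTERNS = {
    "flight_number": re.compile(r"\b[A-Z]{2}\s?\d{1,4}\b"),
    "time": re.compile(r"\b(?:[01]?\d|2[0-3]):[0-5]\d\b"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}


def find_entities(text: str) -> List[Tuple[str, str]]:
    """Return (entity type, matched text) pairs in order of appearance."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), entity_type, match.group()))
    return [(entity_type, matched) for _, entity_type, matched in sorted(found)]


print(find_entities("Flight LH 123 departs at 18:45."))
# prints [('flight_number', 'LH 123'), ('time', '18:45')]
```

Patterns like these are easy to write for rigidly formatted entities, which may be why they remain in use for flight numbers and IBANs, while free-form entities such as addresses benefit more from a learned model.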

For more coverage on Smart Linkify, be sure to check out the official Google AI blog.
