Live Transcribe combines the results of extensive user experience (UX) research with sustainable connectivity to speech processing servers. The team relies on cloud ASR (automatic speech recognition) for greater accuracy, but it also needed to ensure that connectivity to these servers doesn’t cause excessive data usage. To reduce the network data consumption required by Live Transcribe, the team implemented an on-device neural network-based speech detector.
https://www.youtube.com/watch?v=jLCwjIaPXwA
The on-device neural network-based speech detector is built using AudioSet, Google’s dataset for audio event research, announced in 2017. The detector is an image-like model that detects speech and automatically manages network connections to the cloud ASR engine, minimizing data usage over long periods of use.
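The way a speech detector can manage the connection to a cloud ASR engine can be illustrated with a small sketch. This is not Google's implementation: the `SpeechGate` class, the probability threshold, and the "hangover" of silent frames before the stream is closed are all assumptions made for illustration. The idea is only that audio is streamed while the detector reports speech and the connection is dropped after a short run of silence.

```python
from dataclasses import dataclass


@dataclass
class SpeechGate:
    """Hypothetical gate deciding whether a cloud ASR stream should be open."""

    open_threshold: float = 0.6  # assumed speech-probability threshold
    hangover_frames: int = 3     # assumed silent frames tolerated before closing
    connected: bool = False
    silent_run: int = 0

    def update(self, speech_prob: float) -> bool:
        """Feed one detector output; return True if this frame is streamed."""
        if speech_prob >= self.open_threshold:
            # Speech detected: (re)open the connection and reset the silence count.
            self.silent_run = 0
            self.connected = True
        elif self.connected:
            # Silence while connected: keep streaming briefly, then close.
            self.silent_run += 1
            if self.silent_run >= self.hangover_frames:
                self.connected = False
        return self.connected


def frames_streamed(speech_probs, gate=None):
    """Count how many frames would be sent to the (hypothetical) cloud ASR."""
    if gate is None:
        gate = SpeechGate()
    return sum(1 for p in speech_probs if gate.update(p))


if __name__ == "__main__":
    probs = [0.1, 0.9, 0.8, 0.2, 0.1, 0.1, 0.1, 0.7]
    print(frames_streamed(probs), "of", len(probs), "frames streamed")
```

The hangover keeps brief pauses between words from closing and reopening the connection repeatedly; a real detector would tune both values against actual usage.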
Additionally, the Google team partnered with Gallaudet University on user experience research to make Live Transcribe intuitive. This collaboration was meant to ensure that core user needs are satisfied while maximizing the app’s potential. Google considered several device types, including computers, tablets, smartphones, and small projectors, for displaying auditory information and captions. After rigorous analysis, Google chose smartphones because of their “sheer ubiquity” and enhanced capabilities.
Google mentions that one challenge in building Live Transcribe was how to display transcription confidence. The researchers explored whether they needed to show word-level or phrase-level confidence, which has traditionally been considered helpful. Drawing on previous UX research, they found that a transcript is easiest to read when it is not layered. Live Transcribe therefore focuses on better presentation of the text and supplements it with auditory signals other than speech.
Another useful UX signal is the noise level of the current environment. To convey it, researchers built an indicator that visualizes the volume of the user’s speech relative to background noise. This gives users instant feedback on microphone performance, allowing them to adjust the placement of the phone.
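The article does not say how the indicator is computed, so the following is only a minimal sketch of one plausible approach: measure the RMS level of a speech frame and of a background-noise frame in decibels, take the difference, and map it to a small number of display steps. The function names, the 30 dB full-scale value, and the five-step display are assumptions for illustration.

```python
import math


def rms_db(samples):
    """RMS level of a frame in dB relative to full scale (samples in [-1, 1])."""
    if not samples:
        raise ValueError("frame must contain at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    # Floor the power so that digital silence does not produce log10(0).
    return 10.0 * math.log10(max(mean_square, 1e-12))


def speech_over_noise_db(speech_frame, noise_frame):
    """How many dB the speech frame sits above the background-noise frame."""
    return rms_db(speech_frame) - rms_db(noise_frame)


def indicator_level(level_db, full_scale_db=30.0, steps=5):
    """Map a speech-over-noise value to a display level from 0 to `steps`."""
    fraction = min(max(level_db / full_scale_db, 0.0), 1.0)
    return round(fraction * steps)


if __name__ == "__main__":
    speech = [0.1, -0.1, 0.1, -0.1]      # toy frame with amplitude 0.1
    noise = [0.01, -0.01, 0.01, -0.01]   # toy frame with amplitude 0.01
    margin = speech_over_noise_db(speech, noise)
    print(f"speech is {margin:.1f} dB above noise, level {indicator_level(margin)}")
```

A low level on such an indicator would suggest moving the phone closer to the speaker or away from the noise source, which is the kind of feedback the article describes.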
To enhance the capabilities of this mobile-based automatic speech transcription service, researchers plan to add on-device recognition, speaker separation, and speech enhancement.
“Our research with Gallaudet University shows that combining it with other auditory signals like speech detection and a loudness indicator makes a tangibly meaningful change in communication options for our users,” state the researchers. Google has rolled out a test version of Live Transcribe on the Play Store, and the app comes pre-installed on all Pixel 3 devices with the latest update.
Public reaction to the news has been largely positive, with people appreciating the newly released app:
https://twitter.com/MattWilliams84/status/1092510959988629505
https://twitter.com/iamAbhisarW/status/1092642493504589826
https://twitter.com/seanmarnold/status/1092508455200587776
For more information, check out the official Live Transcribe blog.