
Amazon Transcribe Streaming announces support for WebSockets

  • 3 min read
  • 29 Jul 2019


Last week, Amazon announced that its automatic speech recognition (ASR) service, Amazon Transcribe, now supports WebSockets. According to Amazon, “WebSocket support opens Amazon Transcribe Streaming up to a wider audience and makes integrations easier for customers that might have existing WebSocket-based integrations or knowledge”.

Amazon Transcribe lets developers add speech-to-text capability to their applications through its ASR service. Amazon announced the general availability of Amazon Transcribe at the AWS San Francisco Summit 2018. With the Amazon Transcribe API, users can analyze audio files stored in Amazon S3 and have the service return a text file of the transcribed speech. The API can also produce real-time transcripts from a live audio stream.

Until now, the Amazon Transcribe Streaming API has been available over HTTP/2 streaming. Amazon has now added WebSocket support as another integration option for bringing real-time voice capabilities to projects built with Transcribe.

What are WebSockets?


WebSockets are a protocol built atop TCP, similar to HTTP. HTTP is excellent for short-lived requests, but it does not handle persistent real-time communication well. For this reason, the first Amazon Transcribe Streaming API used HTTP/2 streams, which solve many of the issues that HTTP had with real-time communication.

Amazon states that while “an HTTP connection is normally closed at the end of the message, a WebSocket connection remains open”. Because the connection stays open after the initial handshake, messages can be sent in both directions without the bandwidth or latency cost of handshaking and negotiating a new connection each time.
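
The one-time handshake mentioned above can be illustrated with a small Python sketch. In the WebSocket protocol (RFC 6455), the server confirms the upgrade by hashing the client's `Sec-WebSocket-Key` header together with a fixed GUID and returning the result in `Sec-WebSocket-Accept`. The function name `websocket_accept` is illustrative and is not part of any Amazon API.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key taken from the RFC 6455 handshake example.
    print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

This exchange happens once per connection; after it, both sides exchange frames without repeating it.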

WebSocket connections are full-duplex, which means that the server and client can both transmit data at the same time. WebSockets were also designed “for cross-domain usage, so there’s no messing around with cross-origin resource sharing (CORS) as there is with HTTP”.

Amazon Transcribe Streaming using WebSockets


When audio is streamed over the WebSocket protocol, Amazon Transcribe transcribes the stream in real time. The client encodes the audio with event stream encoding, and Amazon Transcribe responds with a JSON structure that is also encoded using event stream encoding.
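
The article does not describe the framing itself, so the following Python sketch rests on assumptions: it takes event stream encoding to be a message made of a length prelude, string headers, a payload, and two CRC32 checksums, and it assumes the header names and values shown for an audio event. The function names are illustrative; check the layout against the AWS documentation before relying on it.

```python
import struct
import zlib
from typing import Dict, Tuple

STRING_HEADER_TYPE = 7  # assumed type code for UTF-8 string header values


def encode_headers(headers: Dict[str, str]) -> bytes:
    """Serialize string headers: name length, name, value type, value length, value."""
    out = bytearray()
    for name, value in headers.items():
        name_b, value_b = name.encode("utf-8"), value.encode("utf-8")
        out += struct.pack("!B", len(name_b)) + name_b
        out += struct.pack("!BH", STRING_HEADER_TYPE, len(value_b)) + value_b
    return bytes(out)


def encode_audio_event(chunk: bytes) -> bytes:
    """Wrap one chunk of audio bytes in an event stream message (assumed layout)."""
    headers = encode_headers({
        ":content-type": "application/octet-stream",  # assumed header values
        ":event-type": "AudioEvent",
        ":message-type": "event",
    })
    # Prelude (8 bytes) + prelude CRC (4) + headers + payload + message CRC (4).
    total_len = 16 + len(headers) + len(chunk)
    prelude = struct.pack("!II", total_len, len(headers))
    body = prelude + struct.pack("!I", zlib.crc32(prelude)) + headers + chunk
    return body + struct.pack("!I", zlib.crc32(body))


def decode_message(msg: bytes) -> Tuple[Dict[str, str], bytes]:
    """Verify both checksums and split a message into string headers and payload."""
    total_len, headers_len = struct.unpack("!II", msg[:8])
    if zlib.crc32(msg[:8]) != struct.unpack("!I", msg[8:12])[0]:
        raise ValueError("prelude checksum mismatch")
    if zlib.crc32(msg[:-4]) != struct.unpack("!I", msg[-4:])[0]:
        raise ValueError("message checksum mismatch")
    headers: Dict[str, str] = {}
    pos, end = 12, 12 + headers_len
    while pos < end:
        name_len = msg[pos]
        name = msg[pos + 1:pos + 1 + name_len].decode("utf-8")
        pos += 1 + name_len
        if msg[pos] != STRING_HEADER_TYPE:
            raise ValueError("this sketch only handles string headers")
        (value_len,) = struct.unpack("!H", msg[pos + 1:pos + 3])
        headers[name] = msg[pos + 3:pos + 3 + value_len].decode("utf-8")
        pos += 3 + value_len
    return headers, msg[end:total_len - 4]
```

Under the same assumptions, a response frame from the service could be passed to `decode_message` and its payload parsed with `json.loads` to obtain the transcript structure.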

Key components of a WebSocket request to Amazon Transcribe are:

  • Creating a pre-signed URL to access Amazon Transcribe.
  • Creating binary WebSocket frames containing event stream encoded audio data.
  • Handling WebSocket frames in the response.
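
The first component, the pre-signed URL, can be sketched in Python using AWS Signature Version 4 query-string signing. The article does not give the details, so the endpoint host, port `8443`, path, service name `transcribe`, and query parameter names below are assumptions to verify against the AWS documentation. The function name `presign_transcribe_url` is hypothetical. The language codes are the ones listed in this article.

```python
import datetime
import hashlib
import hmac
from typing import Optional
from urllib.parse import quote

# Language codes the article lists for real-time transcription.
SUPPORTED_LANGUAGES = {"en-GB", "en-US", "fr-FR", "fr-CA", "es-US"}


def _sign(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_transcribe_url(access_key: str, secret_key: str, region: str,
                           language_code: str, sample_rate: int,
                           media_encoding: str = "pcm", expires: int = 300,
                           now: Optional[datetime.datetime] = None) -> str:
    """Build a SigV4 pre-signed WebSocket URL (endpoint details are assumptions)."""
    if language_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language_code}")
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    service = "transcribe"                                      # assumed service name
    host = f"transcribestreaming.{region}.amazonaws.com:8443"   # assumed endpoint
    path = "/stream-transcription-websocket"                    # assumed path
    scope = f"{date_stamp}/{region}/{service}/aws4_request"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
        "language-code": language_code,      # assumed parameter names
        "media-encoding": media_encoding,
        "sample-rate": str(sample_rate),
    }
    # Canonical query string: parameters sorted by name, strictly URL-encoded.
    canonical_query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items())
    )
    canonical_request = "\n".join([
        "GET", path, canonical_query,
        f"host:{host}\n",                    # canonical headers end with a newline
        "host",                              # signed headers
        hashlib.sha256(b"").hexdigest(),     # hash of the empty payload
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    key = _sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, service, "aws4_request"):
        key = _sign(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"wss://{host}{path}?{canonical_query}&X-Amz-Signature={signature}"
```

A WebSocket client would open this URL and then send the event stream encoded audio as binary frames. Temporary credentials would also need a session token parameter, which this sketch leaves out.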


Amazon Transcribe currently supports the following languages for real-time transcription: British English (en-GB), US English (en-US), French (fr-FR), Canadian French (fr-CA), and US Spanish (es-US).

To learn more about the WebSocket API, visit Amazon’s official post.

Understanding WebSockets and Server-sent Events in Detail

Implementing a non-blocking cross-service communication with WebClient [Tutorial]

Introducing Kweb: A Kotlin library for building rich web applications