Speech-to-text frameworks and toolkits
Many cloud-based AI providers offer speech to text as a service:
- Amazon's speech recognition offering is Amazon Transcribe. It transcribes audio files stored in Amazon S3 in four formats: .flac, .wav, .mp4, and .mp3. An audio file can be at most two hours long and 1 GB in size. The transcription results are written as a JSON file to an Amazon S3 bucket.
- Google offers speech to text as part of its Google Cloud ML services. Google Cloud Speech-to-Text supports the FLAC, LINEAR16, MULAW, AMR, AMR_WB, and OGG_OPUS audio encodings.
- Microsoft offers a speech-to-text API as part of its Azure Cognitive Services platform, known as the Speech Service SDK. The Speech Service SDK integrates with the rest of the Microsoft APIs to transcribe recorded audio. It accepts only single-channel WAV or PCM audio with a sample rate of 16 kHz.
- IBM offers a speech-to-text API as part of its Watson platform...
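
Because each service restricts the audio it accepts, it can help to check a file locally before uploading it. The following is a minimal sketch using only the Python standard library, not part of any provider's SDK. The function names transcribe_problems and wav_problems are hypothetical, the Amazon Transcribe limits are the ones quoted in the list above (with 1 GB assumed to mean 2**30 bytes), and the expected sample rate is passed in as a parameter rather than hard-coded, so it should be set from the service's own documentation.

```python
import os
import wave

# Limits quoted in the list above for Amazon Transcribe.
TRANSCRIBE_EXTENSIONS = {".flac", ".wav", ".mp4", ".mp3"}
TRANSCRIBE_MAX_SECONDS = 2 * 60 * 60   # two hours
TRANSCRIBE_MAX_BYTES = 1024 ** 3       # assumption: 1 GB taken as 2**30 bytes


def transcribe_problems(filename, size_bytes, duration_seconds):
    """Return the reasons a file would violate the quoted limits (empty if none)."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in TRANSCRIBE_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if duration_seconds > TRANSCRIBE_MAX_SECONDS:
        problems.append("longer than two hours")
    if size_bytes > TRANSCRIBE_MAX_BYTES:
        problems.append("larger than 1 GB")
    return problems


def wav_problems(path, expected_rate_hz, expected_channels=1):
    """Compare a PCM WAV file's channel count and sample rate with expected values."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    if channels != expected_channels:
        problems.append(f"{channels} channels, expected {expected_channels}")
    if rate != expected_rate_hz:
        problems.append(f"{rate} Hz, expected {expected_rate_hz} Hz")
    return problems


# An .ogg file is not among the four formats listed for Amazon Transcribe.
print(transcribe_problems("talk.ogg", 10, 10))  # ['unsupported format: .ogg']
```

Returning a list of problems instead of raising on the first failure lets a caller report every issue with a file at once. For a single-channel service, wav_problems would be called with the path of a local WAV file and the sample rate that service requires.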