Last week, the Facebook AI Research (FAIR) speech team introduced the first fully convolutional speech recognition approach. The team also open-sourced flashlight, a C++ library for machine learning, and wav2letter++, a fast and simple system for developing end-to-end speech recognizers.
Current state-of-the-art speech recognition systems are built on RNNs for acoustic or language modeling. Facebook’s newly introduced system offers an alternative based solely on convolutional neural networks. It is trained end-to-end to predict characters from the raw waveform, which eliminates the separate feature extraction step, and it uses an external convolutional language model to decode words.
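To make the idea of convolving directly over raw audio samples more concrete, here is a minimal sketch of a single strided 1D convolution in plain Python. It is not Facebook's model: the function name `conv1d`, the toy waveform, and the hand-picked kernel are all assumptions made for illustration. A real acoustic model would stack many such layers with learned weights.

```python
def conv1d(signal, kernel, stride=1):
    """Strided 1D convolution without padding (computed as cross-correlation)."""
    k = len(kernel)
    out = []
    # Slide the kernel over the signal, advancing `stride` samples each step.
    for start in range(0, len(signal) - k + 1, stride):
        window = signal[start:start + k]
        out.append(sum(w * x for w, x in zip(kernel, window)))
    return out


# Hypothetical toy waveform (8 samples) and an illustrative difference kernel.
# In a trained network the kernel weights would be learned, not hand-picked.
waveform = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0]
kernel = [1.0, 0.0, -1.0]

# A stride of 2 halves the time resolution, which is how convolutional
# front ends shrink long raw-audio sequences into shorter feature sequences.
features = conv1d(waveform, kernel, stride=2)
print(features)
```

With a stride greater than one, each layer shortens the sequence, so a stack of layers can reduce thousands of raw samples to a manageable number of frames before character prediction.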
The following diagram depicts the architecture of this CNN-based speech recognition system:
Alongside the CNN-based approach, Facebook released the wav2letter++ and flashlight frameworks to complement it and enable reproducibility.
flashlight is a standalone C++ library for machine learning. It uses the ArrayFire tensor library and features just-in-time compilation with modern C++. It targets both CPU and GPU backends to provide maximum efficiency and scale.
The wav2letter++ toolkit is built on top of flashlight and written entirely in C++. It also uses ArrayFire as its primary library for tensor operations. ArrayFire is a highly optimized tensor library that can execute on multiple backends, including CUDA GPU and CPU backends. wav2letter++ supports multiple audio file formats, such as wav and flac, as well as several feature types, including raw audio, a linearly scaled power spectrum, log-Mels (MFSC), and MFCCs.
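As a rough illustration of one of the feature types listed above, the following sketch computes a linearly scaled power spectrum for a single audio frame using a naive discrete Fourier transform. This is not the wav2letter++ implementation: the function name `power_spectrum` and the synthetic test frame are assumptions, and a production system would use an FFT with windowing rather than this O(N²) loop.

```python
import cmath
import math


def power_spectrum(frame):
    """Return |X[k]|^2 for bins k = 0..N/2 using a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        # Correlate the frame with a complex sinusoid at frequency bin k.
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc) ** 2)
    return spectrum


# Hypothetical frame: a cosine completing 2 cycles over 16 samples,
# so its energy should concentrate in frequency bin 2.
frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
spec = power_spectrum(frame)

# Index of the strongest frequency bin.
peak_bin = max(range(len(spec)), key=spec.__getitem__)
print(peak_bin)
```

Log-Mel and MFCC features build on this kind of spectrum by applying a Mel filterbank, a logarithm, and (for MFCCs) a discrete cosine transform; those steps are omitted here to keep the sketch short.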
For more details, see Facebook’s official announcement.