Yesterday, Mozilla’s Emerging Technologies group introduced a new project called LPCNet, a WaveRNN variant that aims to improve the efficiency of speech synthesis by combining deep learning with digital signal processing (DSP) techniques. It can be used for Text-to-Speech (TTS), speech compression, time stretching, noise suppression, codec post-filtering, and packet loss concealment.
Many recent neural speech synthesis algorithms have made it possible to synthesize high-quality speech and to code speech at very low bitrates. These techniques, often based on models like WaveNet, give promising results in real time, but only on a high-end GPU. LPCNet, by contrast, aims to perform speech synthesis on end-user devices like mobile phones, which generally do not have powerful GPUs and have very limited battery capacity.
Low-complexity parametric synthesis models, such as low-bitrate vocoders, already exist, but their quality is a concern. They are generally efficient at modeling the spectral envelope of speech using linear prediction, but no equally simple model exists for the excitation. LPCNet aims to show that the efficiency of speaker-independent speech synthesis can be improved by combining newer neural synthesis techniques with linear prediction.
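To make the linear-prediction idea concrete, here is a minimal NumPy/SciPy sketch of the classic autocorrelation method: the spectral envelope is captured by a handful of predictor coefficients, and the excitation is whatever the predictor fails to explain. The function names are illustrative and are not taken from the LPCNet codebase.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order=16):
    """Solve the LPC normal equations R a = r for one frame (autocorrelation method)."""
    # Autocorrelation of the frame; index 0 is lag 0
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Toeplitz system: first column r[0..order-1], right-hand side r[1..order]
    return solve_toeplitz(r[:order], r[1:order + 1])

def excitation(frame, a):
    """Residual e[n] = s[n] - sum_k a_k * s[n-k]: the part LPC cannot predict."""
    pred = np.zeros_like(frame)
    for k in range(1, len(a) + 1):
        pred[k:] += a[k - 1] * frame[:-k]
    return frame - pred

rng = np.random.default_rng(0)
frame = rng.standard_normal(320)              # stand-in for 20 ms of 16 kHz speech
a = lpc_coefficients(frame * np.hanning(320)) # window only for estimating coefficients
e = excitation(frame, a)                      # this residual is what a neural model must capture
```

The point of the split is that the cheap DSP part (the predictor) does most of the spectral work, so the expensive neural network only has to model the much simpler excitation signal.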
In addition to linear prediction, LPCNet relies on a few other tricks to keep complexity low (two of them are sketched below):

- A pre-emphasis filter on the input, with the matching de-emphasis applied on output, which shapes the quantization noise away from audible frequencies
- 8-bit μ-law quantization of the signal, so the network predicts a small discrete distribution rather than a raw 16-bit value
- Sparse recurrent matrices in the main RNN, which cut the number of multiplications without a large quality cost
- Directly predicting the excitation (the prediction residual) rather than the sample value itself
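Here is a short sketch of the first two tricks, pre-emphasis and 8-bit μ-law companding. The pre-emphasis coefficient of 0.85 is a typical choice, not necessarily the exact value used in LPCNet, and the function names are illustrative.

```python
import numpy as np

def pre_emphasis(x, alpha=0.85):
    # y[n] = x[n] - alpha * x[n-1]: boosts high frequencies so that, after
    # de-emphasis on playback, mu-law quantization noise is pushed down there
    y = np.copy(x)
    y[1:] -= alpha * x[:-1]
    return y

def mulaw_encode(x, mu=255):
    # Compand the signal in [-1, 1], then map it to 8-bit integers 0..255
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((y + 1) / 2 * mu + 0.5).astype(np.uint8)

def mulaw_decode(q, mu=255):
    # Inverse companding: recover an approximation of the original sample
    y = 2 * q.astype(np.float64) / mu - 1
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

x = np.clip(np.random.default_rng(1).standard_normal(1000) * 0.1, -1, 1)
q = mulaw_encode(pre_emphasis(x))  # 256 discrete levels per sample
x_hat = mulaw_decode(q)            # roundtrip approximation of the pre-emphasized signal
```

Reducing each sample to 256 μ-law levels is what lets the network end in a small softmax over discrete values instead of regressing a full-resolution waveform.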
You can read about LPCNet in more detail on Mozilla’s official website.