
Mozilla introduces LPCNet: A DSP and deep learning-powered speech synthesizer for lower-power devices

  • 2 min read
  • 21 Nov 2018


Yesterday, Mozilla’s Emerging Technologies group introduced a new project called LPCNet, which is a WaveRNN variant. LPCNet aims to improve the efficiency of speech synthesis by combining deep learning and digital signal processing (DSP) techniques. It can be used for Text-to-Speech (TTS), speech compression, time stretching, noise suppression, codec post-filtering, and packet loss concealment.

Why was LPCNet introduced?


Many recent neural speech synthesis algorithms have made it possible both to synthesize high-quality speech and to code it at very low bitrates. These algorithms, often based on models such as WaveNet, give promising results in real time on a high-end GPU. LPCNet, by contrast, aims to perform speech synthesis on end-user devices such as mobile phones, which generally lack powerful GPUs and have very limited battery capacity.

Low-complexity parametric synthesis models, such as low-bitrate vocoders, already exist, but their quality is a concern. They are generally efficient at modeling the spectral envelope of speech using linear prediction, but no similarly simple model exists for the excitation. LPCNet aims to show that the efficiency of speaker-independent speech synthesis can be improved by combining newer neural synthesis techniques with linear prediction.
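
To make the idea of linear prediction concrete, here is a minimal sketch in plain Python. It is not taken from LPCNet's code: the function names, the toy damped-sinusoid signal, and the choice of the autocorrelation method with the Levinson-Durbin recursion are all assumptions made for illustration. It estimates predictor coefficients so that each sample is approximated by a weighted sum of the previous ones, then computes the leftover residual, which plays the role of the excitation.

```python
import math


def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], where r[lag] = sum_n signal[n] * signal[n - lag]."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i - lag] for i in range(lag, n))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve for coefficients a[1..order] such that x[n] ~ sum_k a[k] * x[n - k].

    Returns the coefficient list and the final prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error


def prediction_residual(signal, coeffs):
    """Residual e[n] = x[n] - sum_k coeffs[k-1] * x[n - k] (the 'excitation')."""
    residual = []
    for n, sample in enumerate(signal):
        predicted = sum(
            c * signal[n - k]
            for k, c in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        residual.append(sample - predicted)
    return residual


def damped_sinusoid(length, radius=0.9, omega=0.3):
    """Hypothetical toy signal: a two-pole resonance driven by a single impulse."""
    a1, a2 = 2 * radius * math.cos(omega), -radius * radius
    x = []
    for n in range(length):
        prev1 = x[n - 1] if n >= 1 else 0.0
        prev2 = x[n - 2] if n >= 2 else 0.0
        x.append(a1 * prev1 + a2 * prev2 + (1.0 if n == 0 else 0.0))
    return x


signal = damped_sinusoid(200)
coeffs, _ = levinson_durbin(autocorrelation(signal, 2), order=2)
residual = prediction_residual(signal, coeffs)
```

On this toy signal, two coefficients capture the resonance and the residual is essentially just the initial impulse. Real speech has a far less tidy excitation, which is the part LPCNet hands to the neural network.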

What mechanisms does LPCNet use?


In addition to linear prediction, LPCNet uses the following tricks:

  • Pre-emphasis/de-emphasis filters: These filters shape the noise introduced by μ-law quantization, which lets LPCNet make that noise mostly inaudible.
  • Sparse matrices: Like WaveRNN, LPCNet uses sparse matrices in the main RNN. These block-sparse matrices are built from 16x1 blocks, which makes the products easier to vectorize. As a minor improvement, rather than forcing many non-zero blocks along the diagonal, all the weights on the diagonal of the matrices are kept.
  • Input embedding: Instead of feeding the inputs directly to the network, the developers have used an embedding matrix. Embedding is generally used in natural language processing, but using it for μ-law values makes it possible to learn non-linear functions of the input.
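
The first and last items in the list above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not LPCNet's implementation: the function names, the first-order filter form, the coefficient value of 0.85, the 8-bit μ-law with μ = 255, and the 440 Hz test tone are all choices made for this example. The sketch pre-emphasizes a signal, quantizes it to μ-law codes, and then decodes and de-emphasizes it.

```python
import math

MU = 255  # assumed 8-bit mu-law, giving integer codes 0..255


def pre_emphasis(signal, alpha=0.85):
    """First-order high-pass: y[n] = x[n] - alpha * x[n - 1]."""
    out, prev = [], 0.0
    for sample in signal:
        out.append(sample - alpha * prev)
        prev = sample
    return out


def de_emphasis(signal, alpha=0.85):
    """Inverse filter: x[n] = y[n] + alpha * x[n - 1]."""
    out, prev = [], 0.0
    for sample in signal:
        prev = sample + alpha * prev
        out.append(prev)
    return out


def mulaw_encode(x):
    """Map a sample in [-1, 1] to an integer code in 0..MU (values are clipped)."""
    x = max(-1.0, min(1.0, x))
    compressed = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((compressed + 1.0) / 2.0 * MU))


def mulaw_decode(code):
    """Map an integer code in 0..MU back to a sample in [-1, 1]."""
    compressed = 2.0 * code / MU - 1.0
    return math.copysign(
        math.expm1(abs(compressed) * math.log1p(MU)) / MU, compressed
    )


# Hypothetical test tone: 10 ms of a 440 Hz sine sampled at 16 kHz.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]

emphasized = pre_emphasis(tone)
codes = [mulaw_encode(s) for s in emphasized]  # quantization happens here
decoded = de_emphasis([mulaw_decode(c) for c in codes])
```

Because quantization happens between the two filters, the quantization noise passes through the de-emphasis filter and is spectrally shaped by it, while the signal itself is restored. The integer `codes` are also the kind of discrete values that an embedding matrix would be indexed with, one learned vector per μ-law level.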


You can read more in detail about LPCNet on Mozilla’s official website.

Mozilla v. FCC: Mozilla challenges FCC’s elimination of net neutrality protection rules

Mozilla shares why Firefox 63 supports Web Components

Mozilla shares how AV1, the new open source royalty-free video codec, works