
Fabrice Bellard, the creator of FFmpeg and QEMU, introduces a lossless data compressor which uses neural networks

  • 3 min read
  • 08 Apr 2019

Last month, Fabrice Bellard and his team published a paper titled Lossless Data Compression with Neural Networks. The paper describes lossless data compressors built from pure neural network models based on Long Short-Term Memory (LSTM) and Transformer architectures.

How does this model work?


This lossless data compressor uses the traditional predictive approach:

  • At each time step, the encoder computes the probability vector of the next symbol value using the neural network model, which has seen all the preceding symbols.
  • The actual symbol value is then encoded using an arithmetic encoder, and the model is updated with the knowledge of that symbol value.
  • The decoder works symmetrically: both the encoder and decoder update their models identically, so there is no need to transmit the model parameters.
  • To improve the compression ratio and speed, a preprocessing stage was added. For the small LSTM models, the team reused the text preprocessor of CMIX and lstm-compress to allow a meaningful comparison. The larger models used a subword-based preprocessor in which each symbol represents a sequence of bytes.
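The predictive loop above can be sketched in a few lines. In this sketch a simple adaptive frequency count stands in for the LSTM/Transformer predictor, and the arithmetic coder is reduced to an ideal code-length tally; the names (`AdaptiveModel`, `compressed_size_bits`) are illustrative and not from the paper or LibNC:

```python
import math

class AdaptiveModel:
    """Stands in for the neural network predictor: returns a probability
    for each symbol and is updated after every symbol is coded."""
    def __init__(self, num_symbols):
        self.counts = [1] * num_symbols  # Laplace-smoothed counts

    def predict(self, symbol):
        return self.counts[symbol] / sum(self.counts)

    def update(self, symbol):
        self.counts[symbol] += 1

def compressed_size_bits(data, num_symbols=256):
    """Total code length if each symbol is arithmetic-coded with the
    model's current probability estimate (ideal coder, no overhead)."""
    model = AdaptiveModel(num_symbols)
    bits = 0.0
    for symbol in data:
        p = model.predict(symbol)
        bits += -math.log2(p)  # arithmetic coding approaches this bound
        model.update(symbol)   # the decoder performs the identical update
    return bits

print(compressed_size_bits(b"abababababab"))
```

Because the decoder runs the same `update` calls in the same order, it reconstructs the same probabilities from the symbols it has already decoded, which is why no model parameters need to be transmitted.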


The model relies on arithmetic coding, a standard compression technique, and makes it adaptive: the symbol probabilities fed to the coder come from the neural network rather than from fixed frequencies. Here's an example by a Reddit user that explains how this works:

“The rough frequency of 'e' in English is about 50%. But if you just saw this partial sentence "I am going to th", the probability/frequency of 'e' skyrockets to, say, 98%. In a standard arithmetic coding scheme, you would still parametrize your encoder with 50% to encode the next 'e' despite it being very likely (~98%) that 'e' is the next character (you are using more bits than you need in this case), while with the help of a neural network, the frequency becomes adaptive.”
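The gain in that example can be quantified: an ideal arithmetic coder spends about -log2(p) bits on a symbol it assigns probability p, so the context-aware estimate is far cheaper (a quick check; the 50% and 98% figures are the Reddit user's, not the paper's):

```python
import math

# Ideal code length in bits for one symbol coded with probability p
static_cost = -math.log2(0.50)    # static 50% estimate: 1.00 bit
adaptive_cost = -math.log2(0.98)  # context-aware 98% estimate: ~0.03 bits

print(f"{static_cost:.2f} bits vs {adaptive_cost:.2f} bits")
```

The better the model's prediction, the closer the coded symbol gets to costing nothing, which is why a stronger predictor directly translates into a better compression ratio.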

To ensure that both the encoder and decoder use exactly the same model, the authors developed a custom C library called LibNC, which implements the various operations needed by the models. It has no external dependencies and exposes a C API.

Results of the experiment


The model's performance was evaluated on the enwik8 Hutter Prize benchmark. The models show slower decompression speeds: 1.5x slower for the LSTM model and 3x slower for the Transformer model. However, their description is simple and their memory consumption is reasonable compared to other compressors achieving a similar compression ratio.

As for the compression ratio, the models have yet to reach the performance of CMIX, a lossless data compressor that achieves an optimized compression ratio at the cost of high CPU and memory usage. In all the experiments, the Transformer model performs worse than the LSTM model, even though Transformers give the best results in language modeling benchmarks.

To know more in detail, check out the paper, Lossless Data Compression with Neural Networks.
