Deep learning networks are known for their deep and complex structure: a model can have millions of parameters, and the model size grows accordingly. In this mobile era, however, everything needs to be light and instant, even with rapid improvements in hardware and CPU/GPU capability. Customers expect advanced applications to run on the go and on the device, without any personal information being uploaded to a server or the cloud.
The most important characteristic of deep learning, its depth and complexity, has unfortunately become a hurdle for fast, online, mobile applications. Many real-time, mobile, and wearable applications urgently need progress in portable deep learning, i.e. an advanced system that runs with limited resources such as memory, CPU, energy, and bandwidth.
Deep compression significantly reduces the computation and storage required by neural networks....