Recipes for Post-training Quantization of Deep Neural Networks
Ashutosh Mishra, Christoffer Löffler, Axel Plinge
In: Workshop on Energy Efficient Machine Learning and Cognitive Computing; Saturday, December 05, 2020 Virtual (from San Jose, California, USA)
Given the presence of deep neural networks (DNNs) in all kinds of applications, the question of optimized deployment is becoming increasingly important. One important step is the automated size reduction of the model footprint. Of all the methods emerging, post-training quantization is one of the simplest to apply. Without needing long processing or access to the training set, a straightforward reduction of the memory footprint by an order of magnitude can be achieved. A difficult question is which quantization methodology to use and how to optimize different parts of the model with respect to different bit width. We present an in-depth analysis on different types of networks for audio, computer vision, medical and hand-held manufacturing tools use cases; Each is compressed with fixed and adaptive quantization and fixed and variable bit width for the individual tensors.
Getting AI in your pocket with deep compression
Ashutosh Mishra, Axel Plinge
Deep neural networks (DNNs) have become state-of-the-art for a wide range of applications including computer vision, speech recognition, and robotics. The superior performance often comes at the cost of high computational complexity. The process of creating and training a DNN model is difficult and labor-intense, and the resulting models rarely optimized for running on embedded devices. Automated techniques to improve energy efficiency and speed without sacrificing application accuracy are vital. Big companies showed that compressing weights or squeezing the architecture can reduce the model complexity by a factor of 20-50 while maintaining almost identical performance. The field of 'deep compression' has become a dedicated branch of research. Technical support for such optimizations is starting to be available by a growing set of tools. This talk gives an overview of deep compression techniques and tools with example applications.