Optimization and Deployment of Deep Neural Networks for Heterogeneous Systems
The increasing popularity of neural networks means they must increasingly run on embedded systems. These are typically heterogeneous hardware platforms, comprising e.g. multi-core CPUs with GPUs as well as dedicated AI accelerators, and they are constrained in resources such as memory and power consumption. As a result, common network architectures such as ResNet, VGG, or U-Net either run very slowly or fail to exploit the available hardware.
To nevertheless run pre-trained neural networks efficiently at inference time, they must be optimized for the specific use case and adapted to the target hardware while preserving the model's accuracy. For application-specific optimization, compression methods such as quantization or pruning are used, which can significantly reduce the number of required compute operations. The simplified network can then be adapted to the conditions of the target platform with dedicated tools, so-called Deep Learning (DL) compilers such as TVM or TensorRT, so that the hardware is used optimally. Substantial performance improvements can be achieved this way; in practice, however, high complexity and a lack of interoperability between the tools make this difficult.
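To make the compression step concrete, here is a minimal sketch (our illustration, not part of the seminar materials) using PyTorch's built-in post-training dynamic quantization; the choice of ResNet-18, the weights identifier, and the input shape are assumptions for demonstration purposes.

```python
import torch
import torchvision

# Load a pre-trained network (ResNet-18 as a stand-in for the
# architectures mentioned above; weights identifier assumed).
model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()

# Post-training dynamic quantization: replace the linear layers with
# int8-weight equivalents. Note that the convolutions would require
# static quantization with calibration data instead.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# The compressed model is a drop-in replacement at inference time.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    y = quantized(x)
print(y.shape)  # torch.Size([1, 1000])
```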
Our seminar therefore provides basic know-how about the respective tools as well as their practical usage, so that neural networks can be adapted quickly to requirements and constraints.
We will address the following topics:
- Basics of the DL compilers TensorRT and TVM
- Basics of exchange formats, e.g. ONNX
- Application of TensorRT and TVM in detail (see the sketch after this list)
- Practical examples
  - Throughput maximization on GPUs
  - Inference with reduced computational accuracy
- Best practices
  - Performance profiling to identify bottlenecks
- Basics of compression tools, e.g. compression in PyTorch / TFLite, Intel Labs Distiller, Microsoft Neural Network Intelligence (NNI)
- Interaction between compression tools and DL compilers
- Integration and deployment of neural networks in C/C++
- Target platforms, including CPUs (x86/ARM), Nvidia desktop GPUs, Nvidia Jetson boards, the Google Coral Edge TPU, and NPUs
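As a taste of the compiler side, the following minimal sketch imports an ONNX model into TVM, compiles it for a CPU target, and benchmarks the result; the file name, input tensor name, and shape are assumptions and will differ per model.

```python
import numpy as np
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Import the exchange-format model (file name and input layout assumed).
onnx_model = onnx.load("model.onnx")
shape_dict = {"input": (1, 3, 224, 224)}
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Compile for a concrete target; "llvm" means a generic CPU here, while
# "cuda" or an ARM target string would address the other platforms above.
target = "llvm"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Run the compiled module and time it to spot bottlenecks.
dev = tvm.device(target, 0)
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("input", np.random.rand(1, 3, 224, 224).astype("float32"))
module.run()
print(module.benchmark(dev, number=10))
```

TensorRT follows an analogous build-then-run pattern; both workflows are covered in detail in the seminar.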
Target group:
Our seminar is aimed at software developers who already have hands-on experience with neural networks, as well as at companies that are active in the field of AI and embedded systems or would like to become so.
The seminar materials as well as practical examples can be viewed and downloaded from our GitLab portal.