
NVIDIA brings new deep learning updates at CVPR conference

  • 4 min read
  • 20 Jun 2018


The NVIDIA team announced a new set of deep learning updates on the cloud computing software and hardware front during the Computer Vision and Pattern Recognition conference (CVPR 2018), held in Salt Lake City.

Some of the key announcements made during the CVPR conference include Apex, an early release of a new open-source PyTorch extension; NVIDIA DALI and NVIDIA nvJPEG for efficient data optimization and image decoding; a release candidate of Kubernetes on NVIDIA GPUs; and version 4 of the TensorRT inference optimizer and runtime engine.

Let’s look at some of the noteworthy updates made during the CVPR conference:

Apex


Apex is an open-source PyTorch extension that bundles the NVIDIA-maintained utilities needed for optimized, efficient mixed precision and distributed training in PyTorch. The new extension helps machine learning engineers and data scientists maximize deep learning training performance on NVIDIA Volta GPUs. The core promise of Apex is to deliver up-to-date utilities to users as quickly as possible.

Some of the notable features included are:

  • The NVIDIA PyTorch team drew on state-of-the-art mixed precision training results in tasks such as sentiment analysis, translation networks, and image classification to create a set of tools that bring these methods to all levels of PyTorch users.
  • Apex provides mixed precision utilities designed to improve training speed while maintaining the accuracy and stability of single precision training.
  • With Apex, four or fewer line changes to existing code are enough to get automatic loss scaling, automatic execution of operations in FP16 or FP32, and automatic handling of master parameter conversion (a minimal sketch follows this list).
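
Below is a minimal sketch of what such a mixed precision training loop looks like with Apex. The amp.initialize and amp.scale_loss calls reflect the API Apex converged on after this early release, so exact names may differ in the version announced at CVPR; the toy model, data, and "O1" optimization level are illustrative placeholders.

import torch
from apex import amp

model = torch.nn.Linear(1024, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Wrap the model and optimizer; Apex runs eligible ops in FP16, keeps FP32
# master weights, and applies automatic loss scaling behind the scenes.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

criterion = torch.nn.CrossEntropyLoss()
inputs = torch.randn(32, 1024).cuda()
targets = torch.randint(0, 10, (32,)).cuda()

optimizer.zero_grad()
loss = criterion(model(inputs), targets)

# Scale the loss before backward so small FP16 gradients do not underflow.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()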


To install and use Apex in your own development environment, you will need CUDA 9, PyTorch 0.4 or later, and Python 3. The extension is still an early release, so the modules and utilities can be expected to change. If you want to download the code and get started with the tutorials and examples, you can visit the GitHub page. You can visit the official announcement page for more details.

NVIDIA DALI and NVIDIA nvJPEG


NVIDIA DALI harnesses the power of GPUs, using the NVIDIA nvJPEG library to process images at greater speed. This helps address the performance bottlenecks around image decoding and data loading in deep learning-powered computer vision applications such as image recognition.

NVIDIA DALI is an open-source, GPU-accelerated data augmentation and image loading library that can be used to optimize the data pipelines of deep learning frameworks. You can refer to the GitHub page to learn more. NVIDIA nvJPEG is a GPU-accelerated library for JPEG decoding; you can download the release candidate for feedback and testing.
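
As a rough illustration, the sketch below builds a small DALI pipeline that reads JPEG files, decodes them with GPU assistance via nvJPEG, and resizes them. It uses DALI's class-based Pipeline API; operator names such as ops.ImageDecoder have been renamed across DALI releases, and "/data/images" is a placeholder directory of class-labeled JPEGs.

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

class SimplePipeline(Pipeline):
    def __init__(self, batch_size=32, num_threads=4, device_id=0):
        super().__init__(batch_size, num_threads, device_id)
        self.reader = ops.FileReader(file_root="/data/images", random_shuffle=True)
        # "mixed" decodes JPEGs using the CPU for the initial stage and the GPU (nvJPEG) for the rest.
        self.decode = ops.ImageDecoder(device="mixed", output_type=types.RGB)
        self.resize = ops.Resize(device="gpu", resize_x=224, resize_y=224)

    def define_graph(self):
        jpegs, labels = self.reader()
        images = self.decode(jpegs)
        images = self.resize(images)
        return images, labels

pipe = SimplePipeline()
pipe.build()
images, labels = pipe.run()  # one batch of decoded, resized images resident on the GPU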

This new update gives deep learning practitioners and researchers optimized training performance on image classification models such as ResNet-50 with MXNet, TensorFlow, and PyTorch across Amazon Web Services P3 8-GPU instances or DGX-1 systems with Volta GPUs. You can refer to the official announcement page for more details.

Kubernetes on NVIDIA GPUs


The NVIDIA team has announced a release candidate of Kubernetes on NVIDIA GPUs, which is freely available to developers for testing. It lets enterprises scale up training and ease deployment to multi-cloud GPU clusters, providing automated deployment, maintenance, scheduling, and operation of multiple GPU-accelerated containers across clusters of nodes. It can also orchestrate growing workloads on heterogeneous GPU clusters.
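
To give a flavour of how GPU scheduling works in practice, the sketch below requests a single GPU for a training container through the official Kubernetes Python client. It assumes the NVIDIA device plugin is installed so that GPUs appear as the nvidia.com/gpu resource; the container image, pod name, and namespace are placeholders.

from kubernetes import client, config

config.load_kube_config()  # use the local kubeconfig for cluster access

container = client.V1Container(
    name="trainer",
    image="nvcr.io/nvidia/pytorch:18.06-py3",   # placeholder GPU training image
    command=["python", "train.py"],
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"}          # ask the scheduler for a node with a free GPU
    ),
)

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-training-job"),
    spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)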

To learn more about this update, you can refer to the official announcement page.


TensorRT 4


This new release of the inference optimizer and runtime engine adds support for new layer types such as recurrent neural networks and multilayer perceptrons, along with an ONNX parser and integration with TensorFlow to ease deep learning deployment.

Moreover, it adds the ability to execute custom neural network layers in FP16 precision, as well as support for the Xavier SoC through the NVIDIA DRIVE AI platforms. TensorRT speeds up deep learning tasks such as machine translation, speech and image processing, and recommender systems on GPUs; NVIDIA reports speedups of 45x to 190x across these application areas.
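
The sketch below shows roughly how an ONNX model is imported into TensorRT and built into an FP16-capable engine. The calls follow the TensorRT Python bindings as documented in later releases, so TensorRT 4's own interface may differ in places; "model.onnx" is a placeholder path.

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network()
parser = trt.OnnxParser(network, TRT_LOGGER)

# Parse the trained model exported to ONNX into a TensorRT network definition.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("Failed to parse the ONNX model")

builder.max_workspace_size = 1 << 30   # scratch memory available to optimization tactics
builder.fp16_mode = True               # allow FP16 kernels where they help

engine = builder.build_cuda_engine(network)  # optimized inference engine for the target GPU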

All members of the NVIDIA Registered Developer Program can use TensorRT 4 for free. For more detailed information about the new features and updates, you can visit the official developer page.
