PyTorch is a dynamic, tensor-based deep learning framework for experimentation, research, and production. It can be used as a GPU-enabled replacement for NumPy or as a flexible, efficient platform for building neural networks. Dynamic graph creation and tight Python integration make PyTorch stand out among deep learning frameworks.
If you are at all familiar with the deep learning ecosystem, you will know that frameworks such as Theano and TensorFlow, and higher-level derivatives such as Keras, are among the most popular. PyTorch is a relative newcomer to this set of frameworks. Despite this, it is now used extensively by Google, Twitter, and Facebook. It stands out from the other frameworks because both Theano and TensorFlow encode computational graphs in static structures that need to be run in self-contained sessions. In contrast, PyTorch builds its computational graphs dynamically. The consequence for a neural net is that the network can change behavior as it is being run, with little or no overhead. In TensorFlow and Theano, to change behavior, you effectively have to rebuild the network from scratch.
This dynamic implementation comes about through a process called tape-based auto-diff (automatic differentiation), which allows PyTorch expressions to be differentiated automatically. This has numerous advantages. Gradients can be calculated on the fly and, since the computational graph is dynamic, it can be changed at each function call. This allows it to be used in interesting ways in loops and under conditional calls that can respond, for example, to input parameters or intermediate results. This dynamic behavior and great flexibility have made PyTorch a favored experimental platform for deep learning.
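As a minimal sketch of this idea, the following example runs a loop and a conditional whose behavior depends on intermediate results, and then asks autograd for the gradient. The function name `dynamic_forward`, the threshold of 10, and the step limit of 5 are illustrative assumptions, not part of any PyTorch API.

```python
import torch


def dynamic_forward(x, w):
    """Scale x by w a data-dependent number of times (hypothetical example)."""
    h = x
    steps = 0
    # The loop length depends on an intermediate result, so the recorded
    # graph can differ from one call to the next.
    while h.norm() < 10 and steps < 5:
        h = h * w
        steps += 1
    # A conditional branch that also depends on the computed values.
    if h.sum() > 0:
        out = h.sum()
    else:
        out = -h.sum()
    return out, steps


w = torch.tensor(2.0, requires_grad=True)

# First call: three multiplications are recorded before the norm exceeds 10.
out, steps = dynamic_forward(torch.tensor([1.0, 2.0]), w)
out.backward()          # gradients are computed from the graph just recorded
grad_first = w.grad.item()

# Second call with a different input: only one multiplication is recorded.
w.grad = None
out2, steps2 = dynamic_forward(torch.tensor([4.0, 8.0]), w)
out2.backward()
grad_second = w.grad.item()
```

The same Python function produces a different graph on each call, and `backward()` differentiates whichever graph was actually executed.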
Another advantage of PyTorch is that it is closely integrated with the Python language. For Python coders, it is very intuitive, and it interoperates seamlessly with other Python packages, such as NumPy and SciPy. PyTorch is very easy to experiment with. This makes it an ideal tool not only for building and running useful models, but also for understanding deep learning principles through direct experimentation.
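The interoperability with NumPy can be illustrated with a short sketch. It assumes NumPy is installed alongside PyTorch and uses `torch.from_numpy` and `Tensor.numpy`, which convert between the two types on the CPU without copying the underlying data.

```python
import numpy as np
import torch

a = np.array([1.0, 2.0, 3.0])

# from_numpy wraps the existing buffer, so the tensor and array share memory.
t = torch.from_numpy(a)

# An in-place tensor operation is therefore visible through the NumPy array.
t.mul_(2)

# Converting back to NumPy also avoids a copy for CPU tensors.
b = t.numpy()

print(a)  # the original array now holds the doubled values
```

Because the memory is shared, this kind of conversion is cheap, but it also means that in-place changes on one side affect the other.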
As you would expect, PyTorch can be run on multiple graphics processing units (GPUs). Deep learning algorithms can be computationally expensive, especially with big datasets. PyTorch has strong GPU support, with intelligent memory sharing of tensors between processes. In practice, this means there is an efficient and user-friendly way to distribute the processing load across the CPU and GPUs. This can make a big difference to the time it takes to test and run large, complex models.
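A minimal sketch of device placement is shown below. It covers only the single-device case and falls back to the CPU when no CUDA-capable GPU is present; the tensor shapes are arbitrary values chosen for illustration.

```python
import torch

# Use a GPU if one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors are created directly on the chosen device.
x = torch.randn(64, 128, device=device)
w = torch.randn(128, 10, device=device)

# The matrix product runs on whichever device holds the operands.
y = x @ w

# Move the result back to the CPU, for example to hand it to NumPy.
result = y.cpu()
```

The same code runs unchanged on a machine with or without a GPU, which is a large part of what makes moving work between the CPU and GPUs straightforward.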
Dynamic graph generation, tight Python language integration, and a relatively simple API make PyTorch an excellent platform for research and experimentation. However, versions prior to PyTorch 1 had deficits that prevented the framework from excelling in production environments. These deficits are being addressed in PyTorch 1.
Research is an important application for deep learning, but increasingly, deep learning is being embedded in applications that run live on the web, on a device, or in a robot. Such an application may serve thousands of simultaneous queries and interact with massive, dynamic data. Although Python is one of the best languages for humans to work with, specific efficiencies and optimizations are available in other languages, most commonly C++ and Java. Even though the best way to build a particular deep learning model may be with PyTorch, this may not be the best way to deploy it. This is no longer a problem because, with PyTorch 1, we can export Python-free representations of PyTorch models.
This has come about through a partnership between Facebook, the major stakeholder of PyTorch, and Microsoft to create the Open Neural Network Exchange (ONNX), which assists developers in converting neural net models between frameworks. This has led to the merging of PyTorch with the more production-ready framework, Caffe2. In Caffe2, models are represented by a plain text schema, making them language agnostic. This means they are more easily deployed to Android, iOS, or Raspberry Pi devices.
With this in mind, PyTorch version 1 has expanded its API to include production-ready capabilities, such as optimizing code for Android and iPhone, a just-in-time (JIT) C++ compiler, and several ways to make Python-free representations of your models.
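One way to produce such a Python-free representation is tracing with the JIT, sketched below under a few assumptions: `TinyNet` is a hypothetical model invented for illustration, and the model is saved to an in-memory buffer rather than a file so that the example is self-contained.

```python
import io

import torch


class TinyNet(torch.nn.Module):
    """A hypothetical two-layer-free toy model used only for illustration."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.linear(x))


model = TinyNet().eval()
example_input = torch.ones(1, 4)

# Tracing runs the model once and records the operations it performs.
traced = torch.jit.trace(model, example_input)

# Serialize the traced model; a file path could be used instead of a buffer.
buffer = io.BytesIO()
torch.jit.save(traced, buffer)

# Reload it. The archive carries its own code, so the TinyNet class
# definition is not needed to run the restored model.
buffer.seek(0)
restored = torch.jit.load(buffer)

same = torch.allclose(model(example_input), restored(example_input))
```

Note that tracing records only the operations executed for the example input, so data-dependent control flow is not captured this way. An archive saved like this can also be loaded from C++ through the `torch::jit::load` function.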
In summary, PyTorch has the following characteristics:
- Dynamic graph representation
- Tightly integrated with the Python programming language
- A mix of high- and low-level APIs
- Straightforward implementation on multiple GPUs
- Able to build Python-free model representation for export and production
- Scales to massive data using the Caffe2 framework