PyTorch is a Python deep learning framework that has been gaining a lot of traction. It is the Python counterpart of Torch, which is used from Lua. It is backed by Facebook and is fast thanks to GPU-accelerated tensor computations. A major benefit of PyTorch over frameworks that rely on static graphs is that the computation graph is built on the fly as the code runs. This means networks are dynamic: you can adjust your network without having to start over, and the graph can be different for each example. PyTorch supports multiple GPUs, and you can manually set which computation is performed on which device (CPU or GPU).
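As a rough illustration of what "created on the fly" means, the following sketch runs an ordinary Python loop whose length differs per example, and backpropagates through whatever was executed. The function name repeated_linear and the tensor sizes are made up for this example, and it assumes PyTorch 0.4 or later (where tensors accept requires_grad directly) running on the CPU.

```python
import torch

def repeated_linear(x, weight, n_steps):
    """Apply the same weight n_steps times; graph depth follows n_steps."""
    h = x
    for _ in range(n_steps):  # plain Python loop, recorded as it runs
        h = torch.tanh(h @ weight)
    return h.sum()

torch.manual_seed(0)
weight = torch.randn(4, 4, requires_grad=True)

# Two "examples" with a different number of steps give graphs of different depth.
for n_steps in (1, 3):
    x = torch.randn(2, 4)
    out = repeated_linear(x, weight, n_steps)
    weight.grad = None   # clear the gradient left by the previous example
    out.backward()       # backpropagates through however many steps ran
    print(n_steps, weight.grad.shape)
```

No graph is declared up front here; each call to backward() follows the operations that were actually executed for that example.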
Using PyTorch’s dynamic computation graphs for RNNs
How to do it...
- First, we install PyTorch in our Anaconda environment, as follows:
conda install pytorch torchvision cuda80 -c soumith
If you want to install PyTorch on another platform, you can have a look at the PyTorch website for clear guidance: http://pytorch.org/.
- Let's import PyTorch into our Python environment:
import torch
- Whereas Keras adds a higher-level abstraction for building neural networks on top of another framework, PyTorch has this feature built in. This means you can build a network from higher-level building blocks or write the forward and backward pass manually. In this introduction, we will use the higher-level abstraction. First, we need to set the size of our random training data:
batch_size = 32
input_shape = 5
output_shape = 10
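- To make use of GPUs, we set the default tensor type to a CUDA tensor, as follows:
torch.set_default_tensor_type('torch.cuda.FloatTensor')
With this setting, newly created tensors are allocated on the GPU by default, so the computations on them run on the attached GPU. It requires a CUDA-capable GPU and a CUDA build of PyTorch.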
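If no GPU is attached, a CUDA default tensor type cannot be used. A minimal sketch of an alternative, assuming PyTorch 0.4 or later where torch.device is available, is to pick the device at runtime and create tensors on it explicitly. The variable names device and sample are illustrative only.

```python
import torch

# Pick the GPU when one is visible, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device.
sample = torch.randn(32, 5, device=device)
print(sample.device.type)
```

This keeps the same script usable on machines with and without a GPU, at the cost of passing the device around explicitly.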
- We can use this to generate random training data:
from torch.autograd import Variable
X = Variable(torch.randn(batch_size, input_shape))
y = Variable(torch.randn(batch_size, output_shape), requires_grad=False)
- We will use a simple neural network having one hidden layer with 32 units and an output layer:
model = torch.nn.Sequential(
    torch.nn.Linear(input_shape, 32),
    torch.nn.Linear(32, output_shape),
).cuda()
We call the .cuda() method on the model to move its parameters to the GPU, so that the model runs there.
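One property of this network is worth noting: two Linear layers stacked without an activation in between compute the same function as a single linear map. The sketch below checks this numerically on the CPU and then shows a hypothetical variant with a ReLU between the layers. The ReLU variant is an assumption for illustration, not part of the recipe, and the names linear_stack and relu_stack are made up.

```python
import torch

torch.manual_seed(0)
input_shape, output_shape = 5, 10

linear_stack = torch.nn.Sequential(
    torch.nn.Linear(input_shape, 32),
    torch.nn.Linear(32, output_shape),
)

# Fold the two layers into a single weight matrix and bias.
first, second = linear_stack[0], linear_stack[1]
with torch.no_grad():
    folded_weight = second.weight @ first.weight            # shape (10, 5)
    folded_bias = second.weight @ first.bias + second.bias  # shape (10,)
    x = torch.randn(32, input_shape)
    folded_out = x @ folded_weight.t() + folded_bias
    same = torch.allclose(linear_stack(x), folded_out, atol=1e-4)

# Hypothetical variant: a ReLU makes the hidden layer nonlinear.
relu_stack = torch.nn.Sequential(
    torch.nn.Linear(input_shape, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, output_shape),
)
print(same, relu_stack(x).shape)
```

For fitting random data as in this introduction the difference hardly matters, but for real problems a nonlinearity is usually what makes the hidden layer useful.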
- Next, we define the MSE loss function:
loss_function = torch.nn.MSELoss()
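To see what this loss computes, the following sketch compares torch.nn.MSELoss with its default settings against the mean of the squared differences written out by hand. The random tensors stand in for predictions and targets and are created on the CPU for this example only.

```python
import torch

torch.manual_seed(0)
y_pred = torch.randn(32, 10)  # stand-in for model output
y = torch.randn(32, 10)       # stand-in for targets

loss_function = torch.nn.MSELoss()
builtin = loss_function(y_pred, y)

# With default settings the loss averages over every element.
by_hand = ((y_pred - y) ** 2).mean()
print(torch.allclose(builtin, by_hand))
```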
- We are now ready to start training our model for 10 epochs with the following code:
learning_rate = 0.001
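for i in range(10):
    # Forward pass (the training data was stored in X above)
    y_pred = model(X)
    loss = loss_function(y_pred, y)
    # loss.item() needs PyTorch 0.4+; older releases used loss.data[0]
    print(loss.item())
    # Zero gradients
    model.zero_grad()
    # Backward pass
    loss.backward()
    # Update weights
    for param in model.parameters():
        param.data -= learning_rate * param.grad.data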
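The manual weight update above is plain gradient descent. As a sketch of the same loop using PyTorch's built-in optimizer, the following version uses torch.optim.SGD with the same learning rate. It is self-contained and runs on the CPU so it can be tried without a GPU; the losses list is added here only to inspect the result.

```python
import torch

torch.manual_seed(0)
batch_size, input_shape, output_shape = 32, 5, 10
X = torch.randn(batch_size, input_shape)
y = torch.randn(batch_size, output_shape)

model = torch.nn.Sequential(
    torch.nn.Linear(input_shape, 32),
    torch.nn.Linear(32, output_shape),
)
loss_function = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

losses = []
for epoch in range(10):
    optimizer.zero_grad()              # clear gradients from the last step
    loss = loss_function(model(X), y)  # forward pass
    loss.backward()                    # backward pass
    optimizer.step()                   # param -= lr * grad for each parameter
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Switching to a different update rule then only means swapping the optimizer class, while the rest of the loop stays the same.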
The PyTorch framework gives a lot of freedom to implement both simple neural networks and more complex deep learning models. What we didn't demonstrate in this introduction is the use of dynamic graphs in PyTorch. This is a powerful feature that we will demonstrate in other chapters of this book.