Installing PyTorch

PyTorch will run on macOS, 64-bit Linux, and 64-bit Windows. Be aware that Windows does not currently offer (easy) support for the use of GPUs in PyTorch. You will need to have either Python 2.7 or Python 3.5 / 3.6 installed on your computer before you install PyTorch, and you must install the PyTorch build that matches your Python version. Unless you have a reason not to, it is recommended that you install the Anaconda distribution of Python. This is available from: https://anaconda.org/anaconda/python.

Anaconda includes all the dependencies of PyTorch, as well as technical, math, and scientific libraries essential to your work in deep learning. These will be used throughout the book, so unless you want to install them all separately, install Anaconda.

The following is a list of the packages and tools that we will be using in this book. They are all installed with Anaconda:

NumPy: A math library primarily used for working with multidimensional arrays
Matplotlib: A plotting and visualization library
SciPy: A package for scientific and technical computing
scikit-learn: A library for machine learning
Pandas: A library for working with data
IPython: A notebook-style code editor used for writing and running code in a browser
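
As a quick sanity check after installing Anaconda, the following minimal sketch reports which of the packages listed above can be imported in the current environment. The function name check_packages and the REQUIRED table are illustrative names chosen here, not part of any library; the import names (for example, sklearn for scikit-learn) are the standard ones.

```python
import importlib.util

# Import names for the packages listed above. Note that scikit-learn
# is imported under the name "sklearn".
REQUIRED = {
    "NumPy": "numpy",
    "Matplotlib": "matplotlib",
    "SciPy": "scipy",
    "scikit-learn": "sklearn",
    "Pandas": "pandas",
    "IPython": "IPython",
}


def check_packages(packages=REQUIRED):
    """Return a dict mapping each package name to True if it is importable."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in packages.items()
    }


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Any package reported as MISSING can usually be added with conda install followed by the package name.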

Once you have Anaconda installed, you can install PyTorch. Go to the PyTorch website at https://pytorch.org/.

The installation matrix on this website is pretty self-explanatory. Simply select your operating system, Python version, and, if you have GPUs, your CUDA version, and then run the appropriate command.
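
Once the install command has finished, it is worth confirming that PyTorch imports correctly and whether it can see a GPU. The following is a minimal sketch, assuming nothing beyond a standard PyTorch install; describe_torch is a hypothetical helper name used here for illustration.

```python
def describe_torch():
    """Report the PyTorch version and CUDA availability, if PyTorch is installed."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"installed": False, "version": None, "cuda": False}
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    info = describe_torch()
    print(info)
    if info["installed"]:
        import torch

        # Use the GPU if one is available, otherwise fall back to the CPU.
        device = torch.device("cuda" if info["cuda"] else "cpu")
        x = torch.ones(2, 3, device=device)
        print(x.sum().item())
```

If the cuda entry is False on a machine with an NVIDIA GPU, the most likely cause is that the selected CUDA version in the installation matrix does not match the installed driver.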

As always, it is good practice to ensure your operating system and dependent packages are up to date before installing PyTorch. Anaconda and PyTorch run on Windows, Linux, and macOS, although Linux is probably the most used and consistent operating system. Throughout this book, I will be using Python 3.7 and Anaconda 3.6.5 running on Linux.

Code in this book was written in Jupyter Notebook, and these notebooks are available from the book's website.

You can choose to set up your PyTorch environment either locally on your own machine or remotely on a cloud server. Each has its pros and cons. Working locally has the advantage that it is generally easier and quicker to get started. This is especially true if you are not familiar with SSH and the Linux terminal. It is simply a matter of installing Anaconda and PyTorch, and you are on your way. Also, you get to choose and control your own hardware, and while this is an upfront cost, it is often cheaper in the long run. Once you start expanding hardware requirements, cloud solutions can become expensive. Another advantage of working locally is that you can choose and customize your integrated development environment (IDE). In fact, Anaconda has its own excellent desktop IDE called Spyder.

If you are building your own deep learning machine and require GPU acceleration, there are a few things you need to keep in mind:

Use NVIDIA CUDA-compliant GPUs (for example, GTX 1060 or GTX 1080)
Choose a chipset that has at least 16 PCIe lanes
Install at least 16 GB of RAM
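
To see which GPUs PyTorch can actually use on a given machine, something like the following sketch may help. It assumes a standard PyTorch install and returns an empty list when PyTorch or CUDA is unavailable; list_cuda_gpus is a hypothetical helper name.

```python
def list_cuda_gpus():
    """Return a (name, memory in GB) tuple for each CUDA GPU PyTorch can see."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    gpus = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        # total_memory is reported in bytes; convert it to gigabytes.
        gpus.append((props.name, round(props.total_memory / 1024**3, 1)))
    return gpus


if __name__ == "__main__":
    gpus = list_cuda_gpus()
    if not gpus:
        print("No CUDA-capable GPU visible to PyTorch")
    for name, memory_gb in gpus:
        print(f"{name}: {memory_gb} GB")
```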

Working on the cloud does offer the flexibility to work from any machine as well as more easily experiment with different operating systems, platforms, and hardware. You also have the benefit of being able to share and collaborate more easily. It is generally cheap to get started, costing a few dollars a month, or even free, but as your projects become more complex and data intensive, you will need to pay for more capacity.

Let's look briefly at the installation procedures for two cloud server hosts: Digital Ocean and Amazon Web Services.

Digital Ocean

Digital Ocean offers one of the simplest entry points into cloud computing. It offers predictable, simple payment structures and straightforward server administration. Unfortunately, Digital Ocean does not currently support GPUs. Its functionality revolves around droplets, which are pre-built instances of virtual private servers. The following are the steps required to set up a droplet:

Sign up for an account with Digital Ocean. Go to https://www.digitalocean.com/.
Click on the Create button and choose New Droplet.
Select the Ubuntu distribution of Linux and choose the two gigabyte plan or above.
Select the CPU optimization if required. The default values should be fine to get started.
Optionally, set up public/private key encryption.
Set up an SSH client (for example, PuTTY) using the information contained in the email sent to you.
Connect to your droplet via your SSH client and use curl to download the latest Anaconda installer. You can find the address of the installer for your particular environment at https://repo.continuum.io/.
Install PyTorch using this command:

conda install pytorch torchvision -c pytorch

Once you have spun up your droplet, you can access the Linux command line through an SSH client. From the command prompt, you can use curl to download the latest Anaconda installer, available from: https://www.anaconda.com/download/#linux.

An installation script is also available from the continuum archive at https://repo.continuum.io/archive/. Full step-by-step instructions are available from the Digital Ocean tutorials section.
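
The command-line part of the droplet setup can be summarized as three commands: download the installer, run it, and install PyTorch. The sketch below only builds and prints those commands; it does not run them. The function name build_setup_commands is hypothetical, and INSTALLER_NAME.sh is a placeholder, so look up the real installer file name in the archive listing.

```python
import posixpath
from urllib.parse import urlsplit


def build_setup_commands(installer_url):
    """Return the droplet setup commands as lists of arguments."""
    # The installer is saved under the last component of the URL path.
    installer_name = posixpath.basename(urlsplit(installer_url).path)
    if not installer_name:
        raise ValueError("installer_url must point at an installer file")
    return [
        ["curl", "-O", installer_url],  # download the installer
        ["bash", installer_name],       # run the Anaconda installer
        ["conda", "install", "pytorch", "torchvision", "-c", "pytorch"],
    ]


if __name__ == "__main__":
    # Placeholder URL: replace INSTALLER_NAME.sh with a real file name.
    url = "https://repo.continuum.io/archive/INSTALLER_NAME.sh"
    for command in build_setup_commands(url):
        print(" ".join(command))
```

Note that the conda command is typically only available after the installer has finished and you have opened a new shell session.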

Tunneling in to IPython

IPython is an easy and convenient way to edit code through a web browser. If you are working on a desktop computer, you can just launch IPython and point your browser to localhost:8888. Port 8888 is the one that the IPython server, Jupyter, runs on by default. However, if you are working on a cloud server, then a common way to work with code is to tunnel in to IPython using SSH. Tunneling in to IPython involves the following steps:

In your SSH client, set the destination to localhost:8888. In PuTTY, go to Connection | SSH | Tunnels.
Set the source port to anything above 8000 to avoid conflicting with other services. Click Add. Save these settings and open the connection. Log in to your droplet as usual.
Start the IPython server by typing jupyter notebook at the command prompt of your server instance.
Access IPython by pointing your browser to localhost followed by your source port; for example, localhost:8001.
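
If you use a command-line OpenSSH client rather than PuTTY, the same tunnel can be expressed as a single ssh command with the -L (local port forward) option. The following sketch builds that command as an argument list. The function name build_tunnel_command is hypothetical, and the user name and IP address are placeholders for your own droplet's details.

```python
def build_tunnel_command(user, host, local_port=8001, remote_port=8888):
    """Build an ssh command that forwards local_port to the remote Jupyter port."""
    for port in (local_port, remote_port):
        if not 0 < port < 65536:
            raise ValueError(f"invalid port: {port}")
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Placeholder login details; substitute those of your own droplet.
    print(" ".join(build_tunnel_command("root", "203.0.113.10")))
```

Because of the -N option, this connection only carries the tunnel, so jupyter notebook needs to be started on the server from a separate, ordinary SSH session.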

Note that you may need a token to access the server for the first time. This is available from the command output once you start Jupyter. You can either copy the URL given in this output directly into your browser's address bar, changing the port to your local source port (for example, 8001), or paste the token, the part after token=, into the Jupyter start-up page and replace it with a password for future convenience. You should now be able to open, run, and save IPython notebooks.
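
The URL adjustment described above is easy to get wrong by hand, so here is a minimal sketch of it using only the standard library. The helper names to_local_url and extract_token are hypothetical, and the token abc123 is a made-up example value.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def to_local_url(jupyter_url, local_port):
    """Point the URL printed by Jupyter at the local end of the SSH tunnel."""
    parts = urlsplit(jupyter_url)
    # Only the host and port change; the path and token are kept as they are.
    return urlunsplit(parts._replace(netloc=f"localhost:{local_port}"))


def extract_token(jupyter_url):
    """Return the part after token= in the URL, or None if there is no token."""
    values = parse_qs(urlsplit(jupyter_url).query).get("token")
    return values[0] if values else None


if __name__ == "__main__":
    printed = "http://localhost:8888/?token=abc123"  # made-up example token
    print(to_local_url(printed, 8001))
    print(extract_token(printed))
```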

Amazon Web Services (AWS)

AWS is the original cloud computing platform, most noted for its highly scalable architecture. It offers a vast array of products. What we need to begin is an EC2 instance. This can be accessed from the Services tab of the AWS control panel. From there, select EC2 and then Launch Instance. From here, you can choose the machine image you require. AWS provides several types of machine images specifically for deep learning. Feel free to experiment with any of these, but the one we are going to use here is the deep learning AMI for Ubuntu version 10. It comes with pre-installed environments for PyTorch and TensorFlow. After selecting this, you get to choose other options. The default T2 micro, with 1 GB of memory, should be fine to experiment with; however, T2 instances have no GPU, so if you want GPU acceleration, you will need a GPU-backed instance type, such as one from the P2 or P3 family. Finally, when you launch your instance, you will be prompted to create and download your public-private key pair. You can then use your SSH client to connect to the server instance and tunnel in to the Jupyter Notebook as per the previous instructions. Once again, check the documentation for the finer details. Amazon has a pay-per-resource model, so it is important that you monitor what resources you are using to ensure you do not receive any unnecessary or unexpected charges.