Installing and loading Theano
In this section, we'll install Theano, run it on the CPU and GPU devices, and save the configuration.
Conda package and environment manager
The easiest way to install Theano is to use conda, a cross-platform package and environment manager.
If conda is not already installed on your operating system, the fastest way to get it is to download the Miniconda installer from https://conda.io/miniconda.html. For example, for conda under 64-bit Linux with Python 2.7, use these commands:
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
chmod +x Miniconda2-latest-Linux-x86_64.sh
bash ./Miniconda2-latest-Linux-x86_64.sh
Conda enables us to create new environments in which the version of Python (2 or 3) and the installed packages may differ. The conda root environment uses the version of Python that ships with the installer you chose (Python 2.7 for Miniconda2).
Installing and running Theano on CPU
Let's install Theano:
conda install theano
Run a Python session and try the following commands to check your configuration:
>>> import theano
>>> theano.config.device
'cpu'
>>> theano.config.floatX
'float64'
>>> print(theano.config)
The last command prints Theano's entire configuration. The theano.config object contains keys to many configuration options.
To determine the configuration options, Theano first reads the ~/.theanorc file, then any available environment variables, which override the file, and lastly the variables set in the code, which take the highest precedence:
>>> theano.config.floatX='float32'
Some of the properties might be read-only and cannot be changed in the code, but floatX, which sets the default floating-point precision, is among the properties that can be changed directly in the code.
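The order of precedence described above can be sketched in plain Python. This is only an illustrative model that needs Python 3 and no Theano installation; the function name resolve_option and its arguments are hypothetical and are not part of Theano.

```python
import configparser


def resolve_option(name, rc_text="", env_flags="", code_overrides=None):
    """Return an option value using the precedence file < environment < code.

    Illustrative model only, not Theano's own implementation.
    """
    value = None

    # 1. Lowest precedence: the [global] section of a .theanorc-style file.
    parser = configparser.ConfigParser()
    parser.read_string(rc_text)
    if parser.has_option("global", name):
        value = parser.get("global", name)

    # 2. Environment flags ("key=value,key=value") override the file.
    for item in filter(None, env_flags.split(",")):
        key, _, val = item.partition("=")
        if key.strip() == name:
            value = val.strip()

    # 3. Highest precedence: values assigned directly in the code.
    if code_overrides and name in code_overrides:
        value = code_overrides[name]

    return value


rc = "[global]\nfloatX = float64\ndevice = cpu\n"
print(resolve_option("floatX", rc))                    # file only
print(resolve_option("floatX", rc, "floatX=float32"))  # environment overrides file
print(resolve_option("floatX", rc, "floatX=float32",
                     {"floatX": "float16"}))           # code overrides both
```

Each call adds one more source, so the printed value changes from the file's setting to the environment's and finally to the one set in code.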
Note
It is advised to use float32, since GPUs have a long history without float64 support. float64 execution on a GPU is slower, sometimes much slower (2x to 32x on the latest-generation Pascal hardware), and float32 precision is enough in practice.
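To get a feel for what float32 gives up, the following sketch uses only the Python standard library to compare the storage size of the two formats and to measure the rounding error introduced when a value is stored in 32 bits. The helper name to_float32 is hypothetical.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Bytes needed per value: float32 takes half the memory of float64.
print(struct.calcsize("<f"), struct.calcsize("<d"))

# Absolute error introduced by storing 0.1 in 32 bits instead of 64.
value = 0.1
error = abs(to_float32(value) - value)
print(error)
```

The error is small compared with the noise typically present in training data, which is one way to read the claim that float32 precision is sufficient in practice.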
GPU drivers and libraries
Theano can make use of GPUs, processing units originally designed to compute the graphics displayed on the computer screen.
To have Theano work on the GPU as well, a GPU backend library is required on your system.
The CUDA library (for NVIDIA GPU cards only) is the main choice for GPU computations. There is also the OpenCL standard, which is open but far less developed, and whose support in Theano is much more experimental and rudimentary.
Most scientific computations still occur on NVIDIA cards at the moment. If you have an NVIDIA GPU card, download CUDA from the NVIDIA website, https://developer.nvidia.com/cuda-downloads, and install it. The installer will install the latest version of the GPU drivers first, if they are not already installed. It will install the CUDA library in the /usr/local/cuda directory.
Install the cuDNN library, a library by NVIDIA that offers faster implementations of some operations for the GPU. To install it, I usually copy the /usr/local/cuda directory to a new directory, /usr/local/cuda-{CUDA_VERSION}-cudnn-{CUDNN_VERSION}, so that I can choose the version of CUDA and cuDNN, depending on the deep learning technology I use and its compatibility.
In your .bashrc profile, add the following lines to set the $PATH and $LD_LIBRARY_PATH variables:
export PATH=/usr/local/cuda-8.0-cudnn-5.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0-cudnn-5.1/lib64:/usr/local/cuda-8.0-cudnn-5.1/lib:$LD_LIBRARY_PATH
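After opening a new shell, it can be useful to confirm that these directories are really present in the environment. The following is a minimal sketch, assuming the directory name used in the text; the helper missing_entries is hypothetical and only inspects the variables, it does not verify that CUDA itself works.

```python
import os


def missing_entries(variable, expected, environ=None):
    """Return the expected directories that are absent from a path-like variable."""
    environ = os.environ if environ is None else environ
    current = environ.get(variable, "").split(os.pathsep)
    return [path for path in expected if path not in current]


# Directory name taken from the text; adapt it to your own CUDA/cuDNN versions.
CUDA_ROOT = "/usr/local/cuda-8.0-cudnn-5.1"

checks = {
    "PATH": [CUDA_ROOT + "/bin"],
    "LD_LIBRARY_PATH": [CUDA_ROOT + "/lib64", CUDA_ROOT + "/lib"],
}
for variable, expected in checks.items():
    for path in missing_entries(variable, expected):
        print("%s is missing %s" % (variable, path))
```

If the script prints nothing, every expected directory was found in the current environment.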
Installing and running Theano on GPU
N-dimensional GPU arrays have been implemented in Python in six different GPU libraries (Theano/CudaNdarray, PyCUDA/GPUArray, CUDAMAT/CUDAMatrix, PYOPENCL/GPUArray, Clyther, and Copperhead), each implementing a subset of NumPy.ndarray. Libgpuarray is a backend library that brings them under a common interface with the same properties.
To install libgpuarray with conda, use this command:
conda install pygpu
To run Theano in GPU mode, you need to set the config.device variable before execution, since it becomes read-only once the code is running. Launch Python with the THEANO_FLAGS environment variable:
THEANO_FLAGS="device=cuda,floatX=float32" python
>>> import theano
Using cuDNN version 5110 on context None
Mapped name None to device cuda: Tesla K80 (0000:83:00.0)
>>> theano.config.device
'cuda'
>>> theano.config.floatX
'float32'
The first lines of output show that the GPU device has been correctly detected and specify which GPU is used.
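When the flags cannot be given on the command line, for instance inside a script, the same variable can be set from Python, provided this happens before Theano is imported for the first time. The sketch below only builds and exports the flag string; the helper format_theano_flags is hypothetical, and the import itself is left as a comment because it requires a working Theano and GPU setup.

```python
import os


def format_theano_flags(options):
    """Join option/value pairs into the comma-separated THEANO_FLAGS syntax."""
    return ",".join("%s=%s" % (key, value) for key, value in options.items())


flags = format_theano_flags({"device": "cuda", "floatX": "float32"})

# The variable must be set before the first `import theano` in the process.
os.environ["THEANO_FLAGS"] = flags
print(flags)

# import theano  # would now pick up the flags set above
```

Setting the variable after Theano has been imported has no effect on the device, which is why the assignment comes first.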
By default, Theano activates CNMeM, a faster CUDA memory allocator. An initial pre-allocation can be specified with the gpuarray.preallocate option. In the end, my launch command is as follows:
THEANO_FLAGS="device=cuda,floatX=float32,gpuarray.preallocate=0.8" python
>>> import theano
Using cuDNN version 5110 on context None
Preallocating 9151/11439 Mb (0.800000) on cuda
Mapped name None to device cuda: Tesla K80 (0000:83:00.0)
The first line confirms that cuDNN is active, and the second confirms memory pre-allocation. The third line gives the default context name (that is, None when the device=cuda flag is set) and the model of the GPU used, while the default context name for the CPU will always be cpu.
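The figures in the pre-allocation line follow directly from the fraction passed to the option. As a small worked example, the sketch below recomputes the reserved amount from the total memory reported in the log; the function name preallocated_mb is hypothetical, and rounding to the nearest Mb is an assumption chosen because it reproduces the logged value.

```python
def preallocated_mb(fraction, total_mb):
    """Memory reserved up front, in Mb, for a given fraction of the card's total."""
    # Rounding to the nearest Mb is an assumption that matches the log above.
    return int(round(fraction * total_mb))


# 0.8 of the 11439 Mb reported for the Tesla K80 in the log.
print(preallocated_mb(0.8, 11439))
```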
It is possible to select a GPU other than the first one on multi-GPU computers by setting the device to cuda0, cuda1, and so on. It is also possible to run a program on multiple GPUs in parallel or in sequence (when the memory of one GPU is not sufficient), in particular when training very deep neural nets, such as those for classifying full images, described in Chapter 7, Classifying Images with Residual Networks. In this case, the contexts=dev0->cuda0;dev1->cuda1;dev2->cuda2;dev3->cuda3 flag activates multiple GPUs instead of one, and assigns a context name to each GPU device to be used in the code. Here is an example on a 4-GPU instance:
THEANO_FLAGS="contexts=dev0->cuda0;dev1->cuda1;dev2->cuda2;dev3->cuda3,floatX=float32,gpuarray.preallocate=0.8" python
>>> import theano
Using cuDNN version 5110 on context None
Preallocating 9177/11471 Mb (0.800000) on cuda0
Mapped name dev0 to device cuda0: Tesla K80 (0000:83:00.0)
Using cuDNN version 5110 on context dev1
Preallocating 9177/11471 Mb (0.800000) on cuda1
Mapped name dev1 to device cuda1: Tesla K80 (0000:84:00.0)
Using cuDNN version 5110 on context dev2
Preallocating 9177/11471 Mb (0.800000) on cuda2
Mapped name dev2 to device cuda2: Tesla K80 (0000:87:00.0)
Using cuDNN version 5110 on context dev3
Preallocating 9177/11471 Mb (0.800000) on cuda3
Mapped name dev3 to device cuda3: Tesla K80 (0000:88:00.0)
To make it possible to assign computations to a specific GPU in this multi-GPU setting, the names we chose, dev0, dev1, dev2, and dev3, have been mapped to the corresponding devices (cuda0, cuda1, cuda2, and cuda3).
This name mapping makes it possible to write code that is independent of the underlying GPU assignments and libraries (CUDA or others).
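The syntax of the contexts flag is regular enough to be generated and read back by a few lines of Python, which helps when the number of GPUs varies between machines. The two helpers below, build_contexts_flag and parse_contexts_flag, are hypothetical names used for illustration; they only manipulate the string and do not touch Theano or the GPUs.

```python
def build_contexts_flag(devices, prefix="dev"):
    """Map names prefix0, prefix1, ... to the given devices in contexts= syntax."""
    pairs = ["%s%d->%s" % (prefix, i, device) for i, device in enumerate(devices)]
    return "contexts=" + ";".join(pairs)


def parse_contexts_flag(flag):
    """Return a {context name: device} dictionary from a contexts= flag."""
    _, _, mapping = flag.partition("=")
    return dict(pair.split("->") for pair in mapping.split(";") if pair)


flag = build_contexts_flag(["cuda0", "cuda1", "cuda2", "cuda3"])
print(flag)

# Look up which device a context name refers to.
print(parse_contexts_flag(flag)["dev2"])
```

In Theano's multi-GPU tutorial, a context name such as dev0 is what gets passed as the target argument of theano.shared to place a variable on a given device, so the code only ever refers to the names on the left of each arrow.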
To keep the current configuration flags active at every Python session or execution without using environment variables, save your configuration in the ~/.theanorc file as follows:
[global]
floatX = float32
device = cuda0
[gpuarray]
preallocate = 1
Now you can simply run the python command. You are now all set.
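If you prefer to generate the configuration file from a script, for example when provisioning several machines, the standard configparser module (Python 3) can write the same sections. This is a minimal sketch: the function write_theanorc is a hypothetical helper, and the file is written to a temporary directory so that an existing ~/.theanorc is not overwritten.

```python
import configparser
import os
import tempfile


def write_theanorc(path, floatX="float32", device="cuda0", preallocate=1):
    """Write a .theanorc-style file with [global] and [gpuarray] sections."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the capital X in floatX
    parser["global"] = {"floatX": floatX, "device": device}
    parser["gpuarray"] = {"preallocate": str(preallocate)}
    with open(path, "w") as handle:
        parser.write(handle)


# Written to a temporary directory here; point the path at ~/.theanorc to use it.
demo_path = os.path.join(tempfile.mkdtemp(), ".theanorc")
write_theanorc(demo_path)
with open(demo_path) as handle:
    print(handle.read())
```

The default values mirror the configuration shown above and can be overridden per machine through the function arguments.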