Computer vision with PyTorch
PyTorch provides several convenient building blocks for computer vision, including convolutional layers and pooling layers. The torch.nn package provides Conv1d, Conv2d, and Conv3d. As the names suggest, Conv1d performs one-dimensional convolution, Conv2d performs two-dimensional convolution on inputs such as images, and Conv3d performs three-dimensional convolution on inputs such as videos. The naming can be confusing, because the dimension in the name counts only the dimensions the kernel slides over and ignores the batch and channel dimensions of the input. For instance, Conv2d handles four-dimensional input: the first dimension is the batch size, the second is the depth of the image (its channels, such as the three RGB channels), and the last two are the height and width of the image.
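The relationship between a layer's name and the shape of its input is easier to see with a small example. The following is a minimal sketch rather than code from this chapter: the batch sizes, channel counts, kernel sizes, and variable names are all arbitrary choices made for illustration.

```python
import torch
import torch.nn as nn

# Conv2d expects (batch, channels, height, width).
# Illustrative input: a batch of 8 RGB images of 32x32 pixels.
images = torch.randn(8, 3, 32, 32)
conv2d = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
out2d_shape = tuple(conv2d(images).shape)

# Conv1d expects (batch, channels, length).
# Illustrative input: 8 single-channel signals of length 100.
signals = torch.randn(8, 1, 100)
conv1d = nn.Conv1d(in_channels=1, out_channels=4, kernel_size=5)
out1d_shape = tuple(conv1d(signals).shape)

# Conv3d expects (batch, channels, depth, height, width).
# Illustrative input: 2 RGB clips of 10 frames, each 32x32 pixels.
clips = torch.randn(2, 3, 10, 32, 32)
conv3d = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
out3d_shape = tuple(conv3d(clips).shape)

print(out2d_shape)  # (8, 16, 32, 32)
print(out1d_shape)  # (8, 4, 96)
print(out3d_shape)  # (2, 8, 10, 32, 32)
```

In each case the input tensor has two more dimensions than the layer's name suggests: one for the batch and one for the channels. With a kernel size of 3 and a padding of 1 the spatial size is preserved, while the unpadded Conv1d with a kernel size of 5 shortens the signal from 100 to 96.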
Apart from the higher-level functions for computer vision, torchvision has some handy utility functions for setting up the network. We'll explore some of those in this chapter.
This chapter explains PyTorch...