The best way to understand the next snippet is to look at the VGG19 architecture itself. A good reference is the Keras implementation at https://github.com/fchollet/deep-learning-models/blob/master/vgg19.py (about halfway down the page).
Here, you will see that VGG19 is a fairly straightforward architecture, consisting of blocks of convolutional layers with a max pooling layer at the end of each block.
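To make that block structure concrete, here is a sketch of VGG19's layer layout as plain data. The `blockN_convM` / `blockN_pool` names follow the naming convention used in the Keras implementation linked above; treat the listing as an illustration rather than generated output of the library.

```python
# VGG19 convolutional blocks, using the Keras layer-naming convention
# (blockN_convM), with a max pooling layer closing each block.
vgg19_blocks = {
    "block1": ["block1_conv1", "block1_conv2", "block1_pool"],
    "block2": ["block2_conv1", "block2_conv2", "block2_pool"],
    "block3": ["block3_conv1", "block3_conv2",
               "block3_conv3", "block3_conv4", "block3_pool"],
    "block4": ["block4_conv1", "block4_conv2",
               "block4_conv3", "block4_conv4", "block4_pool"],
    "block5": ["block5_conv1", "block5_conv2",
               "block5_conv3", "block5_conv4", "block5_pool"],
}

# 16 convolutional layers in total; with the 3 fully connected layers
# of the classifier head, this is where the "19" in VGG19 comes from.
n_conv = sum(1 for layers in vgg19_blocks.values()
             for name in layers if "conv" in name)
print(n_conv)  # → 16
```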
For the content layer, we use the second convolutional layer in block5. This highest block is chosen because the feature maps of the earlier blocks remain closer to individual pixel values, whereas layers higher in the network capture the high-level content of the input image, in terms of objects and their arrangement, without constraining the exact pixel values of the reconstruction (see Gatys et al., 2015, https://arxiv.org/abs/1508.06576, cited previously).
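As a minimal sketch of extracting those content features, assuming TensorFlow's bundled Keras is available, one can build a sub-model that outputs the activations of that layer (named `block5_conv2` in the Keras implementation). Here `weights=None` skips the ImageNet weight download to keep the example self-contained; in practice you would pass `weights="imagenet"`:

```python
import numpy as np
import tensorflow as tf

# Build VGG19 without the classifier head. weights=None avoids the
# ImageNet download for this sketch; real use needs weights="imagenet".
vgg = tf.keras.applications.VGG19(weights=None, include_top=False)
vgg.trainable = False

# Sub-model mapping an input image to the content-layer activations.
content_model = tf.keras.Model(
    inputs=vgg.input,
    outputs=vgg.get_layer("block5_conv2").output,
)

# A dummy 224x224 RGB batch; a real pipeline would first apply
# tf.keras.applications.vgg19.preprocess_input to the image.
image = np.zeros((1, 224, 224, 3), dtype=np.float32)
features = content_model(image)
print(features.shape)  # (1, 14, 14, 512): downsampled by 4 pooling layers
```

Note that `block5_conv2` sits after four of the five max pooling layers, so a 224x224 input yields 14x14 feature maps with 512 channels.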
For the style layers...