Convolutional layers are particularly well suited to processing images. They can effectively learn important features, such as edges, shapes, and complex objects, as demonstrated by networks such as Inception, AlexNet, Visual Geometry Group (VGG), and ResNet.
In this tutorial, we will use the DCGAN architecture to generate anime characters. We will learn how to prepare the dataset for training, how to implement a DCGAN in Keras for the generation of anime characters, and how to train the DCGAN on the anime character dataset.
The development of Deep Convolutional Generative Adversarial Networks (DCGANs) was an important step towards using CNNs for image generation. A DCGAN uses convolutional layers instead of dense layers, and the architecture was proposed by researchers Alec Radford, Luke Metz, and Soumith Chintala in their paper, Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Since then, DCGANs have been widely used for various image generation tasks.
This tutorial is an excerpt taken from the book 'Generative Adversarial Networks Projects' written by Kailash Ahirwar. The book explores unsupervised techniques for training neural networks and includes seven end-to-end projects in the GAN domain.
To train a DCGAN network, we need a dataset of anime characters containing cropped faces of the characters. The images in this tutorial were scraped for educational and demonstration purposes only. We scraped the images from pixiv.net using a crawler tool called gallery-dl, a command-line tool that can be used to download image collections from websites such as pixiv.net, exhentai.org, danbooru.donmai.us, and more. It is available at the following link: https://github.com/mikf/gallery-dl.
In this section, we will cover the different steps required to install the dependencies and download the dataset. Before executing the following commands, activate the virtual environment created for this project:
# Install the latest stable release of gallery-dl
pip install --upgrade gallery-dl

# Or install the development version from the official GitHub repository:
# https://github.com/mikf/gallery-dl
pip install --upgrade https://github.com/mikf/gallery-dl/archive/master.zip

# Download images tagged 'face' from danbooru.donmai.us
gallery-dl https://danbooru.donmai.us/posts?tags=face
Download images at your own risk. The information given is for educational purposes only and we don't support illegal scraping. We do not hold the copyright for these images; they belong to their respective owners. For commercial purposes, please contact the respective owner of the website or of the content that you are using.
Before we crop or resize the images, take a look at the downloaded images:
As you can see, some images contain other body parts as well, which we don't want in our training images. In the next section, we will crop out only the face part of these images. We will also resize all images to the size required for training.
In this section, we will crop out faces from the images. We will be using python-animeface, an open source library for detecting anime faces in images. It is publicly available at the following link: https://github.com/nya3jp/python-animeface.
Execute the following steps to crop and resize the images:
pip install animeface
import glob
import os

import animeface
from PIL import Image

total_num_faces = 0

for index, filename in enumerate(glob.glob('/path/to/directory/containing/images/*.*')):
    try:
        # Open image
        im = Image.open(filename)

        # Detect faces
        faces = animeface.detect(im)
    except Exception as e:
        print("Exception:{}".format(e))
        continue

    fp = faces[0].face.pos

    # Get coordinates of the face detected in the image
    coordinates = (fp.x, fp.y, fp.x + fp.width, fp.y + fp.height)

    # Crop image
    cropped_image = im.crop(coordinates)

    # Resize image
    cropped_image = cropped_image.resize((64, 64), Image.ANTIALIAS)

    # Save the cropped image (the index gives each file a unique name)
    cropped_image.save("/path/to/directory/to/store/cropped/images/{}.png".format(index))
The complete cropping and resizing script, with full error handling, appears as follows:
import glob
import os

import animeface
from PIL import Image

total_num_faces = 0

for index, filename in enumerate(glob.glob('/path/to/directory/containing/images/*.*')):
    # Open image and detect faces
    try:
        im = Image.open(filename)
        faces = animeface.detect(im)
    except Exception as e:
        print("Exception:{}".format(e))
        continue

    # If no faces were found in the current image, skip it
    if len(faces) == 0:
        print("No faces found in the image")
        continue

    fp = faces[0].face.pos

    # Get coordinates of the face detected in the image
    coordinates = (fp.x, fp.y, fp.x + fp.width, fp.y + fp.height)

    # Crop image
    cropped_image = im.crop(coordinates)

    # Resize image (Image.ANTIALIAS is called Image.LANCZOS in newer Pillow versions)
    cropped_image = cropped_image.resize((64, 64), Image.ANTIALIAS)

    # Show cropped and resized image
    # cropped_image.show()

    # Save it in the output directory (the index gives each file a unique name)
    cropped_image.save("/path/to/directory/to/store/cropped/images/{}.png".format(index))

    print("Cropped image saved successfully")
    total_num_faces += 1
    print("Number of faces detected till now:{}".format(total_num_faces))

print("Total number of faces:{}".format(total_num_faces))
The preceding script loads all of the images from the folder containing the downloaded images, detects faces using the python-animeface library, and crops the face region out of each initial image. The cropped images are then resized to 64 x 64. If you want to change the dimensions of the images, change the architecture of the generator and the discriminator accordingly; as we will see, the generator starts from an 8 x 8 feature map and doubles it three times (8 → 16 → 32 → 64), so larger images require additional upsampling stages. We are now ready to work on our network.
In this section, we will write an implementation of a DCGAN in the Keras framework. Keras is a meta-framework that uses TensorFlow or Theano as a backend. It provides high-level APIs for working with neural networks. Let's start by writing the implementation of the generator network.
The generator network consists of dense layers, 2D convolutional layers, upsampling layers, a reshape layer, and a batch normalization layer. In Keras, every operation can be specified as a layer; even activation functions are layers, and they can be added to a model just like a normal dense layer.
Perform the following steps to create a generator network:
gen_model = Sequential()

gen_model.add(Dense(units=2048, input_dim=100))
gen_model.add(Activation('tanh'))

gen_model.add(Dense(256 * 8 * 8))
gen_model.add(BatchNormalization())
gen_model.add(Activation('tanh'))
The second dense layer has 256 * 8 * 8 = 16,384 neurons, so its output is a tensor of shape (16384,).
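As a quick standalone check (illustrative only, separate from the model we are building), you can construct just this layer and inspect its output shape:

from keras.models import Sequential
from keras.layers import Dense

m = Sequential()
m.add(Dense(256 * 8 * 8, input_dim=2048))
print(m.output_shape)  # (None, 16384)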
# Reshape layer
gen_model.add(Reshape((8, 8, 256), input_shape=(256 * 8 * 8,)))
gen_model.add(UpSampling2D(size=(2, 2)))
gen_model.add(Conv2D(128, (5, 5), padding='same'))
gen_model.add(Activation('tanh'))
gen_model.add(UpSampling2D(size=(2, 2)))
A 2D upsampling layer repeats the rows and columns of the tensor size[0] and size[1] times, respectively; with size=(2, 2), both spatial dimensions are doubled.
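As a standalone illustration (separate from the generator we are building), the following sketch shows UpSampling2D doubling both spatial dimensions of a tensor:

from keras.models import Sequential
from keras.layers import UpSampling2D

m = Sequential()
m.add(UpSampling2D(size=(2, 2), input_shape=(8, 8, 256)))
print(m.output_shape)  # (None, 16, 16, 256) -- rows and columns are doubled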
gen_model.add(Conv2D(64, (5, 5), padding='same'))
gen_model.add(Activation('tanh'))
gen_model.add(UpSampling2D(size=(2, 2)))
gen_model.add(Conv2D(3, (5, 5), padding='same'))
gen_model.add(Activation('tanh'))
The generator network will output a tensor of shape (batch_size, 64, 64, 3). Each image tensor in this batch corresponds to an image of dimensions 64 x 64 with three channels: Red, Green, and Blue (RGB).
The complete code for the generator network, wrapped in a Python function, looks as follows. Note that this version uses LeakyReLU activations in place of tanh for the hidden layers, while the output layer keeps tanh so that generated pixel values stay in [-1, 1]:
from keras.models import Sequential
from keras.layers import (Activation, BatchNormalization, Conv2D, Dense,
                          LeakyReLU, Reshape, UpSampling2D)


def get_generator():
    gen_model = Sequential()

    gen_model.add(Dense(units=2048, input_dim=100))
    gen_model.add(LeakyReLU(alpha=0.2))

    gen_model.add(Dense(256 * 8 * 8))
    gen_model.add(BatchNormalization())
    gen_model.add(LeakyReLU(alpha=0.2))

    gen_model.add(Reshape((8, 8, 256), input_shape=(256 * 8 * 8,)))

    gen_model.add(UpSampling2D(size=(2, 2)))
    gen_model.add(Conv2D(128, (5, 5), padding='same'))
    gen_model.add(LeakyReLU(alpha=0.2))

    gen_model.add(UpSampling2D(size=(2, 2)))
    gen_model.add(Conv2D(64, (5, 5), padding='same'))
    gen_model.add(LeakyReLU(alpha=0.2))

    gen_model.add(UpSampling2D(size=(2, 2)))
    gen_model.add(Conv2D(3, (5, 5), padding='same'))
    # tanh keeps the output in [-1, 1], matching the normalized training images
    gen_model.add(Activation('tanh'))

    return gen_model
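To sanity-check the architecture, you can build the generator and run a batch of random noise vectors through it (a quick sketch; the variable names are illustrative):

import numpy as np

generator = get_generator()

# 16 noise vectors, each of length 100 (the generator's input dimension)
z = np.random.normal(0, 1, size=(16, 100))
fake_images = generator.predict(z)
print(fake_images.shape)  # (16, 64, 64, 3)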
Now that we have created the generator network, let's work on creating the discriminator network.
The discriminator network has three 2D convolutional layers, each followed by an activation function and a max-pooling layer. The tail of the network contains two fully connected (dense) layers that act as a classification head.
Perform the following steps to create a discriminator network:
dis_model = Sequential()

dis_model.add(Conv2D(filters=128, kernel_size=5, padding='same',
                     input_shape=(64, 64, 3)))
dis_model.add(LeakyReLU(alpha=0.2))
dis_model.add(MaxPooling2D(pool_size=(2, 2)))
The shape of the output tensor after the first convolution and max-pooling block will be (batch_size, 32, 32, 128).
dis_model.add(Conv2D(filters=256, kernel_size=3))
dis_model.add(LeakyReLU(alpha=0.2))
dis_model.add(MaxPooling2D(pool_size=(2, 2)))
The shape of the output tensor from this convolutional layer will be (batch_size, 30, 30, 256), because a valid 3 x 3 convolution reduces each spatial dimension by 2 (32 - 3 + 1 = 30); after max pooling, it becomes (batch_size, 15, 15, 256).
dis_model.add(Conv2D(512, (3, 3)))
dis_model.add(LeakyReLU(alpha=0.2))
dis_model.add(MaxPooling2D(pool_size=(2, 2)))
The shape of the output tensor from this convolutional layer will be (batch_size, 13, 13, 512), and (batch_size, 6, 6, 512) after max pooling.
dis_model.add(Flatten())
The output shape of the flatten layer will be (batch_size, 18432), since 6 * 6 * 512 = 18,432.
dis_model.add(Dense(1024))
dis_model.add(LeakyReLU(alpha=0.2))
dis_model.add(Dense(1))
dis_model.add(Activation('sigmoid'))
The network will generate an output tensor of shape (batch_size, 1). Each entry is the probability that the corresponding input image is real.
The complete code for the discriminator network wrapped inside a Python method looks as follows:
from keras.models import Sequential
from keras.layers import (Activation, Conv2D, Dense, Flatten,
                          LeakyReLU, MaxPooling2D)


def get_discriminator():
    dis_model = Sequential()

    dis_model.add(Conv2D(128, (5, 5),
                         padding='same',
                         input_shape=(64, 64, 3)))
    dis_model.add(LeakyReLU(alpha=0.2))
    dis_model.add(MaxPooling2D(pool_size=(2, 2)))

    dis_model.add(Conv2D(256, (3, 3)))
    dis_model.add(LeakyReLU(alpha=0.2))
    dis_model.add(MaxPooling2D(pool_size=(2, 2)))

    dis_model.add(Conv2D(512, (3, 3)))
    dis_model.add(LeakyReLU(alpha=0.2))
    dis_model.add(MaxPooling2D(pool_size=(2, 2)))

    dis_model.add(Flatten())

    dis_model.add(Dense(1024))
    dis_model.add(LeakyReLU(alpha=0.2))

    # Sigmoid outputs the probability that the input image is real
    dis_model.add(Dense(1))
    dis_model.add(Activation('sigmoid'))

    return dis_model
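Similarly, a quick sketch (with illustrative names) to verify the discriminator's output shape on a dummy batch:

import numpy as np

discriminator = get_discriminator()

# A dummy batch of 16 random 64 x 64 RGB "images"
images = np.random.normal(0, 1, size=(16, 64, 64, 3))
scores = discriminator.predict(images)
print(scores.shape)  # (16, 1) -- one real/fake probability per image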
In this section, we have successfully implemented the discriminator and generator networks. Next, we will train the model on the dataset that we prepared in the Downloading and preparing the anime characters dataset section.
Training a DCGAN is similar to training a vanilla GAN network. It is a four-step process:

1. Load the dataset.
2. Build and compile the networks.
3. Train the discriminator network.
4. Train the generator network.

We will work on these steps one by one in this section.
Let's start by defining the variables and the hyperparameters:
dataset_dir = "/Path/to/dataset/directory/*.*"
batch_size = 128
z_shape = 100
epochs = 10000
dis_learning_rate = 0.0005
gen_learning_rate = 0.0005
dis_momentum = 0.9
gen_momentum = 0.9
dis_nesterov = True
gen_nesterov = True
Here, we have specified different hyperparameters for the training. We will now see how to load the dataset for the training.
To train the DCGAN network, we need to load the dataset into memory and define a mechanism for loading batches of images. Perform the following steps to load the dataset:
import glob

import numpy as np
from scipy.misc import imread

# Load all of the cropped images
all_images = []
for index, filename in enumerate(glob.glob('/Path/to/cropped/images/directory/*.*')):
    image = imread(filename, flatten=False, mode='RGB')
    all_images.append(image)

# Convert to a NumPy ndarray and scale pixel values to the range [-1, 1]
X = np.array(all_images)
X = (X - 127.5) / 127.5
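Note that scipy.misc.imread was removed in recent SciPy releases. If you are on a newer SciPy, an equivalent sketch using the imageio library (an assumption about your environment; install it with pip install imageio) is:

import glob

import imageio
import numpy as np

all_images = []
for filename in glob.glob('/Path/to/cropped/images/directory/*.*'):
    # imageio.imread returns an RGB ndarray for typical PNG/JPEG files
    all_images.append(imageio.imread(filename))

X = np.array(all_images)
X = (X - 127.5) / 127.5  # scale pixel values to [-1, 1]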
Now that we have loaded the dataset, let's see how to build and compile the networks.
In this section, we will build and compile our networks required for the training:
from keras.optimizers import SGD

# Define optimizers
dis_optimizer = SGD(lr=dis_learning_rate, momentum=dis_momentum, nesterov=dis_nesterov)
gen_optimizer = SGD(lr=gen_learning_rate, momentum=gen_momentum, nesterov=gen_nesterov)
gen_model = get_generator()
gen_model.compile(loss='binary_crossentropy', optimizer=gen_optimizer)
Use binary_crossentropy as the loss function for the generator network and gen_optimizer as the optimizer.
dis_model = get_discriminator()
dis_model.compile(loss='binary_crossentropy', optimizer=dis_optimizer)
Similarly, use binary_crossentropy as the loss function for the discriminator network and dis_optimizer as the optimizer.
The code to create and compile an adversarial model is as follows:
adversarial_model = Sequential()
adversarial_model.add(gen_model)

# Freeze the discriminator before adding it to the adversarial model
dis_model.trainable = False
adversarial_model.add(dis_model)
When we train this network, we don't want to train the discriminator network, so we make it non-trainable before adding it to the adversarial model.
Compile the adversarial model, as follows:
adversarial_model.compile(loss='binary_crossentropy', optimizer=gen_optimizer)
Use binary_crossentropy as the loss function and gen_optimizer as the optimizer for the adversarial model.
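A subtlety worth knowing: in Keras, a model's trainable flag is captured at compile time. Because dis_model was compiled before the flag was changed, it still updates its weights when we call dis_model.train_on_batch(), while the adversarial model, compiled after the change, updates only the generator. A quick, illustrative check:

# Only the generator's weights are trainable through the adversarial model
print(len(gen_model.trainable_weights))
print(len(adversarial_model.trainable_weights))  # should match the line above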
Before starting the training, add TensorBoard to visualize the losses, as follows:
import time

from keras.callbacks import TensorBoard

tensorboard = TensorBoard(log_dir="logs/{}".format(time.time()),
                          write_images=True, write_grads=True,
                          write_graph=True)
tensorboard.set_model(gen_model)
tensorboard.set_model(dis_model)
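Once training is running, you can inspect the logged metrics by launching TensorBoard from the project directory and opening the printed URL in a browser:

tensorboard --logdir=logs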
We will train the network for a specified number of iterations, so create a loop that runs for the specified number of epochs. Inside each epoch, we will train our networks on mini-batches of size 128. Calculate the number of batches that need to be processed:
for epoch in range(epochs):
    print("Epoch is", epoch)

    number_of_batches = int(X.shape[0] / batch_size)
    print("Number of batches", number_of_batches)

    for index in range(number_of_batches):
We will now take a closer look at the training process. The following subsections explain the different steps involved in training the DCGAN, starting with the discriminator.
Perform the following steps to train the discriminator network:
z_noise = np.random.normal(0, 1, size=(batch_size, z_shape))
To sample the values, use the normal() method from the np.random module of the NumPy library.
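For example, the following standalone call samples a batch of 128 noise vectors, each of length 100:

import numpy as np

z = np.random.normal(0, 1, size=(128, 100))
print(z.shape)  # (128, 100)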
# Select a mini-batch of real images
image_batch = X[index * batch_size:(index + 1) * batch_size]

# Generate a mini-batch of fake images from the noise vectors
generated_images = gen_model.predict_on_batch(z_noise)
# Smoothed labels: real labels near 1.0, fake labels near 0.0
y_real = np.ones(batch_size) - np.random.random_sample(batch_size) * 0.2
y_fake = np.random.random_sample(batch_size) * 0.2
dis_loss_real = dis_model.train_on_batch(image_batch, y_real)
dis_loss_fake = dis_model.train_on_batch(generated_images, y_fake)
d_loss = (dis_loss_real + dis_loss_fake) / 2
print("d_loss:", d_loss)
Up until now, we have been training the discriminator network. In the next section, let's train the generator network.
To train the generator network, we have to train the adversarial model. When we train the adversarial model, it trains the generator network only and keeps the discriminator network frozen. We won't train the discriminator network again, as we have already trained it in the previous step. Perform the following steps to train the adversarial model:
z_noise = np.random.normal(0, 1, size=(batch_size, z_shape))
g_loss = adversarial_model.train_on_batch(z_noise, [1] * batch_size)
We train the adversarial model on the batch of noise vectors and real labels, where real labels is a vector with all values equal to 1. By providing labels of 1 for generated images, we train the generator to fool the discriminator network. In this step, the generator receives feedback from the discriminator network and improves itself accordingly.
print("g_loss:", g_loss)
There is a passive method to evaluate the training process. After every 10 epochs, generate fake images and manually check the quality of the images:
if epoch % 10 == 0:
    z_noise = np.random.normal(0, 1, size=(batch_size, z_shape))
    gen_images1 = gen_model.predict_on_batch(z_noise)

    # Save a couple of generated samples for visual inspection
    for i, img in enumerate(gen_images1[:2]):
        save_rgb_img(img, "results/one_{}_{}.png".format(epoch, i))
These images will help you to decide whether to continue the training or stop it early. Stop the training if the quality of the generated images looks good; otherwise, continue until the model improves. After this step, we can further evaluate the trained model and visualize the generated images.
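The save_rgb_img() helper is not defined in this excerpt. A minimal sketch using matplotlib, assuming the generated images are scaled to [-1, 1], could look like this:

import matplotlib.pyplot as plt


def save_rgb_img(img, path):
    # Rescale from [-1, 1] back to [0, 1] before plotting
    img = (img + 1) / 2.0

    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.imshow(img)
    ax.axis("off")
    fig.savefig(path)
    plt.close(fig)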
We have successfully trained a DCGAN network on the anime character dataset. Now we can use the model to generate images of anime characters.
To summarize, in this tutorial, we looked at the different steps required to download and prepare the dataset. We then prepared a Keras implementation of the network and trained it on our dataset. If you enjoyed the tutorial and want to explore how to further evaluate the trained model, and optimize the networks by optimizing the hyperparameters, be sure to check out the book 'Generative Adversarial Networks Projects'.