Why data is important
Neural networks learn from data whose content is already known, and the deepfake AI is trained by processing that data (see Chapter 1, Surveying Deepfakes, for an explanation of the whole process). We call this set of data, simply enough, a dataset. To create a dataset, the raw data has to be processed and prepared so that the neural network has something to train with. In the case of deepfakes, the data consists of faces, which need to be detected, aligned, and cleaned in order to create an effective dataset.
Without a properly formatted and prepared dataset, the neural network simply cannot be trained. There is another potential problem with generative networks like those used for deepfakes: a poor-quality dataset leads to poor swaps. Unfortunately, it’s hard to know at the outset whether a dataset will produce a good swap. Judging this is a skill that takes time to learn, and your first few deepfakes are unlikely to turn out well while you are still learning the importance of data.
...