Assessing the limitations of generative AI
Generative AIs like those used in deepfakes are not a panacea and have some significant limitations. However, once you know about these limitations, you can generally work around or sidestep them with careful design.
Resolution
Deepfakes are limited in the resolution at which they can swap faces. This is a hardware and time limitation: more powerful hardware and more time allow higher-resolution swaps. However, the cost does not grow linearly with resolution. Doubling the resolution (from, say, 64x64 to 128x128) quadruples the number of pixels, and with it the amount of required VRAM (that is, the memory that a GPU has direct access to), while the training time grows by a roughly equivalent amount. Because of this, resolution is often a balancing act: you'll want to make the deepfake the lowest resolution you can without sacrificing the results.
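The following back-of-the-envelope Python sketch illustrates why: pixel count grows with the square of the side length. The baseline VRAM and training-time figures are hypothetical placeholders, not measurements from any real model or GPU.

```python
# Rough illustration of how pixel count (and therefore VRAM and training
# time) grows as the swap resolution doubles. The baseline figures are
# hypothetical placeholders, not measurements from any specific model.

baseline_resolution = 64      # pixels per side
baseline_vram_gb = 2.0        # hypothetical VRAM needed at 64x64
baseline_train_hours = 12.0   # hypothetical training time at 64x64

for resolution in (64, 128, 256, 512):
    # Pixel count grows with the square of the side length,
    # so doubling the resolution roughly quadruples the cost.
    scale = (resolution / baseline_resolution) ** 2
    print(
        f"{resolution}x{resolution}: "
        f"~{baseline_vram_gb * scale:.0f} GB VRAM, "
        f"~{baseline_train_hours * scale:.0f} hours"
    )
```

Running this shows 128x128 costing roughly four times as much as 64x64, and 256x256 roughly sixteen times as much, which is why each step up in resolution needs to be justified by the final result.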
Training required for each face pair
To provide the best results, traditional deepfakes require that you train on every face pair that you wish to swap. This means that if you wanted to swap your own face with two of your friends, you’d have to train two separate models. This is because each model has one encoder and two decoders, which are trained only to swap the faces they were given.
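The following is a minimal PyTorch sketch of this shared-encoder, two-decoder layout, assuming aligned 64x64 RGB face crops; the layer sizes and latent dimension are illustrative choices, not the architecture used by any particular deepfake tool.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Compresses an aligned 64x64 face crop into a shared latent code."""
    def __init__(self, latent_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1),    # 64 -> 32
            nn.LeakyReLU(0.1),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),   # 32 -> 16
            nn.LeakyReLU(0.1),
            nn.Conv2d(128, 256, 4, stride=2, padding=1),  # 16 -> 8
            nn.LeakyReLU(0.1),
            nn.Flatten(),
            nn.Linear(256 * 8 * 8, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Reconstructs one specific person's face from the shared latent code."""
    def __init__(self, latent_dim: int = 512):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 256 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 8 -> 16
            nn.LeakyReLU(0.1),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 16 -> 32
            nn.LeakyReLU(0.1),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),     # 32 -> 64
            nn.Sigmoid(),
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 256, 8, 8)
        return self.net(x)

# One shared encoder, one decoder per face in the pair.
encoder = Encoder()
decoder_a = Decoder()
decoder_b = Decoder()

# During training, person A's faces go through encoder -> decoder_a and
# person B's through encoder -> decoder_b, each learning to reconstruct
# its own person. At swap time, person A's encoded face is simply routed
# through decoder_b (or vice versa).
face_a = torch.rand(1, 3, 64, 64)   # stand-in for an aligned face crop
swapped = decoder_b(encoder(face_a))
print(swapped.shape)                 # torch.Size([1, 3, 64, 64])
```

The idea is that, because both decoders learn from the same latent space, the shared encoder tends to capture pose and expression while each decoder supplies the identity it was trained on, which is what makes routing a face through the other decoder produce a swap.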
There is a workaround for some multi-face swaps. To swap additional faces, you could write your own version of the model with more than two decoders. This is an imperfect solution, however, as each decoder takes up a significant amount of VRAM, requiring you to balance the number of faces carefully.
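Continuing the sketch above, a modified version might keep one decoder per person in a ModuleDict; the person names here are purely hypothetical. Every extra decoder adds its full set of weights to GPU memory, which is where the VRAM pressure comes from.

```python
import torch.nn as nn

# Continues the sketch above: one shared encoder, but a decoder per person.
# Each extra decoder adds its full set of weights to GPU memory, which is
# why the number of faces must be balanced against available VRAM.
decoders = nn.ModuleDict({
    "alice": Decoder(),   # hypothetical identities
    "bob": Decoder(),
    "carol": Decoder(),
})

def swap(face_crop, target_name: str):
    """Encode any face with the shared encoder, decode as the target person."""
    return decoders[target_name](encoder(face_crop))

# For example, render the expressions in face_a onto Bob's face:
swapped = swap(face_a, "bob")
```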
It may be better to simply train multiple pairs. By splitting the task across multiple computers, you could train several models simultaneously, creating many face pairs at once.
Another option is to use a different type of AI face replacement. First Order Model (which is covered in the Looking at existing deepfake software section of this chapter) uses a different technique: instead of a paired approach, it uses AI to animate a single still image to match the movements in a driving video. This removes the need to retrain on each face pair, but comes at the cost of greatly reduced swap quality.
Training data
Generative AIs require a significant amount of training data to accomplish their tasks. Sometimes, finding sufficient data, or data of high enough quality, is not possible. For example, how would someone create a deepfake of William Shakespeare when there are no videos or photographs of him? This is a tricky problem, but it can be worked around in several ways. While it is unfortunately impossible to create a proper deepfake of England's greatest playwright, it would be possible to use an actor who looks like his portraits and then deepfake that actor as Shakespeare.
Tip
We will cover more on how to deal with poor or insufficient data in Chapter 3, Mastering Data.
Finding sufficient data (or clever workarounds) is the most difficult challenge that any data scientist faces. Occasionally, there simply is no way to get sufficient data. This is when you might need to re-examine the video to see whether there is another way to shoot it that avoids the lack of data, or you might try using other sources of similar data to patch the gaps. Sometimes, just knowing the limitations in advance can prevent a problem; other times, a last-minute workaround may be enough to save a project from failure.
While everyone should know the data limitations, the limitations of the process itself are mainly a concern for experts. If you are only looking to use deepfakes, you'll probably rely on existing software. Let's explore some of that software next.