Voice style transfer with neural networks is making it increasingly easy to impersonate a target's voice convincingly. In this section, we show you how to use deep learning to produce a recording of a target saying whatever you want them to say, for example for social engineering purposes or, more playfully, to have Obama's voice sing Beyoncé songs. We selected the architecture in mazzzystar/randomCNN-voice-transfer, which produces high-quality results quickly. In particular, there is no need to pre-train the model on a large dataset of recorded audio.
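To give a feel for why no pre-training is required, the following is a minimal NumPy sketch of the general idea behind random-CNN style transfer: convolution filters whose weights are drawn at random and never trained can still yield feature statistics that describe how a voice sounds. This is an illustration under our own assumptions, not the repository's code. The function names, filter count, kernel width, and the use of a Gram matrix as the style statistic are all hypothetical choices made for this example.

```python
import numpy as np


def random_conv_features(spectrogram, n_filters=8, kernel_width=3, seed=0):
    """Slide randomly initialized filters along the time axis, then apply ReLU.

    spectrogram: array of shape (n_freq, n_frames).
    Returns an array of shape (n_filters, n_frames - kernel_width + 1).
    The layer sizes are illustrative assumptions, not values from the book.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = spectrogram.shape
    # Random weights, scaled by fan-in; they are never trained.
    weights = rng.standard_normal((n_filters, n_freq, kernel_width))
    weights /= np.sqrt(n_freq * kernel_width)
    n_out = n_frames - kernel_width + 1
    features = np.empty((n_filters, n_out))
    for t in range(n_out):
        window = spectrogram[:, t:t + kernel_width]
        # Contract the frequency and kernel axes: one response per filter.
        features[:, t] = np.tensordot(weights, window, axes=([1, 2], [0, 1]))
    return np.maximum(features, 0.0)


def gram_matrix(features):
    """Time-averaged filter co-activations, a common style statistic."""
    return features @ features.T / features.shape[1]


# Stand-in data: random magnitudes in place of real spectrograms.
rng = np.random.default_rng(1)
style_spec = np.abs(rng.standard_normal((16, 20)))
generated_spec = np.abs(rng.standard_normal((16, 20)))

# A style loss compares the Gram matrices of the two feature maps.
style_gram = gram_matrix(random_conv_features(style_spec))
generated_gram = gram_matrix(random_conv_features(generated_spec))
style_loss = np.mean((style_gram - generated_gram) ** 2)
```

In a full pipeline, the generated spectrogram would be adjusted by gradient descent to reduce a loss of this kind, which is why a deep learning framework is used in practice rather than plain NumPy.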
The accompanying code for this book includes two versions of the voice transfer neural network, one for GPU and one for CPU. We describe the CPU version here, though the GPU version is very similar...