Video synthesis overview
Let's say your doorbell rings while you're watching a video, so you pause the video and go to answer the door. What would you see on your screen when you come back? A still picture in which everything is frozen. If you press play and then quickly pause again, you will see another image that looks very similar to the previous one, but with slight differences. That's right: when you play a series of images in sequence, you get a video.
Image data has three dimensions, (H, W, C), for height, width, and channels; video data has four, (T, H, W, C), where T is the temporal (time) dimension. You can also think of a video as a big batch of images, except that we cannot shuffle the batch, because there must be temporal consistency between the images. I'll explain this further.
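To make the shapes concrete, here is a minimal NumPy sketch. The moving-square toy video and the helper names `make_moving_square_video` and `temporal_difference` are hypothetical, made up for illustration. The sketch builds a (T, H, W, C) array, shows that each frame on its own is an ordinary (H, W, C) image, and uses a crude temporal-consistency score (the mean absolute difference between consecutive frames) to show what goes wrong if the frames are shuffled like an ordinary image batch.

```python
import numpy as np

def make_moving_square_video(num_frames=8, height=16, width=16, channels=3, size=4):
    """Hypothetical toy video of shape (T, H, W, C): a white square that moves
    one pixel to the right per frame (assumes num_frames + size <= width)."""
    video = np.zeros((num_frames, height, width, channels), dtype=np.float32)
    for t in range(num_frames):
        video[t, 4:4 + size, t:t + size, :] = 1.0
    return video

def temporal_difference(video):
    """Mean absolute difference between consecutive frames (lower = smoother motion)."""
    return float(np.mean(np.abs(video[1:] - video[:-1])))

video = make_moving_square_video()
print(video.shape)     # (T, H, W, C) -> (8, 16, 16, 3)
print(video[0].shape)  # a single "paused" frame is an (H, W, C) image -> (16, 16, 3)

# Treated as an image batch, the frames could be reordered freely,
# but that breaks the temporal consistency between adjacent frames.
shuffled = video[[0, 4, 1, 5, 2, 6, 3, 7]]
print(temporal_difference(video))     # 0.03125
print(temporal_difference(shuffled))  # larger: the square now jumps between frames
```

In the ordered video, each frame differs from the next by only a thin sliver of pixels. After the reordering, the square teleports from frame to frame. The array still has exactly the same shape and the same set of images, but it no longer looks like continuous motion.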
Let's say we extract images from some video datasets and train an unconditional GAN to generate images from random noise input. As you can imagine, the...