A StackGAN is named as such because it has two GANs that are stacked together to form a network that is capable of generating high-resolution images. It has two stages, Stage-I and Stage-II. The Stage-I network generates low-resolution images with basic colors and rough sketches, conditioned on a text embedding, while the Stage-II network takes the image generated by the Stage-I network and generates a high-resolution image that is conditioned on a text embedding. Basically, the second network corrects defects and adds compelling details, yielding a more realistic high-resolution image.
We can compare a StackGAN network to the work of a painter. As a painter starts working, they draw primitive shapes such as lines, circles, and rectangles. Then, they try to fill in the colors. As the painting progresses, more and more detail is added. In a StackGAN, Stage...