Generating recipes with deep learning
A final example we will discuss is related to earlier examples in this book on generating textual descriptions of images using GANs. A more complex version of the same problem is to generate a structured, multi-part description of an image, such as the recipe for the dish depicted in a photo. This description is also harder to produce because its components (the instructions) must appear in a particular order for the recipe to be coherent (Figure 13.12):
Figure 13.12: A recipe generated from an image of food [17]
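To make the distinction between a flat caption and a structured description concrete, here is a minimal sketch of how a generated recipe might be represented. The `Recipe` class and its fields are hypothetical, and treating ingredients as an unordered set is an illustrative assumption. The point is only that the instructions form an ordered sequence, so the same steps in a different order make a different (and here, incoherent) recipe.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Recipe:
    """Hypothetical container for a generated recipe."""
    title: str
    ingredients: FrozenSet[str]    # unordered: only membership matters
    instructions: Tuple[str, ...]  # ordered: the sequence carries meaning


whisk_first = Recipe(
    "omelette",
    frozenset({"eggs", "butter"}),
    ("whisk the eggs", "melt the butter", "cook the eggs"),
)
shuffled = Recipe(
    "omelette",
    frozenset({"butter", "eggs"}),
    ("cook the eggs", "melt the butter", "whisk the eggs"),
)

print(whisk_first.ingredients == shuffled.ingredients)  # True: same ingredient set
print(whisk_first == shuffled)                          # False: step order differs
```

A generative model for this task therefore has to get both the content and the ordering of the instructions right.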
This "inverse cooking" problem has also been studied using generative models (Salvador et al.) [17]; Figure 13.13 shows the architecture of their model.
Figure 13.13: Architecture of a generative model for inverse cooking [17]
As in many of the examples we've seen in prior chapters, an "encoder" network receives an image as input, and a sequence model then "decodes" the encoded image into text representations...
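The encoder-decoder idea can be illustrated with a deliberately tiny, dependency-free sketch. Everything in it is hypothetical: `encode_image` stands in for a CNN image encoder by average-pooling pixel values, and `TRANSITIONS` and `IMAGE_WEIGHTS` are hand-set scores standing in for a trained sequence model. It is not the architecture of Salvador et al.; it only shows how a decoder can emit a sequence one token at a time while conditioning every step on the image features.

```python
from typing import Dict, List, Sequence


def encode_image(pixels: Sequence[float], dim: int = 2) -> List[float]:
    """Hypothetical encoder: average-pool flat pixel values into `dim` features."""
    size = max(1, len(pixels) // dim)
    return [sum(pixels[i * size:(i + 1) * size]) / size for i in range(dim)]


# Hand-set toy "decoder" parameters (hypothetical, not learned):
# base scores for which token may follow which.
TRANSITIONS: Dict[str, Dict[str, float]] = {
    "<start>": {"chop": 1.0, "fry": 1.0},
    "chop": {"onions": 2.0},
    "onions": {"fry": 1.5, "<end>": 0.5},
    "fry": {"eggs": 1.0, "<end>": 0.2},
    "eggs": {"serve": 2.0},
    "serve": {"<end>": 3.0},
}
# How strongly each image feature pushes toward a token (the conditioning).
IMAGE_WEIGHTS: Dict[str, List[float]] = {"chop": [1.0, 0.0], "fry": [0.0, 1.0]}


def step_scores(features: List[float], prev: str) -> Dict[str, float]:
    """Score each allowed next token given the image features and previous token."""
    scores = {}
    for tok, base in TRANSITIONS.get(prev, {}).items():
        weights = IMAGE_WEIGHTS.get(tok, [0.0] * len(features))
        scores[tok] = base + sum(w * f for w, f in zip(weights, features))
    return scores


def greedy_decode(features: List[float], max_len: int = 10) -> List[str]:
    """Autoregressively pick the highest-scoring next token until <end>."""
    tokens: List[str] = []
    prev = "<start>"
    for _ in range(max_len):
        scores = step_scores(features, prev)
        if not scores:
            break
        prev = max(scores, key=scores.get)
        if prev == "<end>":
            break
        tokens.append(prev)
    return tokens


print(greedy_decode(encode_image([0.9, 0.9, 0.1, 0.1])))  # ['chop', 'onions', 'fry', 'eggs', 'serve']
print(greedy_decode(encode_image([0.1, 0.1, 0.9, 0.9])))  # ['fry', 'eggs', 'serve']
```

Different images yield different step sequences because the image features shift the decoder's scores at every step. Greedy decoding is used here only for simplicity; beam search or sampling are common alternatives in real sequence models.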