Multimodal models
So far in this book, we have looked at single-modal model architecture patterns, such as text-to-text generation, which covers tasks like QA, summarization, and code generation. Let us now expand our understanding to another type of generative AI model: multimodal models.
Multimodal models can understand and interpret more than one modality, such as image, audio, and video, as shown in Figure 9.1.
Figure 9.1 – Multimodality
The responses from these models can also be multimodal. Behind the scenes, these FMs comprise multiple single-modal neural networks that process text, image, audio, and video separately.
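To make this architecture concrete, here is a minimal, illustrative sketch (not Bedrock's actual implementation): each modality is handled by its own encoder, and the per-modality embeddings are fused before a shared component produces the response. The toy encoders and the `fuse` function below are hypothetical stand-ins for real neural networks.

```python
# Illustrative sketch of a multimodal model built from single-modal parts:
# separate encoders per modality, followed by a simple fusion step.
# All functions here are toy stand-ins, not a real FM's internals.

def text_encoder(text: str) -> list[float]:
    # Toy stand-in: fold character codes into a fixed-size embedding.
    emb = [0.0] * 4
    for i, ch in enumerate(text):
        emb[i % 4] += ord(ch) / 1000.0
    return emb

def image_encoder(pixels: list[int]) -> list[float]:
    # Toy stand-in: pool pixel intensities into the same embedding size.
    emb = [0.0] * 4
    for i, p in enumerate(pixels):
        emb[i % 4] += p / 255.0
    return emb

def fuse(embeddings: list[list[float]]) -> list[float]:
    # Late fusion: element-wise sum of the per-modality embeddings.
    return [sum(vals) for vals in zip(*embeddings)]

def multimodal_respond(text: str, pixels: list[int]) -> list[float]:
    # Each modality is encoded separately, then combined into one
    # representation that a real model would decode into a response.
    return fuse([text_encoder(text), image_encoder(pixels)])
```

The key point the sketch captures is separation of concerns: each single-modal network only ever sees its own modality, and multimodal behavior emerges from how their representations are combined.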
Now let us look at the multimodal models that are available within Amazon Bedrock.
Stable Diffusion
Stable Diffusion is a state-of-the-art image generation model that has gained significant attention in the field of generative AI. Unlike many other image generation models, Stable...