Building Multimodal Applications with LLMs
In this chapter, we go beyond LLMs to introduce the concept of multimodality in agent building. We will examine the logic behind combining foundation models from different AI domains – language, vision, and audio – into a single agent that can adapt to a variety of tasks. By the end of this chapter, you will be able to build your own multimodal agent, providing it with the tools and LLMs it needs to perform a range of AI tasks.
Throughout this chapter, we will cover the following topics:
- Introduction to multimodality and large multimodal models (LMMs)
- Examples of emerging LMMs
- How to build a multimodal agent with single-modal LLMs using LangChain
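Before diving in, the core idea behind the last topic can be sketched without any framework: an agent holds a registry of single-modal tools (one per modality) and routes each task to the right one. The sketch below is a minimal, library-free illustration of that routing pattern; the class, the modality names, and the lambda "models" are all hypothetical stand-ins, not the LangChain API we will use later.

```python
# Minimal sketch of a multimodal agent that routes tasks to single-modal
# "tools". All names here are illustrative, not a real framework API.
from typing import Callable, Dict


class MultimodalAgent:
    """Dispatches a task to the tool registered for its modality."""

    def __init__(self) -> None:
        # Maps a modality name (e.g. "text") to a callable tool.
        self.tools: Dict[str, Callable[[str], str]] = {}

    def register(self, modality: str, tool: Callable[[str], str]) -> None:
        self.tools[modality] = tool

    def run(self, modality: str, task: str) -> str:
        if modality not in self.tools:
            raise ValueError(f"No tool registered for modality: {modality}")
        return self.tools[modality](task)


# Stand-ins for single-modal models (an LLM, an image model, a speech model).
agent = MultimodalAgent()
agent.register("text", lambda t: f"LLM answer to: {t}")
agent.register("image", lambda t: f"generated image for: {t}")
agent.register("audio", lambda t: f"transcription of: {t}")

print(agent.run("image", "a cat on a skateboard"))
```

In LangChain, the same pattern is expressed by wrapping each model as a tool and letting an agent's LLM decide which tool to call; the chapter builds toward that full version step by step.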