Code lab 14.3 – MM-RAG
The code for this lab can be found in the CHAPTER14-3_MM_RAG.ipynb file in the CHAPTER14 directory of the GitHub repository.
This is a good example of when an acronym can really help us talk faster. Try saying multi-modal retrieval-augmented generation out loud once, and you will likely want to use MM-RAG from now on! But I digress. This is a groundbreaking approach that will likely gain a lot of traction in the near future. It better represents how we as humans process information, so it must be amazing, right? Let's start by revisiting the concept of using multiple modes.
Multi-modal
Up to this point, everything we have discussed has been focused on text: taking text as input, retrieving text based on that input, and passing that retrieved text to an LLM that then generates a final text output. But what about non-text? Now that the companies building these LLMs offer powerful multi-modal capabilities, how can we incorporate...