Multimodal models
Because LLMs, as the name suggests, focus on language, text is the primary modality for most of the popular LLMs you are likely to interact with at the time of writing in early 2024. However, the concept of GenAI goes beyond text and extends to other modalities, such as pictures, sound, and video.
The term multimodal is becoming ever more prominent within the field of GenAI. In this section, we explore what that means, starting with a definition of “modality.”
Definition
The Oxford English Dictionary defines modality as:
“Those aspects of a thing which relate to its mode, or manner or state of being, as distinct from its substance or identity; the non-essential aspect or attributes of a concept or entity. Also: a particular quality or attribute denoting the mode or manner of being of something.”
Oxford English Dictionary, s.v. “modality (n.), sense 1.a,” December 2023, https://doi...