Beyond Language Models
So far, we have covered only language-specific foundation models, since they are the focus of this book. Nevertheless, in the context of AI-powered applications, it is worth mentioning that there are additional foundation models that handle data other than text and that can likewise be embedded and orchestrated. Below you can find a sample of Large Foundation Models available in the market today:
- Whisper. It is a general-purpose speech recognition model developed by OpenAI that can transcribe and translate speech in multiple languages. Trained on a large dataset of diverse audio, it is also a multitask model that can perform multilingual speech recognition, speech translation, spoken language identification, and voice activity detection.
- Midjourney. Developed by the independent research lab of the same name, Midjourney is based on a Transformer sequence-to-sequence model that takes text prompts and outputs a set of four images that match the prompts. Midjourney...