Example designs of state-of-the-art LLMs
In this part, we dig deeper into the design and architecture of some of the newest LLMs at the time of writing this book.
GPT-3.5 and ChatGPT
The core of ChatGPT is a Transformer, a model architecture that uses self-attention mechanisms to weigh the relevance of different words in the input when making predictions. Self-attention allows the model to consider the full context of the input when generating a response.
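To make this concrete, here is a minimal sketch of scaled dot-product self-attention for a single head, written in PyTorch. The shapes, variable names, and toy dimensions are illustrative assumptions, not taken from any particular implementation:

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model) token embeddings.
    # w_q, w_k, w_v: (d_model, d_head) learned projection matrices.
    q = x @ w_q  # queries: what each token is looking for
    k = x @ w_k  # keys: what each token offers
    v = x @ w_v  # values: the content that gets mixed together
    d_head = q.shape[-1]
    # Every position scores its relevance against every other position.
    scores = q @ k.T / d_head ** 0.5      # (seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)   # each row sums to 1
    return weights @ v                    # context-weighted mix of values

# Toy usage: 5 tokens, 16-dim embeddings, one 8-dim head.
x = torch.randn(5, 16)
w_q, w_k, w_v = (torch.randn(16, 8) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)   # shape: (5, 8)
```

Each row of `weights` expresses how much every token in the input contributes to that position's output; this is the "weighing of relevance" described above.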
The GPT model
ChatGPT is based on the GPT family of Transformer models. GPT models are trained to predict the next word in a sequence, given all the previous words. They process text from left to right (unidirectional context), which makes them well suited to text generation tasks. For instance, GPT-3, the predecessor of the GPT-3.5 model that underlies ChatGPT, contains 175 billion parameters.
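The left-to-right context is typically enforced with a causal mask: when attention scores are computed, each position is allowed to attend only to itself and earlier positions. A minimal sketch, in the spirit of the attention function above (names are again illustrative):

```python
import torch
import torch.nn.functional as F

def causal_self_attention(q, k, v):
    # q, k, v: (seq_len, d_head) query/key/value projections.
    seq_len, d_head = q.shape
    scores = q @ k.T / d_head ** 0.5
    # Lower-triangular mask: position i may attend only to positions <= i,
    # so the prediction for the next word never peeks at future words.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```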
Two-step training process
ChatGPT is trained in two steps: pretraining and fine-tuning...
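Before looking at each step in detail, note that the mechanics of both can be sketched with the same next-token objective; what changes between them is mainly the data. The `model` and `optimizer` objects below are hypothetical stand-ins, and the loop is an illustrative assumption rather than OpenAI's actual procedure:

```python
import torch.nn.functional as F

def next_token_step(model, optimizer, tokens):
    # tokens: (batch, seq_len) integer token ids.
    # `model` is a hypothetical module returning (batch, seq_len-1, vocab) logits.
    logits = model(tokens[:, :-1])   # predict from every prefix
    targets = tokens[:, 1:]          # the actual "next word" at each position
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Step 1 (pretraining): run this loop over large amounts of raw text.
# Step 2 (fine-tuning): run the same loop over curated prompt-response pairs.
```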