Summary
This chapter covered the main components of a modern transformer-based LLM and a quick overview of the LLM landscape as it stands today.
It detailed how text can be transformed into numeric data to be processed by ANNs. To summarize, sentences of a large text corpus are tokenized and assigned integer token IDs. Token IDs index into an embedding matrix, turning the integers into real-valued embedding vectors of fixed length. To create the targets for supervised training, the inputs are shifted by one token to the right, so that the target at each position becomes the token that follows in the sequence.
Sequential data can be learned with recurrent neural networks, but these have been superseded by transformers, which use an attention mechanism to learn which previous tokens are most relevant to predict the next. At every step in the sequence, the model predicts probabilities for each token in the vocabulary, which can be used to generate the next token.
The training...