Defining memory and embeddings
LLMs provided by AI services such as OpenAI are stateless, meaning they don’t retain any memory of previous interactions. When you submit a request, the request itself contains all the information the model will use to respond. Any previous requests you submitted have already been forgotten by the model. While this stateless nature allows for many useful applications, some situations require the model to consider more context across multiple requests.
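Because the model itself retains nothing between calls, the illusion of a conversation comes from the client resending the full message history with every request. The sketch below simulates this with a stand-in `fake_model` function (a hypothetical placeholder, not a real API) so the behavior is easy to see without an API key:

```python
def fake_model(messages):
    """A stateless 'model': it sees only what this one request contains."""
    text = " ".join(m["content"] for m in messages)
    if "Alice" in text:
        return "Your name is Alice."
    return "I don't know your name."

# First turn: the user introduces themselves.
history = [{"role": "user", "content": "Hi, my name is Alice."}]
history.append({"role": "assistant", "content": fake_model(history)})

# Follow-up WITHOUT the history: the model has genuinely forgotten.
print(fake_model([{"role": "user", "content": "What is my name?"}]))
# → I don't know your name.

# Follow-up WITH the history: the client supplies the memory, not the model.
history.append({"role": "user", "content": "What is my name?"})
print(fake_model(history))
# → Your name is Alice.
```

Real chat APIs work the same way: each request carries the whole `messages` list, and anything left out of that list simply does not exist for the model.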
Despite their immense computing power, most LLMs can work with only a small amount of text at a time, roughly a page, although context windows have been growing rapidly: GPT-4 Turbo, released in November 2023, accepts 128,000 tokens of input, which is about 200 pages of text. Some applications, however, require a model to consider more than 200 pages of text, for example, a model that answers questions about a large collection of academic papers.
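The 200-page figure can be sanity-checked with a quick back-of-the-envelope calculation. The two constants below are common rules of thumb (roughly 0.75 English words per token, and about 500 words per printed page), not exact values; real tokenization varies with the text and the tokenizer:

```python
# Rough check of the "128,000 tokens is about 200 pages" claim.
# Both constants are approximations, not exact conversion factors.
WORDS_PER_TOKEN = 0.75   # typical for English prose
WORDS_PER_PAGE = 500     # typical for a dense printed page

def tokens_to_pages(n_tokens):
    """Convert a token count to an approximate page count."""
    return n_tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

print(tokens_to_pages(128_000))  # → 192.0, in the ballpark of 200 pages
```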
Memories...