Using memory within chats and LLMs
As we have seen before, models have a size limit called a context window. This limit covers both the prompt containing the user request and the model's response. The default context window for a model such as GPT-3.5, for example, is 4,096 tokens, meaning that your prompt and the answer GPT-3.5 generates can together contain at most 4,096 tokens; otherwise, you will get an error, or the response will be cut off mid-sentence.
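Because the limit is measured in tokens, it helps to count them before sending a request. The following sketch uses the tiktoken library to estimate how much of the window remains for the response; the 4,096-token budget and the model name are assumptions taken from the example above.

```python
# A minimal sketch for budgeting tokens, assuming the tiktoken library
# and a 4,096-token context window as in the GPT-3.5 example above.
import tiktoken

CONTEXT_WINDOW = 4096  # assumed total budget (prompt + response)

def tokens_left_for_response(prompt: str, model: str = "gpt-3.5-turbo") -> int:
    """Return roughly how many tokens remain for the model's answer."""
    encoding = tiktoken.encoding_for_model(model)
    prompt_tokens = len(encoding.encode(prompt))
    return CONTEXT_WINDOW - prompt_tokens

remaining = tokens_left_for_response("Summarize our refund policy for a customer.")
print(f"Tokens available for the response: {remaining}")
```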
If your application uses a lot of text data, such as a 10,000-page operating manual, or lets people search and ask questions across hundreds of documents of 50 pages each, you need a way to include only the relevant portion of this large dataset in your prompt. Otherwise, the prompt alone could exceed the context window, resulting in an error, or it could leave so little of the window free that the model has no room to provide a good answer.
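One common way to do this is to split the documents into chunks and include only the chunks most relevant to the user's question. The sketch below illustrates the idea with a deliberately naive keyword-overlap score; the chunk size, file name, and scoring function are placeholder assumptions, and a real system would typically rank chunks with embeddings instead.

```python
# A minimal sketch of selecting relevant chunks before building a prompt.
# The overlap score is a naive stand-in for embedding-based retrieval.

def split_into_chunks(text: str, chunk_size: int = 500) -> list[str]:
    """Split a document into fixed-size word chunks (size is an assumption)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def overlap_score(question: str, chunk: str) -> int:
    """Count how many words from the question also appear in the chunk."""
    return len(set(question.lower().split()) & set(chunk.lower().split()))

def top_chunks(question: str, document: str, k: int = 3) -> list[str]:
    """Return the k chunks that best match the question."""
    chunks = split_into_chunks(document)
    return sorted(chunks, key=lambda c: overlap_score(question, c), reverse=True)[:k]

# Only the selected chunks, not the whole manual, go into the prompt.
manual = open("operating_manual.txt").read()  # hypothetical file
question = "How do I reset the device?"
context = "\n\n".join(top_chunks(question, manual))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

This keeps the prompt well inside the context window while still grounding the model's answer in the source material.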