Risks from bringing your own data
As we discussed in the previous chapter, organizations will want to tailor their LLM solutions by bringing their own data, either by using RAG or by fine-tuning the model.
Needless to say, fine-tuning brings its own risks, such as data poisoning, memorization, lack of access control over what is used in responses, and privacy attacks, which we have already mentioned while discussing predictive AI. We will delve into these more in the next chapter.
For now, we will focus on the risks of RAG, using our simple pricing document example to illustrate them. This is an oversimplified example to help us understand external content in the context of prompt injection. Typically, we would place restrictions on the sources we trust. Additionally, we would convert external data into embeddings, and for documents, we would store those embeddings as vectors in a vector database designed for this purpose. We have kept this example simple so that we can focus on the risks.
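To make that typical setup concrete, here is a minimal sketch of the ingestion side of a RAG pipeline: documents are accepted only from an allowlist of trusted sources, then chunked, embedded, and stored for retrieval. The domain names, the embed stand-in, and the in-memory list that takes the place of a real vector database are all illustrative assumptions, not a production design.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allowlist of sources we trust to feed the RAG pipeline.
TRUSTED_SOURCES = {"pricing.example.com", "docs.internal.example.com"}

@dataclass
class Chunk:
    source: str
    text: str
    embedding: list[float]

def embed(text: str) -> list[float]:
    """Stand-in embedding function; a real pipeline would call an
    embedding model instead of this toy character-based vector."""
    vec = [0.0] * 8
    for i, ch in enumerate(text):
        vec[i % 8] += ord(ch)
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def ingest(source_url: str, document_text: str, store: list[Chunk]) -> None:
    """Reject untrusted sources, then chunk, embed, and store the document."""
    host = urlparse(source_url).netloc
    if host not in TRUSTED_SOURCES:
        raise ValueError(f"Untrusted source rejected: {host}")
    # Naive fixed-size chunking; real systems usually chunk on document structure.
    for i in range(0, len(document_text), 500):
        chunk_text = document_text[i:i + 500]
        store.append(Chunk(source=source_url, text=chunk_text,
                           embedding=embed(chunk_text)))

# The in-memory list stands in for a purpose-built vector database.
vector_store: list[Chunk] = []
ingest("https://pricing.example.com/2024-price-list",
       "Widget A: $10\nWidget B: $25", vector_store)
```

Even this toy version shows where the security decisions live: the allowlist is the only thing standing between the retriever and attacker-controlled content, which is exactly the gap prompt injection through external content exploits.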