What is the ground truth?
Simply put, ground-truth data is data that represents the ideal responses you would expect if your RAG system were operating at peak performance.
As a practical example, suppose you had a RAG system that lets someone ask questions about the latest cancer research in veterinary medicine for dogs, with your data source being all the latest research papers on the subject that have been submitted to PubMed. Your ground truth would likely be a set of questions that could be asked of that data, paired with the answers that data supports. You would want to use realistic questions that your target audience would really ask, and each answer should be what you consider the ideal answer to expect from the LLM. Deciding what counts as ideal can be somewhat subjective, but nonetheless, having a set of ground-truth data to compare against the input and output of your RAG system is a critical way to measure the impact of changes you make and ultimately make the system more effective.
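To make this concrete, here is a minimal sketch of how a ground-truth set could be stored and compared against a RAG system's output. Everything in it is an illustrative assumption rather than a prescribed approach: the names `GroundTruthExample`, `token_f1`, and `evaluate` are hypothetical, the questions and answers are placeholders to be replaced with expert-written content, and token-overlap F1 is just one simple way to score an answer against its ideal counterpart.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundTruthExample:
    question: str      # a realistic question your target audience would ask
    ideal_answer: str  # the answer you consider ideal from the LLM


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and the ideal answer."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(rag_answer_fn, examples) -> float:
    """Average token F1 of a RAG system over the ground-truth set.

    `rag_answer_fn` is assumed to be a callable that takes a question
    string and returns the system's answer string.
    """
    if not examples:
        raise ValueError("ground-truth set is empty")
    scores = [token_f1(rag_answer_fn(ex.question), ex.ideal_answer) for ex in examples]
    return sum(scores) / len(scores)


# Placeholder entries; real ones would be written or reviewed by domain experts.
ground_truth = [
    GroundTruthExample(
        question="placeholder question one",
        ideal_answer="placeholder ideal answer one",
    ),
    GroundTruthExample(
        question="placeholder question two",
        ideal_answer="placeholder ideal answer two",
    ),
]
```

Running `evaluate` before and after a change to the system (a new chunking strategy, for example) gives a single number to compare. In practice you might swap the token-overlap score for a semantic-similarity or LLM-based judgment, since ideal answers can be phrased in many ways.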