Data annotation
So, you want to know how you, as a data engineer, can help a data scientist? You can begin by curating your data into indices that are semantically aligned with the objects in your knowledge base. These, in turn, are correctly modeled for your business domains with the currently known truths that are relevant to your enterprise. We bet you thought we would say: just build a vector store and make it available to your LLM, using the tool from your cloud provider of choice. However, a model built that way will fail to make it to production, since it would suffer from many of the failings of untuned GenAI models (such as hallucinations, quality errors, and the inappropriate exposure of training source text in model output). If you did as some of the cloud providers propose and built your RAG on embeddings without semantic structure, you would be signing up for several extra iterations in your factory. Your vision should be to implement a knowledge-aware semantic form for embeddings...
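To make the contrast concrete, here is a minimal sketch of what an index entry could look like when each embedded chunk carries a link back to a knowledge-base object and a business domain, rather than sitting in the store as a bare vector. Everything in it is an assumption made for illustration: the names `AnnotatedChunk`, `entity_id`, `domain`, and `search` are hypothetical, and `toy_embed` is a deterministic stand-in for whatever real embedding model you would use.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


def toy_embed(text: str, dims: int = 16) -> List[float]:
    """Stand-in for a real embedding model (assumption): hashed bag of words."""
    vec = [0.0] * dims
    for token in text.lower().split():
        # Deterministic bucket per token; a real model would go here.
        vec[sum(ord(c) for c in token) % dims] += 1.0
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


@dataclass
class AnnotatedChunk:
    text: str
    entity_id: str       # hypothetical ID of the knowledge-base object
    domain: str          # hypothetical business-domain label
    vector: List[float]


def annotate(text: str, entity_id: str, domain: str) -> AnnotatedChunk:
    """Embed a chunk and attach its semantic annotations."""
    return AnnotatedChunk(text, entity_id, domain, toy_embed(text))


def search(index: List[AnnotatedChunk], query: str,
           domain: str, top_k: int = 3) -> List[AnnotatedChunk]:
    """Restrict to one domain first, then rank by vector similarity."""
    query_vec = toy_embed(query)
    candidates = [c for c in index if c.domain == domain]
    ranked = sorted(candidates,
                    key=lambda c: cosine(query_vec, c.vector),
                    reverse=True)
    return ranked[:top_k]


# Illustrative data only; entity IDs and domains are made up.
index = [
    annotate("invoice payment is due in thirty days", "kb:invoice", "billing"),
    annotate("refund issued to the customer account", "kb:refund", "billing"),
    annotate("reset the router before calling support", "kb:router", "support"),
]

for hit in search(index, "when is the invoice due", domain="billing", top_k=2):
    print(hit.entity_id, "-", hit.text)
```

The point of the sketch is the order of operations in `search`: the semantic annotation narrows the candidate set before any similarity score is computed, so a query about billing can never pull back a support chunk just because the vectors happen to be close. A production index would carry richer annotations than a single domain label, but the same principle applies.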