Knowing where to go from here
AI orchestration is one of the new frontiers in AI. LLMs have made generative AI scenarios possible that were hard to imagine a decade ago. But, as we’ve seen, AI orchestration is not perfect: it can carry a real financial cost, it is sensitive to how you describe your capabilities, it adds execution time, it can get answers wrong, and it can be sensitive to change.
I believe we still need to see a few key innovations in AI orchestration.
First, we need to mature our testing practices for generative AI systems. Because the output of an LLM is non-deterministic and will change from request to request, interactions with LLMs are inherently hard to test in an automated manner.
There are some exciting innovations in this space, such as prompt flow, which helps automate the evaluation of LLM responses. There are also a few LLM testing frameworks, such as DeepEval, which aims to test LLM outputs...
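To make the idea concrete, here is a minimal sketch of what metric-based testing of an LLM output can look like, loosely following DeepEval's pytest-style pattern. The question, answer, and threshold are hypothetical, the exact class and function names may differ between library versions, and the relevancy metric typically calls an evaluation model behind the scenes, so treat this as illustrative rather than a definitive recipe.

```python
# Illustrative sketch: instead of asserting an exact string match (which
# non-deterministic LLM output would break), we score the response against
# a metric and fail the test only if the score drops below a threshold.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def test_return_policy_answer():
    # Hypothetical question and captured model output, for illustration only.
    test_case = LLMTestCase(
        input="How long do I have to return a purchase?",
        actual_output="You can return any item within 30 days of delivery.",
    )
    # The metric scores how relevant the answer is to the question (0 to 1);
    # the assertion fails if the score is below the threshold.
    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])
```

A file of tests like this can be run with pytest or DeepEval's own test runner, turning a subjective "does this answer look right?" check into something you can wire into a CI pipeline.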