Synthetic testing with LLMs
At the time of this writing, LLMs are strongest at analyzing, parsing, and interpreting technical input supplied through chat prompts. Public-facing models are not yet trusted for "agent"-style tasks, where we provide a general directive and let the AI execute actions on our behalf with minimal errors. While that strength does not help much with integration testing, we can achieve moderately accurate linting and unit-level testing through synthetic means.
Instead of spinning up infrastructure or fully emulating local-level testing, we can provide an LLM with "known good" references such as official documentation, example code, and example detections; with those in hand, it can quickly judge whether most detection use cases will pass or fail when exercised in a CI/CD pipeline. When asked to provide a quantitative score, however, the model does not seem to respond well.
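To make this concrete, here is a minimal sketch of what such a unit-level check might look like as a CI step. It assumes the OpenAI Python SDK with an OPENAI_API_KEY in the environment; the model name, file paths, and prompt wording are illustrative assumptions, not a prescribed implementation.

    # synthetic_lint.py -- illustrative sketch of an LLM-backed pass/fail
    # check for a detection rule in CI. The model name, paths, and prompt
    # wording are assumptions for this example, not fixed requirements.
    import pathlib
    import sys

    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment


    def lint_detection(rule_path: str, reference_paths: list[str]) -> bool:
        """Ask the model for a PASS/FAIL verdict on a single detection."""
        rule = pathlib.Path(rule_path).read_text()
        references = "\n\n".join(
            pathlib.Path(p).read_text() for p in reference_paths
        )
        response = client.chat.completions.create(
            model="gpt-4o",  # assumption: any capable chat model works here
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are a detection-engineering linter. Using only "
                        "the known-good references provided, decide whether "
                        "the detection rule is syntactically valid and "
                        "consistent with the references. Answer with exactly "
                        "one word: PASS or FAIL."
                    ),
                },
                {
                    "role": "user",
                    "content": (
                        f"References:\n{references}\n\nDetection rule:\n{rule}"
                    ),
                },
            ],
            temperature=0,  # keep the verdict as repeatable as possible
        )
        verdict = response.choices[0].message.content.strip().upper()
        return verdict.startswith("PASS")


    if __name__ == "__main__":
        # Exit nonzero on FAIL so the CI stage is marked as failed.
        ok = lint_detection(sys.argv[1], sys.argv[2:])
        sys.exit(0 if ok else 1)

In a pipeline, this might run as python synthetic_lint.py rules/new_rule.yml docs/reference.md examples/known_good.yml, failing the build on a FAIL verdict. Note that the prompt deliberately asks for a categorical verdict rather than a numeric score, which sidesteps the scoring weakness described above.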
But if directed to respond with a probability in...