Performance issues in generative AI applications
The most obvious failures of GenAI applications are performance and reliability issues. Because Chapter 10, Refining the Semantic Data Model to Improve Accuracy, already covered accuracy, performance in this chapter means speed, and a performance issue means slowness. If a user asks your AI application a question and gets no response, a metered response, or a partial response, the failure is typically far more apparent than a hallucinated or sycophantic answer.
Several factors can contribute to the slowness of a GenAI application. Among the most common are computational load, network latency, model serving strategies, and high input/output (I/O) operations.
There are other causes, of course. The rest of this section explains some of these performance killers in detail, along with their impact on your application and users.
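Before looking at individual causes, it can help to put numbers on what "slow" means for a streamed response. The following is a minimal sketch, not a definitive measurement tool: measure_stream and fake_stream are hypothetical names introduced here for illustration, and fake_stream only simulates a model by sleeping between chunks. In a real application you would pass in whatever iterator of text chunks your model client returns.

```python
import time
from typing import Iterable, Iterator, List, Optional


def measure_stream(chunks: Iterable[str]) -> dict:
    """Consume a streamed response and record basic latency figures.

    Hypothetical helper: `chunks` is any iterable that yields pieces of text.
    """
    start = time.perf_counter()
    first_chunk_at: Optional[float] = None
    pieces: List[str] = []

    for chunk in chunks:
        if first_chunk_at is None:
            # Time at which the user would first see any output.
            first_chunk_at = time.perf_counter()
        pieces.append(chunk)

    end = time.perf_counter()
    return {
        "text": "".join(pieces),
        "chunk_count": len(pieces),
        # None means the stream produced nothing at all (the "no response" case).
        "time_to_first_chunk": (
            None if first_chunk_at is None else first_chunk_at - start
        ),
        "total_time": end - start,
    }


def fake_stream(pieces: List[str], delay_seconds: float = 0.01) -> Iterator[str]:
    """Stand-in for a model: yields each piece after an artificial delay."""
    for piece in pieces:
        time.sleep(delay_seconds)
        yield piece


if __name__ == "__main__":
    stats = measure_stream(fake_stream(["Hello", ", ", "world"]))
    print(stats)
```

Tracking time to first chunk separately from total time lets you tell a response that starts late apart from one that starts quickly but arrives slowly, and a time_to_first_chunk of None flags the case where nothing came back at all.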
Computational load
As you already know, LLMs require significant...