Evaluating and Monitoring Models with Amazon Bedrock
To choose the best-performing model for your generative AI solution, you need to evaluate the models available to you. This chapter explores various techniques for assessing the performance of different models.
The chapter introduces the two primary evaluation methods provided by Amazon Bedrock, automatic model evaluation and human evaluation, and walks through each of them in detail. In addition, we will look at open source tools such as Foundation Model Evaluations (FMEval) and RAG Assessment (Ragas) for evaluating models and RAG pipelines.
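As a preview of the Ragas walkthrough later in the chapter, the following is a minimal sketch of scoring a single RAG interaction. It assumes the ragas and datasets packages are installed and that a judge LLM is configured for Ragas (by default it looks for an OpenAI API key, though it can also be pointed at a Bedrock-hosted model); the question, answer, contexts, and ground truth shown are illustrative only.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# Illustrative records; in practice these come from your RAG pipeline's outputs.
records = {
    "question": ["What is Amazon Bedrock?"],
    "answer": [
        "Amazon Bedrock is a fully managed service that offers foundation "
        "models from multiple providers through a single API."
    ],
    "contexts": [[
        "Amazon Bedrock is a fully managed service that makes foundation models "
        "from leading AI companies available through a unified API."
    ]],
    "ground_truth": [
        "Amazon Bedrock is a managed AWS service providing API access to foundation models."
    ],
}

dataset = Dataset.from_dict(records)

# Score the dataset; each metric produces a value between 0 and 1.
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy, context_precision])
print(result)
```

In a real evaluation you would assemble many such records from your pipeline's traces rather than a single hand-written example.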
The second part of the chapter focuses on monitoring. We will explore how to leverage Amazon CloudWatch for real-time monitoring of model performance, latency, and token counts, and then look at model invocation logging to capture the requests, responses, and metadata for each model invocation. Furthermore, we will highlight the integration of Amazon Bedrock...
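To give a sense of the CloudWatch coverage to come, the following is a minimal sketch that pulls Bedrock invocation metrics with boto3. Bedrock publishes metrics such as Invocations, InvocationLatency, InputTokenCount, and OutputTokenCount under the AWS/Bedrock namespace; the specific model ID and the 24-hour lookback window used here are illustrative assumptions.

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

end = datetime.now(timezone.utc)
start = end - timedelta(hours=24)  # look back over the last 24 hours

# Average and maximum invocation latency, per hour, for one model.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="InvocationLatency",
    Dimensions=[
        # Example model ID; replace with the model you are monitoring.
        {"Name": "ModelId", "Value": "anthropic.claude-3-sonnet-20240229-v1:0"}
    ],
    StartTime=start,
    EndTime=end,
    Period=3600,
    Statistics=["Average", "Maximum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"].isoformat(), point["Average"], point["Maximum"])
```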