Agents for Amazon Bedrock
One of the powerful capabilities offered by Amazon Bedrock is the ability to build and configure autonomous agents within your applications. These agents act as intelligent assistants, helping end users complete tasks based on organizational data and user input. Agents orchestrate interactions between FMs (LLMs), data sources, software applications, and user conversations. They can automatically call APIs to take actions and invoke knowledge bases to supplement information for these actions. By integrating agents, developers can save weeks of development effort and accelerate the delivery of GenAI applications.
Agents on Amazon Bedrock are designed to automate tasks for customers and provide intelligent responses to their questions. For example, you could create an agent that assists customers in processing insurance claims or making travel reservations. The beauty of agents is that you don’t have to worry about provisioning capacity, managing infrastructure, or writing custom code from scratch. Amazon Bedrock handles the complexities of prompt engineering, memory management, monitoring, encryption, user permissions, and API invocation.
Agents on Amazon Bedrock perform the following key tasks:
- Extend FMs: Agents leverage LLMs to understand user requests and break down complex tasks into smaller, manageable steps.
- Collect additional information: Through natural conversation, agents can gather additional information from users to fulfill their requests effectively.
- Take actions: Agents can make API calls to your company’s systems to perform actions and fulfill customer requests.
- Augment performance and accuracy: By querying data sources and knowledge bases, agents can enhance their performance and provide more accurate responses.
In order to harness the power of Agents for Amazon Bedrock, developers follow a straightforward process:
- Create a knowledge base to store your organization’s private data, which can be used to enhance the agent’s performance and accuracy. This step is optional because not all agents require access to private organizational data to carry out their assigned objectives. If the agent’s tasks and objectives do not depend on or benefit significantly from access to such data, creating a knowledge base may not be necessary. It depends on the specific use case and requirements of the agent being developed.
- Configure an agent for your specific use case, defining the actions it can perform. Lambda functions, written in your preferred programming language, dictate how the agent handles these actions. This is an optional step as an agent doesn’t necessarily require an action group to be created.
- Associate the agent with a knowledge base to augment its capabilities further.
- Customize the agent’s behavior by modifying prompt templates for preprocessing, orchestration, knowledge-base response generation, and postprocessing steps. Note that not all agents require extensive modification of prompt templates for their goal. The need for customization depends on the complexity of the tasks the agent is expected to perform and the level of control and fine-tuning desired by developers. For simpler tasks or generic use cases, the default prompt templates may suffice, making extensive customization unnecessary.
- Test the agent using the Amazon Bedrock console or API calls, modifying configurations as necessary. Utilize traces to gain insights into the agent’s reasoning process at each step of its orchestration.
- When the agent is ready for deployment, create an alias that points to a specific version of the agent.
- Integrate your application with the agent alias, enabling seamless API calls and interactions.
- Iterate on the agent as needed, creating new versions and aliases to adapt to changing requirements.
Throughout the development process, Amazon Bedrock handles the complexities of prompt engineering, memory management, monitoring, encryption, user permissions, and API invocation, allowing you to focus on building intelligent agents tailored to your specific use cases.
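To make these steps concrete, the following minimal sketch uses the boto3 bedrock-agent client to create an agent, prepare a testable DRAFT version, and publish an alias. The agent name, instructions, model ID, and role ARN are placeholders for your own resources:

import boto3

# Build-time operations go through the "bedrock-agent" control-plane client
bedrock_agent = boto3.client("bedrock-agent")

# 1. Create the agent with an FM, instructions, and an IAM service role (placeholders)
agent = bedrock_agent.create_agent(
    agentName="claims-assistant",
    foundationModel="anthropic.claude-3-sonnet-20240229-v1:0",
    instruction="You help customers process insurance claims step by step.",
    agentResourceRoleArn="arn:aws:iam::123456789012:role/BedrockAgentRole",
)
agent_id = agent["agent"]["agentId"]

# 2. Prepare the DRAFT version of the agent so it can be tested
bedrock_agent.prepare_agent(agentId=agent_id)

# 3. Create an alias so your application can call a specific version of the agent
alias = bedrock_agent.create_agent_alias(
    agentId=agent_id,
    agentAliasName="prod",
)
print(alias["agentAlias"]["agentAliasId"])

Action groups and knowledge bases can be attached to the DRAFT agent before it is prepared, as shown later in this chapter.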
Unveiling the inner workings of GenAI agents with Amazon Bedrock
When delving into the realm of Amazon Bedrock, one encounters a powerful toolset designed to facilitate the creation and management of intelligent agents. This toolset is composed of two distinct categories of API operations, each serving a specific purpose in the agent’s life cycle:
- The first category, aptly termed build-time API operations, enables developers to construct, configure, and oversee their agents and their associated resources. These operations act as the foundational building blocks, enabling the creation of agents tailored to specific requirements and objectives. Through these APIs, developers can fine-tune various aspects of their agents, ensuring they are equipped with the necessary capabilities to tackle the tasks at hand. More details on build-time API operations are listed here: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Operations_Agents_for_Amazon_Bedrock.html
- The second category, runtime API operations, breathes life into agents, allowing them to interact with user input and initiate an intricate orchestration process to accomplish their designated tasks. When a user provides input, these APIs enable the agent to process and interpret the information, triggering a sequence of actions that ultimately lead to the desired outcome.
Now, let us dive into build-time and runtime configurations.
Build-time configuration
During the build phase, an agent is assembled from the following key components:
- FM: You select a pre-trained language model that the agent employs to interpret user input, generate responses, and guide its decision-making process.
- Instructional prompts: You craft instructions that delineate the agent’s purpose and desired behavior. With advanced prompting techniques, you can dynamically tailor these instructions at each stage of the agent’s workflow and incorporate custom logic through serverless functions.
- Action groups: You define actions the agent can perform by providing the following:
- An OpenAPI schema specification that outlines the operations the agent can invoke.
- A serverless function that executes the specified API operation based on the agent’s input and returns the result.
- Knowledge bases: You can associate knowledge bases with the agent, allowing it to retrieve relevant context to enhance its response generation and decision-making capabilities.
- Prompt templates: The orchestrator exposes default prompt templates used during various stages of the agent’s life cycle, such as preprocessing input, orchestrating actions, querying knowledge bases, and postprocessing outputs. You can customize these templates to modify the agent’s behavior or disable specific stages as needed.
During the build process, these components are combined to create base prompts that guide the agent’s orchestration flow until the user’s request is fulfilled. With advanced prompting techniques, you can augment these base prompts with additional logic, examples, and metadata to improve the agent’s accuracy and performance at each stage of its invocation. After configuring the agent’s components and security settings, you can prepare the agent for deployment and testing in a runtime environment, as shown in Figure 10.3:
Figure 10.3 – Build-time API operations for Agent creation
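If the default prompt templates need adjusting, the same build-time APIs accept a promptOverrideConfiguration. The following sketch shows its general shape for an orchestration-stage override, assuming the field names exposed by the boto3 bedrock-agent client; the template text itself is purely illustrative:

# Sketch: overriding the orchestration prompt template for an agent.
# The template string below is a placeholder, not the actual default template.
prompt_override = {
    "promptConfigurations": [
        {
            "promptType": "ORCHESTRATION",       # or PRE_PROCESSING, POST_PROCESSING, KNOWLEDGE_BASE_RESPONSE_GENERATION
            "promptCreationMode": "OVERRIDDEN",  # supply your own template instead of the default
            "promptState": "ENABLED",            # a stage can also be DISABLED entirely
            "basePromptTemplate": "$instruction$ ...your customized orchestration template...",
            "inferenceConfiguration": {
                "temperature": 0.0,
                "topP": 1.0,
                "maximumLength": 2048,
            },
        }
    ]
}

# Passed alongside the usual fields when creating or updating the agent:
# bedrock_agent.update_agent(..., promptOverrideConfiguration=prompt_override)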
Runtime process
At the heart of this runtime process lies the InvokeAgent API operation, a powerful conductor that sets the agent sequence in motion. The agent’s performance unfolds in three harmonious acts: preprocessing, orchestration, and postprocessing.
Act I – Preprocessing
Before the curtains rise, the preprocessing phase meticulously manages how the agent contextualizes and categorizes user input. This crucial step can also validate the input, ensuring a seamless transition to the subsequent stages.
Act II – Orchestration – the grand performance
The orchestration phase is where the true magic unfolds, a symphonic interplay of interpretation, invocation, and knowledge synthesis. This act consists of the following movements:
- Interpretation: The agent deftly interprets the user input with an FM, generating a rationale that lays out the logical path for the next steps.
- Invocation and synthesis: Like a skilled conductor, the agent invokes action groups and queries knowledge bases, retrieving additional context and summarizing the data to augment its generation capabilities.
- Observation and augmentation: From the invoked action groups and summarized knowledge-base results, the agent generates an output, known as an observation. This observation is then used to enrich the base prompt, which is subsequently interpreted by the FM. The agent then determines if further orchestration iterations are necessary.
This iterative loop continues until the agent delivers its final response to the user or requires additional information from the user.
Throughout the orchestration phase, the base prompt template is augmented with agent instructions, action groups, and knowledge bases, creating a rich tapestry of information. This enhanced base prompt is then fed into the FM, which predicts the optimal trajectory to fulfill the user’s request. At each iteration, the FM selects the appropriate API operation or knowledge-base query, resulting in a responsive and contextually accurate output.
Act III – Postprocessing – the finale
In the final act, the postprocessing phase, the agent formats the culmination of its efforts – the final response to be returned to the user. However, this step can be gracefully bypassed, leaving the performance open to interpretation.
During the agent’s performance, users have the option to invoke a trace at runtime, unlocking a window into the agent’s thought process. This trace meticulously tracks the agent’s rationale, actions, queries, and observations at each step of the sequence. It includes the full prompts sent to the FM, as well as outputs from the model, API responses, and knowledge-base queries. By examining this trace, users can gain invaluable insights into the agent’s reasoning, paving the way for continuous improvement and refinement.
As the user’s session with the agent continues through successive InvokeAgent requests, the conversation history is diligently preserved, continually augmenting the orchestration base prompt template with context. This enrichment process aids in enhancing the agent’s accuracy and performance, forging a symbiotic relationship between the user and the AI.
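To make the runtime flow concrete, the following sketch calls InvokeAgent through the boto3 bedrock-agent-runtime client, reusing a session ID so the conversation history accumulates and enabling the trace described above; the agent and alias IDs are placeholders:

import uuid
import boto3

# Runtime operations go through the "bedrock-agent-runtime" data-plane client
runtime = boto3.client("bedrock-agent-runtime")

session_id = str(uuid.uuid4())  # reuse this value across calls to keep conversation history

response = runtime.invoke_agent(
    agentId="AGENT_ID",          # placeholder
    agentAliasId="ALIAS_ID",     # placeholder
    sessionId=session_id,
    inputText="Summarize doc_1 and translate the summary into French.",
    enableTrace=True,            # emit trace events alongside the answer
)

# The completion arrives as an event stream of answer chunks and trace events
final_answer = ""
for event in response["completion"]:
    if "chunk" in event:
        final_answer += event["chunk"]["bytes"].decode("utf-8")
    elif "trace" in event:
        print(event["trace"])    # rationale, action invocations, and observations per step

print(final_answer)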
The agent’s process during runtime is a captivating interplay of interpretation, synthesis, and adaptation, as showcased in Figure 10.4:
Figure 10.4 – Runtime process flow for Agent workflow
Advancing reasoning capabilities with GenAI – a primer on ReAct
GenAI models have demonstrated splendid capabilities in processing and generating human-like text, but their ability to reason through complex tasks and provide step-by-step solutions remains a challenge. Yao et al. developed a technique called ReAct, articulated in the paper ReAct: Synergizing Reasoning and Acting in Language Models (https://arxiv.org/abs/2210.03629), to enhance the reasoning abilities of these models, enabling them to systematically approach and solve user-requested tasks.
The ReAct technique involves structuring prompts that guide the model through a sequence of reasoning steps and corresponding actions. These prompts consist of a series of question-thought-action-observation examples, where the following applies:
- The question represents the user-requested task or problem to be solved
- The thought is a reasoning step that demonstrates how to approach the problem and identify a potential action
- The action is an API call or function that the model can invoke from a predefined set of allowed operations
- The observation is the result or output obtained from executing the chosen action
The set of allowed actions is defined by instructions prepended to the example prompt text. This structured approach encourages the model to engage in a step-by-step reasoning process, breaking down complex tasks into smaller, actionable steps.
To illustrate the construction of a ReAct prompt, consider the following example prompt structure with question-thought-action-observation sequences:
Example 1:
- Question: What is the optimal inventory level to minimize stockouts?
- Thought: To avoid stockouts, we must balance inventory levels based on demand forecasts and reorder points.
- Action: Invoke the optimizeInventoryLevels function using historical sales data and demand projections.
- Observation: Maintaining inventory at 80% of forecasted demand reduced stockouts by 30% while optimizing carrying costs.
Example 2:
- Question: How can we improve customer satisfaction ratings?
- Thought: To enhance satisfaction, we should analyze feedback data and implement targeted improvements.
- Action: Execute the analyzeCustomerFeedback API to identify trends and insights.
- Observation: Based on the analysis, implementing personalized customer support led to a 20% increase in satisfaction scores.
These examples demonstrate how the ReAct technique guides the model through reasoning steps, leading to actionable outcomes.
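Assembled into a single prompt, the instructions defining the allowed actions are prepended to the worked examples, followed by the new question for the model to continue. The sketch below is a minimal, illustrative construction of such a prompt; the action names and wording are hypothetical:

# Illustrative ReAct-style prompt: allowed actions first, then a worked
# question-thought-action-observation example, then the new question.
REACT_PROMPT = """You may only use the following actions:
- optimizeInventoryLevels(sales_data, demand_forecast)
- analyzeCustomerFeedback(feedback_source)

Question: What is the optimal inventory level to minimize stockouts?
Thought: To avoid stockouts, we must balance inventory levels based on demand forecasts and reorder points.
Action: optimizeInventoryLevels(historical_sales, demand_projections)
Observation: Maintaining inventory at 80% of forecasted demand reduced stockouts by 30%.

Question: {user_question}
Thought:"""

prompt = REACT_PROMPT.format(user_question="How can we reduce shipping delays?")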
While the process of manually crafting these prompts can be time-consuming and intricate, the Amazon Bedrock Agent streamlines this process by automatically generating the prompts based on the provided information and available actions. Bedrock agents handle the complexities of prompt engineering, allowing researchers and developers to focus on defining the task requirements and available actions.
Readers are encouraged to check out https://github.com/aws-samples/agentsforbedrock-retailagent, which uncovers the creation of an FM-powered customer service bot by leveraging Agents for Amazon Bedrock.
The ReAct technique and Bedrock Agents represent a significant advancement in the field of GenAI, enabling models to demonstrate improved reasoning abilities and tackle complex tasks more effectively. By providing a structured approach to problem-solving and leveraging the power of prompts, this technique has the potential to unlock new possibilities and applications for GenAI in various domains. Let us explore the functioning of Amazon Bedrock Agents with some practical use cases.
Practical use case and functioning with Amazon Bedrock Agents
In this section, we will dive into real-world applications and operational insights of leveraging Amazon Bedrock Agents in GenAI. Let us consider the example of a multilingual summarizer bot, wherein a GenAI agent is employed to summarize a document’s content and translate the summary into the language of the user’s choice. To begin, the developer must access the Bedrock console and initiate the agent creation workflow, as highlighted in Figure 10.5:
Figure 10.5 – Agent creation within the Bedrock console
This process involves providing essential details, such as the agent’s name, description, and the necessary permissions through an AWS Identity and Access Management (IAM) service role. This role grants the agent access to required services such as Amazon Simple Storage Service (Amazon S3) and AWS Lambda, as illustrated in Figure 10.6. As an example, the figure demonstrates the creation of a multilingual document summarizer and translator agent for extracting relevant information from the documents and relaying the information to the user in the translated language:
Figure 10.6 – Bedrock Agent creation process with IAM permissions
By default, Amazon Bedrock employs encryption for agent sessions with users, utilizing a key that AWS owns and manages on your behalf. However, if you prefer to use a customer-managed key from AWS Key Management Service (KMS) that you have set up, you have the option to customize your encryption settings accordingly. This allows you to take control of the encryption key used for securing agent-user interactions, aligning with your organization’s security and compliance requirements.
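If you do choose a customer-managed key, it can also be supplied programmatically. The brief sketch below assumes the customerEncryptionKeyArn parameter of the agent creation call; the key ARN and other values are placeholders:

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Sketch: creating the agent with a customer-managed KMS key (placeholder values)
agent = bedrock_agent.create_agent(
    agentName="doc-summarizer-translator",
    foundationModel="anthropic.claude-3-sonnet-20240229-v1:0",
    instruction="You are a multi-lingual document summarizer and translator.",
    agentResourceRoleArn="arn:aws:iam::123456789012:role/BedrockAgentRole",
    customerEncryptionKeyArn="arn:aws:kms:us-east-1:123456789012:key/your-key-id",
)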
Next, the developer selects an FM from Bedrock that aligns with the desired use case. This step involves providing natural language instructions that define the agent’s task and the persona it should assume. For instance, in the example demonstrated in Figure 10.7, the instruction could be You are a multi-lingual agent designed to help with extracting inquired information from relevant documents and providing the response in translated language:
Figure 10.7 – Amazon Bedrock Agent configuration for model selection and Agent persona
The console also provides the option for the user to select guardrails to implement application-specific safeguards abiding by responsible AI policies. For simplicity, we can leave this blank and move to the next section. We will be covering guardrails in detail in Chapter 12.
Subsequently, the developer adds action groups, which are sets of tasks the agent can perform automatically by making API calls to the company’s systems. This step involves defining an API schema that outlines the APIs for all actions within a group and providing a Lambda function that encapsulates the business logic for each API. For example, an action group named Summarizer_Translator_ActionGroup could handle documents stored in a database or at a particular location, identify the information requested by the user, and return a summarized response translated into the language the user requested. Figure 10.8 showcases the creation of an action group to handle tasks for agents to execute autonomously:
Figure 10.8 – Creating Bedrock Agent’s action group
As shown previously, you will have to create a Lambda function to handle incoming requests from the agent and select an API schema. Please ensure the Lambda function’s resource-based policy grants the Bedrock agent permission to invoke it, as sketched next.
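One way to grant that permission is a resource-based policy statement on the function that allows the Bedrock service principal to invoke it, scoped to your agent. The following is a minimal sketch with boto3; the function name, statement ID, and agent ARN are placeholders:

import boto3

lambda_client = boto3.client("lambda")

# Allow the Bedrock agent to invoke this action group's Lambda function
lambda_client.add_permission(
    FunctionName="summarizer-translator-action-group",                  # placeholder
    StatementId="AllowBedrockAgentInvoke",                              # placeholder
    Action="lambda:InvokeFunction",
    Principal="bedrock.amazonaws.com",
    SourceArn="arn:aws:bedrock:us-east-1:123456789012:agent/AGENT_ID",  # placeholder agent ARN
)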
For the case of document identification, summarization, and translation, we have provided the following Lambda function that users can leverage for executing the workflow:
import json
import time
import boto3

# Define a mock dictionary with document IDs and content
Document_id = {
    "doc_1": {
        "title": "The Importance of Mindfulness",
        "author": "Jane Smith",
        "content": "Mindfulness is the practice of being fully present and engaged in the current moment, without judgment or distraction. It involves paying attention to your thoughts, feelings, and bodily sensations with a curious and non-judgmental attitude. By cultivating mindfulness, you can reduce stress, improve emotional regulation, and enhance overall well-being. In this document, we will explore the benefits of mindfulness and provide practical techniques for incorporating it into your daily life."
    },
    "doc_2": {
        "title": "Sustainable Living: A Guide to Eco-Friendly Practices",
        "author": "Michael Johnson",
        "content": "In today's world, it's essential to adopt sustainable living practices to protect our planet's resources and ensure a better future for generations to come. This document will provide you with practical tips and strategies for reducing your environmental impact in various aspects of your life, such as energy consumption, waste management, transportation, and food choices. Together, we can make a significant difference by embracing eco-friendly habits and promoting a more sustainable lifestyle."
    },
    "doc_3": {
        "title": "The Art of Effective Communication",
        "author": "Emily Davis",
        "content": "Effective communication is a crucial skill in both personal and professional settings. It involves the ability to convey your thoughts, ideas, and emotions clearly and respectfully, while also actively listening and understanding the perspectives of others. In this document, we will explore the key elements of effective communication, such as active listening, nonverbal cues, and empathy. By mastering these techniques, you can improve your relationships, resolve conflicts more effectively, and achieve greater success in your personal and professional endeavors."
    }
}

def getDocID(event):
    # Extract the document ID from the agent's request parameters
    docID = event['parameters'][0]['value']
    print("NAME PRINTED: ", docID)
    if docID in ("doc_1", "doc1"):
        return Document_id["doc_1"]["content"]
    elif docID in ("doc_2", "doc2"):
        return Document_id["doc_2"]["content"]
    elif docID in ("doc_3", "doc3"):
        return Document_id["doc_3"]["content"]
    else:
        return "No document found by that ID"

def lambda_handler(event, context):
    """Main lambda handler directing requests based on the API path, preserving the specified response structure."""
    print("event OUTPUT : ")
    print(event)
    action_group = event.get("actionGroup")
    print("action group :" + str(action_group))
    api_path = event.get("apiPath")
    print("api_path : " + str(api_path))
    result = ''
    response_code = 200
    if api_path == '/getDoc':
        result = getDocID(event)
        print(result)
    else:
        response_code = 404
        result = f"Unrecognized api path: {action_group}::{api_path}"
    response_body = {
        'application/json': {
            'body': result
        }
    }
    action_response = {
        'actionGroup': event['actionGroup'],
        'apiPath': event['apiPath'],
        'httpMethod': event['httpMethod'],
        'httpStatusCode': response_code,
        'responseBody': response_body
    }
    api_response = {'messageVersion': '1.0', 'response': action_response}
    return api_response
Users running the preceding workflow can also use the following OpenAPI schema and store it in S3, as part of this example:
{
    "openapi": "3.0.1",
    "info": {
        "title": "DocSummarizerTranslator API",
        "version": "1.0.0",
        "description": "APIs for fetching, translating and summarizing docs by fetching the document ID and identifying the language to translate the document"
    },
    "paths": {
        "/getDoc": {
            "get": {
                "description": "Get the document content for a document by document ID.",
                "operationId": "getDoc",
                "parameters": [
                    {
                        "name": "DocID",
                        "in": "query",
                        "description": "ID of the document to retrieve",
                        "required": true,
                        "schema": {
                            "type": "string"
                        }
                    }
                ],
                "responses": {
                    "200": {
                        "description": "Successful response with document content data",
                        "content": {
                            "text/plain": {
                                "schema": {
                                    "type": "string"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/getDoc/summarize": {
            "get": {
                "description": "Summarize the content of the document for given document ID",
                "operationId": "summarizeDoc",
                "parameters": [
                    {
                        "name": "DocID",
                        "in": "query",
                        "description": "ID of the document to summarize",
                        "required": true,
                        "schema": {
                            "type": "string"
                        }
                    }
                ],
                "responses": {
                    "200": {
                        "description": "Successful response with the summary of the document content for given document ID",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "string",
                                    "properties": {
                                        "summary": {
                                            "type": "string",
                                            "description": "Summary of the document"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
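Equivalently to the console flow, the schema can be uploaded to S3 and the action group wired to the Lambda function through the build-time API. The sketch below assumes the schema is saved locally as schema.json; the bucket, key, function ARN, and agent ID are placeholders:

import boto3

s3 = boto3.client("s3")
bedrock_agent = boto3.client("bedrock-agent")

# Upload the OpenAPI schema shown above
s3.upload_file("schema.json", "my-agent-schemas-bucket", "doc-summarizer/schema.json")

# Attach the action group to the DRAFT version of the agent (placeholder IDs/ARNs)
bedrock_agent.create_agent_action_group(
    agentId="AGENT_ID",
    agentVersion="DRAFT",
    actionGroupName="Summarizer_Translator_ActionGroup",
    actionGroupExecutor={
        "lambda": "arn:aws:lambda:us-east-1:123456789012:function:summarizer-translator-action-group"
    },
    apiSchema={
        "s3": {
            "s3BucketName": "my-agent-schemas-bucket",
            "s3ObjectKey": "doc-summarizer/schema.json",
        }
    },
)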
In the next step, users have the option to select a knowledge base, as depicted in Figure 10.9. This showcases the power of Bedrock Agents to easily create a RAG-based solution for extracting information from relevant sources stored in the knowledge base, by performing similarity searches and providing desired responses back to the user. For simplicity, we will ignore that and move to the final creation step:
Figure 10.9 – Knowledge-base creation with Bedrock Agents integration
Note
If you would like to dive deep into use cases involving knowledge-base integration with your agents, you can execute the following code samples: https://github.com/aws-samples/amazon-bedrock-workshop/tree/main/05_Agents/insurance_claims_agent/with_kb.
Additional code within the GitHub repository further illustrates how to create and invoke Bedrock Agents with the Python SDK, as evidenced in the following notebook: https://github.com/aws-samples/amazon-bedrock-workshop/blob/main/05_Agents/insurance_claims_agent/with_kb/create_and_invoke_agent_with_kb.ipynb.
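If you do attach a knowledge base, the association can also be made through the build-time API. Here is a short sketch, with the knowledge base ID and agent ID as placeholders:

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Associate an existing knowledge base with the agent's DRAFT version (placeholder IDs)
bedrock_agent.associate_agent_knowledge_base(
    agentId="AGENT_ID",
    agentVersion="DRAFT",
    knowledgeBaseId="KB_ID",
    description="Documents the agent can search for additional context before responding.",
)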
Once the preceding steps are done, you can verify the agent configuration and select Create Agent. Congratulations on creating your Amazon Bedrock Agent (Figure 10.10)!
Figure 10.10 – Amazon Bedrock Agent version
On the right side of the screen, you can easily test your agent by asking it questions about the document and requesting it to summarize and translate the document into your desired language, as shown in Figure 10.11:
Figure 10.11 – Testing Bedrock Agent within AWS console
In this section, we gained a practical understanding of developing and evaluating Amazon Bedrock Agents tailored to a text summarization use case. Once the agent’s configuration and functionality align with the designated tasks, it’s time to transition into the deployment phase.