Architecture
To build our intent classification system, we’ll leverage a serverless, event-driven architecture built on Google Cloud (for example: https://cloud.google.com/architecture/serverless-functions-blueprint). This approach aligns with cloud-native principles and allows for seamless integration with other cloud services.
Figure 7.1: Intent classification example architecture diagram
The architecture consists of the following key components:
- Ingestion layer: This layer is responsible for accepting incoming user inputs from various channels, such as web forms, chat interfaces, or API endpoints. We’ll use Google Cloud Functions as the entry point for our system, which can be triggered by events from services like Cloud Storage, Pub/Sub, or Cloud Run.
- AI processing layer: In this layer, we’ll integrate Google’s Gemini Pro through Vertex AI. Vertex AI provides a managed environment for deploying and scaling machine learning models, ensuring high availability and performance.
- Intent classification model: This is the core component of our system, responsible for analyzing the user input and determining the corresponding intent. We’ll leverage Google Gemini Pro’s natural language understanding capabilities for our intent classification model.
- Orchestration and routing: Based on the classified intent, we’ll need to route the user input to the appropriate downstream system or service. This could involve integrating with customer relationship management (CRM) systems, knowledge bases, or other enterprise applications. We’ll use Cloud Functions or Cloud Run to orchestrate this routing process.
- Monitoring and logging: To ensure the reliability and performance of our system, we’ll implement robust monitoring and logging mechanisms. We’ll leverage services like Cloud Logging, Cloud Monitoring, and Cloud Operations to gain visibility into our system’s behavior and quickly identify and resolve any issues.
By adopting this architecture, the intent classification system won’t just be scalable but also flexible enough to adapt to varying workloads and integration requirements. We’ll be able to handle high volumes of customer inquiries in real time and deliver swift and consistent responses that improve the overall customer experience.
The serverless nature of this architecture brings several additional benefits. It allows for automatic scaling based on demand, ensuring that we can handle sudden spikes in customer inquiries without manual intervention. This elasticity not only improves system reliability but also optimizes costs, as we only pay for the resources we actually use.
This event-driven design facilitates easy integration with other systems and services. As our customer service ecosystem evolves, we can easily add new triggers or outputs to our intent classification system.
This could include integrating with new communication channels, connecting to additional backend systems, or incorporating advanced analytics for deeper insights into customer behavior and preferences.
In the following sections, we’ll dive deeper into each component of our architecture, exploring the specific Google Cloud services we’ll use, best practices for implementation, and strategies for optimizing performance and cost-efficiency. We’ll also discuss a concrete example that will help you get started.
Entry point
For real-time interactive applications, the entry points where prompts originate need to be highly streamlined, with simplicity and ease of use in mind. These prompts often originate from unpredictable contexts, so interfaces have to feel natural across device types and usage scenarios.
In our use case, the entry point could be a web form, chat interface, or API endpoint where customers submit their inquiries. These inputs will be sent to a cloud function, which acts as the ingestion layer for our system.
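As a sketch of what the ingestion layer might do when a request arrives, a handler could validate the incoming JSON before any model call is made. The payload shape and the "message" field name below are illustrative assumptions, not part of the original example:

```python
def ingest(payload):
    """Validate a parsed JSON request body such as
    {"message": "I want to open an account"} and return the raw text.

    The 'message' field name is an illustrative assumption.
    """
    if not isinstance(payload, dict) or "message" not in payload:
        raise ValueError("payload must contain a 'message' field")
    message = str(payload["message"]).strip()
    if not message:
        raise ValueError("'message' must not be empty")
    return message
```

In an HTTP-triggered Cloud Function, the payload would typically come from the parsed request body; the same check applies to Pub/Sub-triggered entry points after decoding the event data.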
Let’s start with a sample user query:
#In this case we will simulate the input from a chat interface
message = "I want to open an account"
Prompt pre-processing
In a real-time system, every step in the prompt pre-processing workflow adds latency to the overall response time, commonly measured in milliseconds or microseconds depending on your application's SLAs. Because higher latency degrades the user experience, pre-processing should be kept as lightweight as possible.
For our intent classification use case, the prompt pre-processing may involve simple text normalization, such as removing punctuation, converting to lowercase, or handling abbreviations. Additionally, we may apply some basic filtering to remove any potentially harmful or inappropriate content before sending the prompt to the model.
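A lightweight normalization step along these lines might look as follows; the exact rules (which characters to keep, the length cap) are illustrative and should be tuned to your channels:

```python
import re

def preprocess(text, max_len=500):
    """Lightweight normalization: lowercase, collapse whitespace,
    drop unusual symbols, and cap the input length."""
    text = text.lower().strip()
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    text = re.sub(r"[^\w\s?'.,-]", "", text)   # keep word chars and basic punctuation
    return text[:max_len]
```

Because this runs on every request, it deliberately avoids anything heavier than a couple of regular expressions.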
Let’s dive deep into an example prompt:
#In this section we define the prompt, as the task is to perform intent
#classification we will identify the intent by exposing
#the possible values to the LLM
prompt_template = """
You are a helpful assistant for an online financial services company that allows users to check their balances, invest in certificates of deposit (CDs), and perform other financial transactions.
Your task is to identify what your customers are trying to do and return a well formed JSON object.
1. Carefully analyze the content of the message.
2. Classify what the user is trying to do within these options:
* New Account: The user is trying to sign up. Return {{"intent": "signup", "content":"null"}}
* Change Password: The user needs to reset their password. Return {{"intent":"change_password", "content":"null"}}
* Check Balance: The user needs to check their balance. Return {{"intent": "check_balance", "content":"null"}}
* Invest in CD: The user wants to invest in a certificate of deposit. Return {{"intent": "invest_cd", "content": "Extract relevant information such as investment amount and term"}}
* Withdraw Funds: The user wants to withdraw money. Return {{"intent": "withdraw_funds", "content": "Extract information like amount and withdrawal method"}}
* Transfer Funds: The user wants to transfer money between accounts. Return {{"intent": "transfer_funds", "content": "Extract information like amount, source account, and destination account"}}
* Account Information: The user wants to access or update their account information. Return {{"intent": "account_info", "content": "Identify the specific information the user needs"}}
* Lost/Stolen Card: The user wants to report a lost or stolen card. Return {{"intent": "lost_card", "content": "null"}}
* Support: The user needs help and is not sure what to do. Return {{"intent": "support", "content": "null"}}
* Other: For other queries, politely decline to answer and clarify what you can help with.
3. Only return the proper JSON result from your classification.
4. Always think step by step.
User question: {query}
JSON:
"""
The previous prompt defines the template for the intent classification task. The prompt provides context that explains that the assistant is helping users of an online financial services company perform various actions, such as signing up, checking balances, investing in CDs, withdrawing funds, and more.
Additionally, this prompt instructs the model to carefully analyze the user’s input message and classify the intent into one of the predefined categories. For each intent category, the prompt specifies the JSON object that should be returned, including any additional information that needs to be extracted from the user’s message.
For example, if the user’s intent is to invest in a CD, the assistant should return the JSON object in the following format:
{
    "intent": "invest_cd",
    "content": "Extract relevant information such as investment amount and term"
}
This means that the virtual assistant should not only identify the intent as "invest_cd" but also extract relevant information, such as the investment amount and term, from the user's message and include it in the "content" field.
The prompt also provides instructions for handling intents that do not fall into any of the predefined categories (the "Other" case).
By providing this detailed prompt template, the system can effectively guide the language model to perform the intent classification task for financial services scenarios, ensuring that the model’s responses are structured and formatted correctly.
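One detail worth noting: because the template is filled with Python's str.format, the literal braces in the JSON examples are doubled ({{ and }}) so they survive formatting, while {query} is substituted. A shortened, self-contained version of the template shows the effect:

```python
# A condensed version of the template above; doubled braces become
# literal braces after str.format(), while {query} is substituted.
template = """Classify the user's intent.
* New Account: Return {{"intent": "signup", "content": "null"}}
User question: {query}
JSON:
"""

prompt = template.format(query="I want to open an account")
print(prompt)
```

Forgetting to double the braces would raise a KeyError at format time, since str.format would treat the JSON keys as placeholders.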
Inference
At the inference stage, we’ll leverage Google’s Gemini Pro model hosted on Vertex AI. Within the cloud function triggered by the user input, we’ll invoke the Vertex AI endpoint hosting the Gemini Pro model, passing the pre-processed input as the prompt.
Gemini Pro will process the input and return the predicted intent, leveraging its natural language understanding capabilities. Since we’re using an out-of-the-box model, the underlying infrastructure and resource allocation are abstracted away, ensuring that individual requests are processed efficiently while adhering to the service’s performance and cost objectives:
import vertexai
from vertexai import generative_models
from vertexai.generative_models import GenerativeModel

# Placeholder values; replace with your own project, region, and model name
PROJECT = "your-project-id"
LOCATION = "us-central1"
MODEL = "gemini-1.0-pro"

generation_config = {
    "max_output_tokens": 8192,
    "temperature": 0,
    "top_p": 0.95,
}
safety_settings = {
    generative_models.HarmCategory.HARM_CATEGORY_HATE_SPEECH: generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
    generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
    generative_models.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
    generative_models.HarmCategory.HARM_CATEGORY_HARASSMENT: generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
}

def generate(prompt):
    vertexai.init(project=PROJECT, location=LOCATION)
    model = GenerativeModel(MODEL)
    responses = model.generate_content(
        [prompt],
        generation_config=generation_config,
        safety_settings=safety_settings,
        stream=False,
    )
    return responses

result = generate(prompt_template.format(query=message))
Result post-processing
For our intent classification use case, the post-processing step may involve formatting the predicted intent into a suitable response format, such as JSON or a human-readable string. Additionally, we may apply some basic filtering or ranking mechanisms to ensure that the most relevant and helpful responses are prioritized.
### Sometimes the model returns Markdown-formatted content; in this case we implement a function to filter it.
import json

def extract_json(text):
    """
    Extracts the JSON portion from a string containing backticks.

    Args:
        text: The string containing JSON data within backticks.

    Returns:
        A dictionary representing the extracted JSON, or None if no valid JSON is found.
    """
    start_index = text.find("```json")
    end_index = text.find("```", start_index + 7)  # +7 to skip "```json"
    if start_index != -1 and end_index != -1:
        json_string = text[start_index + 7:end_index]  # Extract the JSON string
    else:
        json_string = text
    try:
        json_data = json.loads(json_string)
        return json_data
    except json.JSONDecodeError:
        return None
The previous code snippet defines a function called extract_json that is designed to handle cases where the language model's output contains JSON data wrapped in backticks (a fenced "```json" block). This is a common practice in Markdown-friendly environments, where backticks are used to delineate code blocks or structured data.
The extract_json function takes a string text as input and attempts to extract the JSON portion from within the backticks. Here's a breakdown of how the function works:
- The function first looks for the string "```json" in the input text using the find method. This marker indicates the start of a JSON block.
- If the start marker is found, the function then looks for the closing "```" marker, searching from the end of the "```json" marker (start_index + 7). If both the start and end markers are found, the function extracts the JSON string by slicing the input text between these markers. If either marker is missing, the function assumes that the entire input text is the JSON string.
- The function then attempts to parse the extracted JSON string using the json.loads function from the json module. If the parsing is successful, the function returns the resulting JSON data as a dictionary. If the parsing fails (for example, due to invalid JSON syntax), the function returns None.
By incorporating this function into the post-processing stage, the system can handle cases where the language model's output contains JSON data wrapped in backticks. This functionality is particularly useful when working with Markdown-friendly environments or when integrating the intent classification system with other components that expect JSON-formatted data. The post-processing stage can then format the extracted JSON data into a suitable response, apply filtering or ranking mechanisms, and render the final response for display to the user.
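To make the behavior concrete, here is a quick check of the function on both kinds of input, with extract_json restated so the snippet runs standalone:

```python
import json

def extract_json(text):
    """Same logic as above, condensed for a self-contained demo."""
    start = text.find("```json")
    end = text.find("```", start + 7)
    if start != -1 and end != -1:
        text = text[start + 7:end]
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

# A model reply with the JSON wrapped in a fenced block
wrapped = 'Here you go:\n```json\n{"intent": "signup", "content": "null"}\n```'
parsed = extract_json(wrapped)       # a dict with the intent fields
invalid = extract_json("plain {bad json")  # None, since parsing fails
```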
The process_intent function is designed to handle the JSON data returned by the intent classification model. It takes a dictionary, intent, as input, which is expected to have an "intent" key with a value representing the predicted intent category.
def process_intent(intent):
    if intent is None:
        # extract_json returns None when the model output is not valid JSON
        return("Could not parse the model response")
    if intent["intent"] == "signup":
        #If a user is trying to sign up you could
        #redirect them to a sign-up page, for example.
        return("Sign up process")
    elif intent["intent"] == "change_password":
        #If a user is looking into changing their password,
        #you could either do it through the chatbot
        #or redirect to a password change page.
        return("Change password")
    elif intent["intent"] == "check_balance":
        #In this case you could have a function that
        #queries a database to obtain the balance
        #(provided the user is logged in).
        return("Check account balance")
    elif intent["intent"] == "invest_cd":
        #For the investment intent, this could redirect
        #to a page where investment options can be selected.
        return("Invest in a CD")
    elif intent["intent"] == "withdraw_funds":
        return("Withdraw funds")
    elif intent["intent"] == "transfer_funds":
        return("Transfer funds")
    elif intent["intent"] == "account_info":
        return("Account information")
    elif intent["intent"] == "lost_card":
        return("Report lost card")
    elif intent["intent"] == "support":
        return("Contact support")
    elif intent["intent"] == "other":
        return("Other kind of intent")
    else:
        return("If an intent was classified as something else you should investigate what is going on.")
intent = process_intent(extract_json(result.text))
The process_intent function checks the value of the "intent" key in the input dictionary. Depending on the intent category, the function performs a specific action or returns a corresponding message.
For example, if the intent is "signup", the function returns the string "Sign up process", which could be used to redirect the user to a sign-up page or initiate the sign-up process. Similarly, if the intent is "change_password", the function returns "Change password", which could trigger a password reset process or redirect the user to a password change page.
For intents like "check_balance", "invest_cd", "withdraw_funds", "transfer_funds", "account_info", "lost_card", and "support", the function returns corresponding messages that could be used to initiate the relevant processes or provide instructions to the user.
If the intent is "other", the function returns "Other kind of intent", indicating that the user's query did not match any of the predefined intent categories.
If the intent does not match any of the cases handled by the function, it returns a message suggesting that further investigation is needed to understand the intent.
Finally, the last line of code, intent = process_intent(extract_json(result.text)), combines the extract_json and process_intent functions. It first extracts the JSON data from the result.text string using extract_json. Then, it passes the extracted JSON data to the process_intent function, which processes the intent and returns an appropriate message or action.
This code snippet demonstrates how the intent classification system can be integrated with further processing steps to handle different user intents. The process_intent function can be extended or modified to include additional logic or actions based on the specific requirements of the application.
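As a design note, the if/elif chain can also be expressed as a dispatch dictionary, which keeps the intent-to-action mapping in one place and makes adding a new intent a one-line change. A sketch, with the action strings mirroring those returned by the original function:

```python
INTENT_ACTIONS = {
    "signup": "Sign up process",
    "change_password": "Change password",
    "check_balance": "Check account balance",
    "invest_cd": "Invest in a CD",
    "withdraw_funds": "Withdraw funds",
    "transfer_funds": "Transfer funds",
    "account_info": "Account information",
    "lost_card": "Report lost card",
    "support": "Contact support",
    "other": "Other kind of intent",
}

def process_intent_dispatch(intent):
    """Dictionary-based variant of process_intent; falls back to an
    investigation message for unknown intents and handles the case
    where JSON extraction failed (intent is None)."""
    if intent is None:
        return "Could not parse the model response"
    return INTENT_ACTIONS.get(
        intent.get("intent"),
        "If an intent was classified as something else you should investigate what is going on.",
    )
```

The lookup-table form also makes it straightforward to swap the string values for callables later, so each intent can trigger real downstream logic.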
Result presentation
The result presentation stage for real-time applications demands near-instantaneous updates, often delivered through server-side rendering or data-binding frameworks.
In our use case, the formatted response containing the predicted intent can be sent back to the customer through the channel from which the inquiry originated (for example, web form, chat interface, or API response). This response can then be used to route the inquiry to the appropriate downstream system or provide an automated response for common intents.
In this example, we will use a Gradio interface to render the replies in a visually appealing UI. Gradio (https://www.gradio.app/) is an open-source Python package that allows you to quickly create easy-to-use, customizable UI components for your ML model, any API, or even an arbitrary Python function using a few lines of code.
You can find more information in the Gradio documentation: https://www.gradio.app/docs
The following code provides an example that creates a Gradio interface:
import gradio as gr

def chat(message, history):
    response = generate(prompt_template.format(query=message))
    intent_action = process_intent(extract_json(response.text))
    history.append((message, intent_action))
    return "", history

with gr.Blocks() as demo:
    gr.Markdown("Fintech Assistant")
    chatbot = gr.Chatbot(show_label=False)
    message = gr.Textbox(placeholder="Enter your question")
    message.submit(chat, [message, chatbot], [message, chatbot])

demo.launch(debug=True)
The previous code illustrates the result presentation stage for the intent classification system using the Gradio library.
In our example, the chat(message, history) function is the core of the chatbot interface. It takes two arguments: message (the user's input message) and history (a list containing the previous messages and responses). Here's what the function does:
- It calls the generate function defined earlier to get the response from the intent classification model, passing the user's message as part of the prompt template.
- It then processes the model's response using the extract_json function to extract the predicted intent data.
- The extracted intent data is passed to the process_intent function to determine the appropriate action or response based on the predicted intent.
- The user's message and the generated response are appended to the history list, which keeps track of the conversation.
- The function returns an empty string (to clear the input box) and the updated history list.
The code then creates a Gradio interface using the gr.Blocks context manager. Inside the context, it does the following:
- Displays a title using the gr.Markdown component.
- Creates a gr.Chatbot component to display the conversation history.
- Creates a gr.Textbox component for the user to enter their message.
- Binds the chat function to the submit event of the Textbox component. When the user submits their message, the chat function is called with the user's message and the current history as arguments, and the Textbox and Chatbot components are updated with the returned values.
Finally, demo.launch(debug=True) launches the Gradio interface in debug mode.
The result is an interactive chatbot interface where users can enter their messages, as illustrated in Figure 7.2, and the system will process the message, predict the intent, and provide an appropriate response based on the process_intent function. The conversation history is displayed in the Chatbot component, allowing users to track the flow of the conversation.
Figure 7.2: Example Gradio interface
Logging and monitoring
Real-time systems require tight instrumentation around per-request metrics, such as latencies, errors, and resource usage.
In our architecture, we’ll leverage services like Cloud Logging (https://cloud.google.com/logging/docs/overview) and Cloud Monitoring (https://cloud.google.com/monitoring/docs/monitoring-overview) to gain visibility into the system’s behavior and quickly identify and resolve any issues. We can monitor metrics like request latency, error rates, and resource utilization, and set up alerts for anomalies or performance degradation.
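As a minimal sketch of per-request instrumentation (the field names are illustrative), a timing wrapper around the classification call can emit one structured log entry per request. In Cloud Functions, JSON written through the standard logger is captured by Cloud Logging, where each entry becomes a queryable record:

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("intent-classifier")

def timed_classify(message, classify_fn):
    """Wrap a classification call with per-request latency and
    error logging; classify_fn is whatever performs the model call."""
    start = time.perf_counter()
    try:
        result = classify_fn(message)
        status = "ok"
        return result
    except Exception:
        status = "error"
        raise
    finally:
        latency_ms = (time.perf_counter() - start) * 1000
        logger.info(json.dumps({
            "event": "intent_classification",
            "status": status,
            "latency_ms": round(latency_ms, 2),
        }))
```

Metrics derived from these entries (latency percentiles, error rates) can then drive Cloud Monitoring alerts for anomalies or performance degradation.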
By following this integration pattern and leveraging Google's Gemini Pro, businesses can unlock the power of generative AI to build intelligent systems that accurately classify user intents, enhance customer experiences, and streamline operations.
Refer to the GitHub directory of this chapter for the complete code that demonstrates how all the pieces described above fit together.