
Detecting & Addressing LLM 'Hallucinations' in Finance

  • 9 min read
  • 04 Jan 2024

This article is an excerpt from the book, The Future of Finance with ChatGPT and Power BI, by James Bryant, Alok Mukherjee. Enhance decision-making, transform your market approach, and find investment opportunities by exploring AI, finance, and data visualization with ChatGPT's analytics and Power BI's visuals.

Introduction

LLMs, such as OpenAI’s GPT series, can sometimes generate responses that are referred to as “hallucinations.” These are instances where the model’s output is factually incorrect, presents information it could not possibly know (given that it lacks access to real-time or personalized data), or is nonsensical or highly improbable.

Let’s take a deeper look at what hallucinations are, how to identify them, and what steps can be taken to mitigate their impact, especially in contexts where accurate and reliable information is crucial, such as financial analysis, trading, or visual data presentations.

Understanding hallucinations

Let’s look at some examples:

  • Factual inaccuracies: Suppose an LLM provides information stating that Apple Inc. was founded in 1985. This is a clear factual inaccuracy because Apple was founded in 1976.
  • Speculative statements: If an LLM were to suggest that “As of 2023, Tesla’s share price has hit $3,000,” this is a hallucination. The model doesn’t have access to real-time data, and any post-2021 prediction or speculation it makes about specific stock prices is unfounded.
  • Confident misinformation: For instance, if an LLM confidently states that “Amazon has declared bankruptcy in late 2022,” this is a hallucination and can have serious consequences if it’s acted upon without verification.

How can we spot hallucinations?

Here are some useful ways to spot hallucinations:

  • Cross-verification: If an LLM suggests an unusual trading strategy, such as shorting a typically stable blue-chip stock based on some supposed insider information, always cross-verify this advice with other reliable sources or consult a financial advisor.
  • Questioning the source: If an LLM claims that “our internal data shows a bullish trend for cryptocurrency X,” this is likely a hallucination. The model doesn’t have access to proprietary internal data.
  • Time awareness: If the model provides information or trends post-September 2021 without the user explicitly asking for a hypothetical or simulated scenario, consider this a red flag. For example, GPT-4 giving specific “real-time” market cap values for companies in 2023 would be a hallucination.
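
One lightweight way to operationalize this time-awareness check is to scan a response for years beyond the model's known training cutoff and flag them for human review. The sketch below is illustrative only: the 2021 cutoff and the flag_post_cutoff_years helper are assumptions, to be adjusted for the model you actually use.

import re

TRAINING_CUTOFF_YEAR = 2021  # assumed knowledge cutoff for the model in use

def flag_post_cutoff_years(response_text):
    """Return any years mentioned in the response that fall after the training cutoff."""
    years = [int(y) for y in re.findall(r"\b(20\d{2})\b", response_text)]
    return [y for y in years if y > TRAINING_CUTOFF_YEAR]

# Example: this response claims to know 2023 figures, so it gets flagged for review
response = "As of 2023, Tesla's market cap stands at $1.2 trillion."
suspect_years = flag_post_cutoff_years(response)
if suspect_years:
    print(f"Red flag: response references {suspect_years}, after the {TRAINING_CUTOFF_YEAR} cutoff")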

What can we do about hallucinations?

Here are some ideas:

  • Promote awareness: If you are developing an AI-assisted trading app that uses an LLM, ensure that users are aware of potential hallucinations, perhaps with a disclaimer or notification shown on use.
  • Implement checks: You might integrate a news API to help validate major financial events or claims made by the model, as sketched below.
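
As a rough illustration of such a check, the snippet below pulls recent headlines about a company from a news API before trusting a dramatic claim such as a bankruptcy filing. The newsapi.org endpoint, the NEWS_API_KEY placeholder, and the claim_is_supported helper are assumptions for illustration; any reliable financial news feed would do.

import requests

NEWS_API_KEY = "YOUR_API_KEY"  # placeholder; assumes an account with a news API provider

def recent_headlines(company, limit=5):
    """Fetch recent headlines mentioning the company from a news API."""
    resp = requests.get(
        "https://newsapi.org/v2/everything",
        params={"q": company, "sortBy": "publishedAt", "pageSize": limit, "apiKey": NEWS_API_KEY},
        timeout=10,
    )
    resp.raise_for_status()
    return [article["title"] for article in resp.json().get("articles", [])]

def claim_is_supported(claim_keyword, company):
    """Naive check: does any recent headline mention the claimed event?"""
    return any(claim_keyword.lower() in title.lower() for title in recent_headlines(company))

# Example: before acting on "Amazon has declared bankruptcy", look for corroborating headlines
if not claim_is_supported("bankruptcy", "Amazon"):
    print("No corroborating headlines found - treat the model's claim as a possible hallucination")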

Minimizing hallucinations in the future

There are various ways we can minimize hallucinations. Here are some examples:

  • Training improvements: Imagine developing a better model that understands context and sticks more closely to known data, avoiding speculative or incorrect financial statements. Future versions of the model could be trained specifically on financial data, news, and reports to better understand the context and semantics of financial trading and investment, for example, so that it understands a short-squeeze scenario accurately or is aware that penny stocks typically carry higher risks.
  • Better evaluation metrics: For instance, develop a specific metric that calculates the percentage of the model’s outputs that were flagged as hallucinations during testing. In the development phase, the models could be evaluated on more focused tasks such as generating valid trading strategies or predicting the impact of certain macroeconomic events on stock prices. The better the model performs on these tasks, the lower the chance of hallucinations occurring.
  • Post-processing methods: Develop an algorithm that cross-references model outputs against reliable financial data sources and flags potential inaccuracies. After the model generates a potential trading strategy or investment suggestion, this output could be cross-verified using a rules-based system. For instance, if the model suggests shorting a stock that has consistently performed well without any recent negative news or poor earnings reports, the system might flag this as a potential hallucination.

As an example, you can use libraries such as yfinance or pandas_datareader to access real-time or historical financial data:

!pip install yfinance pandas_datareader
import yfinance as yf

def get_stock_data(ticker, start, end):
    """Fetch historical price data for a ticker between two dates."""
    stock = yf.Ticker(ticker)
    data = stock.history(start=start, end=end)
    return data

# Example usage:
data = get_stock_data("AAPL", "2021-01-01", "2023-01-01")

You could also develop a cross-verification algorithm and compare the model’s outputs with the collected financial data to flag potential inaccuracies.
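
For instance, a minimal sketch of such a cross-check, reusing the get_stock_data() helper above, could compare a closing price the model asserts against the recorded close for that date and flag large deviations; the flag_price_claim helper and the 5% tolerance used here are assumptions for illustration.

import pandas as pd

def flag_price_claim(ticker, claimed_close, date, tolerance=0.05):
    """Return True if the model's claimed closing price deviates from the recorded close."""
    end = (pd.Timestamp(date) + pd.Timedelta(days=5)).strftime("%Y-%m-%d")
    window = get_stock_data(ticker, date, end)  # reuses the helper defined above
    if window.empty:
        return True  # no data to verify against, so treat the claim as unverified
    actual_close = window["Close"].iloc[0]
    return abs(claimed_close - actual_close) / actual_close > tolerance

# Example: flag a suspiciously specific claim such as "AAPL closed at $300 on 2022-06-01"
if flag_price_claim("AAPL", 300.0, "2022-06-01"):
    print("Claimed price is not supported by historical data - possible hallucination")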

  • Integration with real-time data: While creating Power BI visualizations, data that’s been pulled from the LLM could be cross-verified with real-time data from financial databases or APIs. Any discrepancies, such as inconsistent market share percentages or revenue growth rates, could be flagged. This reduces the risk of presenting hallucinated data in visualizations. Let’s look at some examples:

  • Extracting real-time data: You can continue to use yfinance or pandas_datareader to extract real-time data.

  • Cross-verifying with real-time data: You can compare the model’s output with real-time data to identify discrepancies:

def real_time_cross_verify(output, real_time_data):
    """Flag the model's output if it deviates from real-time figures.

    Both arguments are dicts with keys 'market_share', 'revenue_growth', and 'ticker'.
    In practice, real_time_data could be fetched for output['ticker'] via a helper
    such as get_real_time_data(); here the caller supplies it directly.
    """
    # Compare the model's output with real-time data; flag deviations larger than 0.05
    if abs(output['market_share'] - real_time_data['market_share']) > 0.05 or \
       abs(output['revenue_growth'] - real_time_data['revenue_growth']) > 0.05:
        return True  # Flagged as a potential hallucination
    return False  # Not flagged

# Example usage:
output = {'market_share': 0.25, 'revenue_growth': 0.08, 'ticker': 'AAPL'}
real_time_data = {'market_share': 0.24, 'revenue_growth': 0.07, 'ticker': 'AAPL'}
flagged = real_time_cross_verify(output, real_time_data)
  • User feedback loop: A mechanism can be incorporated to allow users to report potential hallucinations. For instance, if a user spots an error in the LLM’s output during a Power BI data analysis session, they can report this. Over time, these reports can be used to further train the model and reduce hallucinations.
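
A minimal sketch of such a feedback mechanism might simply log each user report alongside the offending output, so that flagged examples can later be reviewed and folded back into evaluation or fine-tuning. The JSON-lines log file and field names below are assumptions for illustration.

import json
from datetime import datetime, timezone

FEEDBACK_LOG = "hallucination_reports.jsonl"  # assumed location for collected reports

def report_hallucination(model_output, user_comment, context=""):
    """Append a user-flagged output to a JSON-lines log for later review and retraining."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_output": model_output,
        "user_comment": user_comment,
        "context": context,
    }
    with open(FEEDBACK_LOG, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Example: a user flags an incorrect market-share statement seen during a Power BI session
report_hallucination(
    model_output="Rivian has surpassed Tesla in global EV market share.",
    user_comment="Contradicts the dashboard's underlying market-share data.",
    context="Power BI EV market dashboard",
)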

OpenAI is on the case

To tackle the chatbot’s missteps, OpenAI’s engineers are working on ways for its AI models to reward themselves for outputting correct data at each step on the way to an answer, instead of rewarding themselves only at the point of conclusion. According to the engineers, this approach could lead to better outcomes because it incorporates more of a human-like chain-of-thought procedure.

These examples should help illustrate the concept and risks of LLM hallucinations, particularly in high-stakes contexts such as finance. As always, these models should be seen as powerful tools for assistance, not as a final authority.

Trading examples

  • Hallucination scenario: Suppose you’ve asked an LLM for a prediction of the future performance of a specific stock, say Tesla. The LLM might generate a response that appears confident and factual, such as “Based on the latest earnings report, Tesla has declared bankruptcy.” If you acted on this hallucinated information, you might rush to sell your Tesla shares, only to find out that Tesla is not bankrupt at all. This is an example of a potentially disastrous hallucination.
  • Action: Before making any trading decision based on the LLM’s output, always cross-verify the information from a reliable financial news source or the company’s official communications.

Power BI visualization examples

  • Hallucination scenario: Suppose you’re using an LLM to generate text descriptions for a Power BI dashboard that tracks the market share of different automakers in the EV market. The LLM might hallucinate and produce a statement such as “Rivian has surpassed Tesla in terms of global EV market share.” This statement would be completely inaccurate, as Tesla has a significantly larger market share than Rivian.
  • Action: When using LLMs to generate text descriptions or insights for your Power BI dashboards, it’s crucial to cross-verify any assertions the model makes. You can do this by cross-referencing the underlying data in your Power BI dashboard or by referring to reliable external sources of information (see the sketch after this list).
  • To minimize hallucinations in the future, the model can be fine-tuned with a dataset that’s been specifically curated to cover the relevant domain. The use of a structured validation set can help spot and rectify hallucinations during the model training process. Also, employing a robust fact-checking mechanism on the output of the model before acting on its suggestions or insights can help catch and rectify any hallucinations.
  • Remember, while LLMs can provide valuable insights and suggestions, their output should always be used as one of many inputs in your decision-making process, particularly in high-stakes environments such as financial trading and analysis.
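
As a rough sketch of cross-referencing a generated description against a dashboard's underlying data, the snippet below checks the earlier Rivian-versus-Tesla claim against a small market-share table; the DataFrame contents and the verify_share_comparison helper are purely illustrative assumptions.

import pandas as pd

# Illustrative stand-in for the dashboard's underlying market-share table
ev_market_share = pd.DataFrame(
    {"automaker": ["Tesla", "BYD", "Rivian"], "global_share": [0.19, 0.17, 0.01]}
)

def verify_share_comparison(df, claimed_leader, claimed_laggard):
    """Return True if the claim 'claimed_leader has surpassed claimed_laggard' matches the data."""
    share = df.set_index("automaker")["global_share"]
    return share[claimed_leader] > share[claimed_laggard]

# The hallucinated statement claims Rivian has surpassed Tesla
if not verify_share_comparison(ev_market_share, "Rivian", "Tesla"):
    print("Generated description contradicts the dashboard data - do not publish it")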

Conclusion

In the dynamic world of financial analysis and data visualization, the presence of LLM 'hallucinations' poses a challenge. Awareness, verification, and ongoing improvement strategies stand as pillars against these inaccuracies. While LLMs offer invaluable support, their outputs must be scrutinized, verified, and used as one among many tools in decision-making. As we navigate this landscape, vigilance, continuous refinement, and a critical eye will fortify our ability to harness the power of LLMs while mitigating the risks they present in high-stakes financial contexts.

Author Bio

James Bryant, a finance and technology expert, excels at identifying untapped opportunities and leveraging cutting-edge tools to optimize financial processes. With expertise in finance automation, risk management, investments, trading, and banking, he's known for staying ahead of trends and driving innovation in the financial industry. James has built corporate treasuries at companies such as Salesforce and transformed organizations such as Stanford Health Care through digital innovation. He is passionate about sharing his knowledge and empowering others to excel in finance. Outside of work, James enjoys skiing with his family in Lake Tahoe, running half marathons, and exploring new destinations and culinary experiences with his wife and daughter.

Aloke Mukherjee is a seasoned technologist with over a decade of experience in business architecture, digital transformation, and solutions architecture. He excels at applying data-driven solutions to real-world problems and has proficiency in data analytics and planning. Aloke worked at EMC Corp and Genentech and currently spearheads the digital transformation of Finance Business Intelligence at Stanford Health Care. In addition to his work, Aloke is a Certified Personal Trainer and is passionate about helping his clients stay fit. Aloke also has a passion for wine and exploring new vineyards.