Managing Data Integrity for Finance

Recognizing the Importance of Data Integrity in Finance

Imagine if everyone online suddenly started complaining on social media that their bank savings accounts had unauthorized deductions. This is exactly what happened when thousands of customers of one of the major banks in Southeast Asia discovered that their account balances ended up negative due to duplicate transactions! This led to customers feeling anxious while reports of this data integrity issue went viral online. How would you feel if your hard-earned money suddenly disappeared overnight due to a data integrity issue?

Maintaining the integrity, accuracy, and reliability of financial data is key to the success of any organization. Data integrity plays a crucial role in finance, as business owners and decision-makers utilize financial and operational data in making long-term business decisions. If you’ve been working as a finance professional for a long time, you probably know by now that data integrity management plays a significant role in helping ensure compliance and avoiding significant financial penalties. Understanding the relevant concepts and strategies is the first step for every professional trying to master the art of financial data integrity management. In this introductory chapter, we will examine the importance of data integrity in finance and demystify various key concepts relevant to the succeeding chapters of this book.

That said, we will cover the following:

Understanding the impact of data integrity issues in finance
A quick tour of concepts relevant to data integrity management
Debunking the myths and misconceptions surrounding finance data integrity management

With these in mind, let’s get started!

Understanding the impact of data integrity issues in finance

Can you spot the wolf hiding among the sheep in Figure 1.1? In finance, the presence of data integrity issues can be compared to a wolf hiding among a flock of sheep. Much like the wolf presents a hidden threat to the sheep, a single data integrity issue can negatively impact the entire financial system’s reputation and stability.

Figure 1.1 – A wolf hidden among sheep

The wolf symbolizes the subtle yet potentially catastrophic effects of a data integrity breach. While data integrity issues such as corrupted financial records, inaccurate reporting, and duplicated transactions due to software bugs might initially go unnoticed, they can cause serious financial losses in the long term. Failing to manage these issues properly can have wide-ranging implications for the integrity of financial transactions and systems. Let’s look at these in the following subsections.

Lack of trust in systems

In order to properly make informed business decisions based on reports and numbers, the financial data used for the reports needs to be as accurate as possible. When decision-makers encounter discrepancies in the reports generated using the data stored in an organization’s internal systems, they lose their trust and confidence in these systems and databases.

At the same time, when customers encounter inconsistencies in their financial statements, accounts, or transactions, they lose trust in the financial institution’s ability to manage their accounts and personal data effectively. This not only damages the institution’s reputation, but it also leads to the loss of customers. That said, taking care of the integrity of financial data is essential not only for internal decision-making but also for securing customers’ trust.

Damage to reputation

If not addressed, data integrity issues can significantly harm an organization’s reputation after an incident. Continuing the story where the bank’s customers were affected by erroneous duplicate transactions, even if the data integrity issue was resolved after a few days, there were a lot of social media posts from customers wanting to move their accounts to another bank.

Important note

Unfortunately, a single incident is all it takes to damage the trust and confidence that a company has worked hard to build with its customers over a long time.

Financial impact

Data integrity issues can lead to errors and discrepancies in financial reports and documents that detail an organization’s financial performance and position. This in turn could negatively impact the organization’s revenue and income.

Note

In Chapter 2, Avoiding Common Data Integrity Issues and Challenges in Finance Teams, we will discuss how a transaction coding error in one of the world’s biggest banks failed to capture the complete threshold transaction reports from its intelligent deposit machines (IDMs), which led to significant financial penalties for the company.

Compliance issues with laws and regulations

In addition to what has been discussed already, data integrity issues can lead to compliance issues with global laws and regulations that have been established to counter fraud and improve the reliability of financial reporting. Included in this list are the Sarbanes-Oxley Act (SOX), Basel III, and even the General Data Protection Regulation (GDPR), all of which mandate strict data management and protection standards to ensure integrity, transparency, and accountability in financial practices. Non-compliance with these regulations can result in significant financial penalties that can negatively impact an organization’s financial health and public image.

At this point, you should have a better appreciation of why financial data integrity management is important. In the next section, we’ll discuss various concepts relevant to data integrity management to prepare us for the succeeding chapters in this book.

A quick tour of concepts relevant to data integrity management

Making better business decisions relies on having accurate and trustworthy financial data. To help us get started, we’ll begin with several foundational concepts, which will be essential in understanding the topics in later chapters.

Levenshtein distance

With companies often dealing with transactions and records from multiple sources, string similarity algorithms such as the Levenshtein distance can help reconcile these datasets by matching similar entries, especially when an exact match cannot be found due to typos or minor discrepancies.

The Levenshtein distance, invented by Vladimir Levenshtein, measures the similarity between two strings by counting the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into the other. Let us take a simple example using health and wealth. The computed distance is one (1) since it takes a single edit operation, substituting h with w, to turn one word into the other.

To help demonstrate how this metric works, here are a few more examples:

From rat to cat: 1
From book to back: 2
From saturday to sunday: 3
From apple to apricot: 5

This can be used for identifying and managing data integrity issues, as it can help detect potential duplicate entries as well as any typographic errors in the data. The Levenshtein distance can be used to check whether an account already exists in the database by looking for similar names in the database (with the search results sorted with the smallest computed distance presented first).

Note

For example, searching Sara Lat in a database should return results including Sarah Lat since the Levenshtein distance is very small (just a one-letter difference). Here, even if there are data entry errors, the algorithm can still identify and match similar entries when no exact match exists. Imagine the possibilities this opens up for enhancing data retrieval accuracy, particularly in large databases where typos, abbreviations, or minor discrepancies are common! For more information on this topic, feel free to check https://en.wikipedia.org/wiki/Levenshtein_distance.
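To make the metric concrete, here is a minimal Python sketch of the standard dynamic-programming approach to computing the Levenshtein distance, followed by a lookup that ranks candidate names by distance. The function names (levenshtein and rank_by_similarity) and the list of existing account names are hypothetical and serve only as an illustration; a production system would more likely rely on a well-tested library or a database feature.

```python
from typing import List, Tuple


def levenshtein(source: str, target: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn source into target."""
    # previous_row[j] is the distance between the part of source processed
    # so far and the first j characters of target.
    previous_row = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current_row = [i]  # turning i characters into "" takes i deletions
        for j, target_char in enumerate(target, start=1):
            substitution_cost = 0 if source_char == target_char else 1
            current_row.append(min(
                previous_row[j] + 1,                      # deletion
                current_row[j - 1] + 1,                   # insertion
                previous_row[j - 1] + substitution_cost,  # substitution or match
            ))
        previous_row = current_row
    return previous_row[-1]


def rank_by_similarity(query: str, candidates: List[str]) -> List[Tuple[str, int]]:
    """Return (candidate, distance) pairs with the closest match first."""
    scored = [(name, levenshtein(query.lower(), name.lower())) for name in candidates]
    return sorted(scored, key=lambda pair: pair[1])


# The word pairs discussed in this section
for source, target in [("health", "wealth"), ("rat", "cat"), ("book", "back"),
                       ("saturday", "sunday"), ("apple", "apricot")]:
    print(f"{source} -> {target}: {levenshtein(source, target)}")

# Hypothetical account names already stored in a database
existing_accounts = ["Sandra Latt", "Zara Lot", "Sarah Lat", "Sam Lee"]
best_match = rank_by_similarity("Sara Lat", existing_accounts)[0]
print(best_match)  # ('Sarah Lat', 1)
```

Comparing lowercased strings is a design choice made here so that differences in capitalization are not counted as edits; whether that is appropriate depends on the data being reconciled.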

Machine learning

Machine learning (ML) is a subset of artificial intelligence (AI) that enables computers to learn from data and mimic human intelligence. One of the practical applications of machine learning is anomaly detection, which involves identifying unusual patterns or outliers in datasets. This is particularly useful for automatically detecting unusual transactions that may indicate fraudulent activity. Now, instead of looking for unusual transactions manually, machine learning-powered automated systems can efficiently process and analyze large amounts of transactional data in real time to flag discrepancies and issues. Awesome, right?
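To give a feel for what flagging an unusual transaction means, the following sketch uses a simple statistical rule (the modified z-score based on the median absolute deviation) as a stand-in. It is not a machine learning model, and the function name, threshold, and transaction amounts are all assumptions made for illustration.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold."""
    center = median(amounts)
    # Median absolute deviation (MAD): the typical distance from the median
    mad = median([abs(amount - center) for amount in amounts])
    if mad == 0:
        return []  # this rule cannot score values when the MAD is zero
    return [amount for amount in amounts
            if 0.6745 * abs(amount - center) / mad > threshold]


# Hypothetical transaction amounts for a single account
transaction_amounts = [120.0, 135.0, 128.0, 110.0, 142.0, 125.0, 9800.0]
flagged = flag_outliers(transaction_amounts)
print(flagged)  # [9800.0]
```

A trained model would typically consider many more signals (such as time, location, and counterparty) than a single amount, but the idea of scoring each record and flagging those beyond a threshold is similar.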

Note

Recently, powerful AI-powered solutions such as ChatGPT and Google Bard became available for a wide range of applications including data quality management. We’ll cover this in more detail in Chapter 10, Using Artificial Intelligence for Finance Data Quality Management.

Orphaned records

Orphaned records are database records whose corresponding parent or related record(s), supposedly stored in another database table, no longer exist. This situation may occur for various reasons, such as a record being deleted or modified without the related record(s) being updated as well.

Let’s say we have two related tables called Product Details and Transactions, as seen in Figure 1.2. We can see that the Product Details table uses the Product_ID column to identify each product and stores additional details about it (for example, the product name, price, and cost). The Transactions table, on the other hand, contains information on when a product was sold and how many were sold.

Figure 1.2 – Related tables being connected by the Product_ID column

Here, the Product_ID column serves as the bridge connecting these two tables. We can see that there are four unique Product_ID values under the Transactions table, which have matching records in the Product Details table.

Figure 1.3 – Orphaned records in the Transactions table

If, for example, the record with Product_ID = 151002 was inadvertently deleted from the Product Details table (similar to what we have in Figure 1.3), we would have several orphaned records in the Transactions table not having the corresponding Product Details record in the first table. This could cause reconciliation errors that can become discrepancies in financial reports.
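As a rough illustration of how such orphaned records could be found, the sketch below builds two small in-memory tables with Python's built-in sqlite3 module and uses a LEFT JOIN to list transactions whose Product_ID has no match. The table layout, the product names, and every Product_ID other than 151002 are assumptions made for this example and do not reproduce the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_details (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT
    );
    CREATE TABLE transactions (
        transaction_id INTEGER PRIMARY KEY,
        product_id     INTEGER,
        quantity       INTEGER
    );
""")

# Product 151002 is intentionally missing to mimic an accidental deletion
conn.executemany(
    "INSERT INTO product_details VALUES (?, ?)",
    [(151001, "Product A"), (151003, "Product C")],
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, 151001, 2), (2, 151002, 5), (3, 151002, 1), (4, 151003, 4)],
)

# Keep only the transactions that have no matching product_details row
orphans = conn.execute("""
    SELECT t.transaction_id, t.product_id
    FROM transactions AS t
    LEFT JOIN product_details AS p ON p.product_id = t.product_id
    WHERE p.product_id IS NULL
    ORDER BY t.transaction_id
""").fetchall()
conn.close()

print(orphans)  # [(2, 151002), (3, 151002)]
```

Many databases can also prevent this situation up front with foreign key constraints, which reject the deletion of a parent record that is still referenced.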

Note

We will cover how to detect orphaned records and manage data integrity in our hands-on examples in Chapter 6, Implementing Best Practices When Using Business Intelligence Tools, and Chapter 10, Using Artificial Intelligence for Finance Data Quality Management.

Financial reporting

Financial reporting is the process of distributing financial data to interested parties, such as creditors, investors, and regulatory agencies. Since this data forms the basis for the financial statements that provide a snapshot of an organization’s financial status, taking care of data integrity is essential for accurate financial reporting.

One of the ways that an organization can ensure the integrity of financial reports is by setting up internal controls. These processes are set up by the company to provide a level of reasonable assurance as to the reliability of the financial statements, prevent fraud, and ensure compliance with regulations. It is important to note that if there are errors or discrepancies relating to inaccurate or misleading reporting, the company is at risk of penalties and fines.

Note

Being data-rich and information-poor is a common challenge. This emphasizes the importance of data integrity in transforming raw data into usable information, as exemplified in the creation and analysis of financial reports.

Balance sheet

The company’s financial position can be understood by referring to the balance sheet. At a specific point in time, it discloses information regarding the company’s assets, liabilities, and shareholders’ equity. The balance sheet, the statement of profit and loss, and the statement of cash flows are the most common financial statements prepared by businesses. We will discuss the balance sheet here and the other two in the later part of this section.

These financial statements can be prepared by different individuals depending on the size of the company. If it is a small business, the owner can prepare it. Alternatively, external accountants can be hired to assist in the preparation of these reports. A balance sheet is usually prepared on a monthly basis or depending on the needs of the business.

Note

In order to make sure that the numbers add up, accounts reconciliation procedures are done as part of the internal controls. After the balance sheet has been prepared and if the company is publicly held, public accounting firms can be hired to review the balance sheet and conduct external audits.

Nowadays, companies may utilize software to speed up the preparation of financial reports. However, despite this, data integrity issues can still be present since the process of accounts reconciliation and recording may still be manual.

Figure 1.4 – Sample balance sheet

Let’s have a quick look at an example of a balance sheet. In Figure 1.4, we have a sample report for a hypothetical business detailing the company’s assets, liabilities, and shareholders’ equity. To help us understand the balance sheet better, we need to familiarize ourselves with the following key concepts:

Asset: An asset is a resource that a company owns or controls, which is expected to provide future financial benefits. In the balance sheet example provided in Figure 1.4, assets include those that contribute to the value of the business. These include current assets, property, plant and equipment, and intangible assets. The assets of a company generally depend on the business type, and the components can change as necessary.
Liability: This is a debt that a business is obligated to settle in the future. It represents a future outflow of resources. In simple terms, liabilities (and equity, which we will discuss next) are the sources of funding used to acquire assets.
Equity: This is what remains from the assets after all liabilities are paid. Types of equity include common stock, preferred stock, and retained earnings. Simply put, this is the portion that the owners have claim to once the liabilities are settled. When money is invested in a company, it can be in the form of common stock or preferred stock. Then, when the business starts generating profits, the earnings are kept in the business and placed under the retained earnings account. A portion of the earnings are sometimes paid to the stockholders in the form of dividends.

A balance sheet must always balance. The reason it is called a balance sheet is that the total assets should equal the sum of the liabilities and shareholders’ equity. The liabilities and equity fund the acquisition of business assets, which is why the two sides need to be equal. Missing data, erroneous transactions, mistakes in inventory, or errors in equity calculations are likely causes of an unbalanced balance sheet.
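The relationship between the three components (assets = liabilities + equity) lends itself to a simple automated check. The following is a minimal sketch with made-up totals; the function name and the figures are hypothetical and are not taken from Figure 1.4.

```python
from decimal import Decimal


def balance_sheet_difference(total_assets, total_liabilities, total_equity):
    """Return assets minus (liabilities + equity); zero means the sheet balances."""
    return total_assets - (total_liabilities + total_equity)


# Hypothetical totals: the first set balances, the second does not
balanced = balance_sheet_difference(
    Decimal("500000.00"), Decimal("200000.00"), Decimal("300000.00"))
unbalanced = balance_sheet_difference(
    Decimal("500000.00"), Decimal("200000.00"), Decimal("295000.00"))

print(balanced)    # 0.00
print(unbalanced)  # 5000.00
```

A nonzero difference does not say where the error is, only that one exists; it is a signal to start reconciling the underlying accounts.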

Important note

If errors in the balance sheet go undetected and the balance sheet is publicly reported, this can potentially negatively impact the company’s reputation as well as its stock price.

Profit and loss statement

Business owners and accountants utilize the profit and loss (P&L) report as an essential financial statement. Based on the company’s revenues and expenses, the report details its net profit or loss. It describes how a company can generate revenue and earn income after deducting the expenses.

Now, let’s look at a quick example of a P&L report. Figure 1.5 shows an example P&L statement for our fictional company where we break down its sales, cost of goods sold, expenses, and net income.

Figure 1.5 – Statement of Profit and Loss

Here, we can see that the business earns its revenue from selling products. Then, the cost of goods sold is deducted from the sales to get the gross profit. Afterward, the expenses needed to run the business—such as the franchise fee, insurance, maintenance, and taxes—are deducted to get the net income after taxes. The P&L report also enables the company to look at net income and overall profitability to determine how to best manage its resources.

Note

Imagine we have a P&L report where the cost of goods sold account increased significantly without the corresponding increase in revenue. This could flag the presence of potential data integrity issues, which affect the accuracy of the profitability of the business. As mentioned earlier, one way to ensure the integrity of financial statements in general—and the profit and loss statement in particular—is to set up internal controls. Take, for example, internal controls for payroll in order to minimize the risk of fraud. Some examples of this are the segregation of duties between the timesheet approver, payroll processor, and payroll issuer, having a different bank account for payroll, and comparing the actual payroll expense with the budget. This will be discussed in more detail in Chapter 7, Detecting Fraudulent Transactions Affecting Financial Report Integrity.

Cash flow statement

Another major financial report in addition to the balance sheet and the P&L statement is the cash flow statement. This report connects the balance sheet and income statement because it shows how money flows in and out of the company. The cash flow statement shows the beginning and ending balances of the cash account, as well as the change in the cash level over the course of the period.

Figure 1.6 shows the statement of cash flows for an imaginary company. The cash flow statement can be prepared in two ways, either through the direct method or indirect method. The direct method details the incoming and outgoing cash flows from operations, while the indirect method presents cash flows using the net income as a starting point.

Figure 1.6 – Statement of Cash Flows

In our example, we are using the indirect method, where we calculate the Net cash provided by operating activities by starting with the net income and adjusting it for non-cash items and changes in working capital.

The statement also shows the Cash flows from investing activities, such as funding capital expenditures or selling equipment, and Cash flows from financing activities, such as proceeds from issuing debt or paying dividends to stockholders.

Important note

The financial reports that we have discussed are external reports. There are also various internal reports that are generated and processed by internal teams to address a company’s specific needs and ensure that reviews, integrity checks, and analysis are done properly.

Budgeting

Budgeting is the process of creating a plan or estimate regarding the expected revenue and expenses in the future. This may also involve establishing financial objectives and allocating the required resources to meet these objectives. In order to plan effectively for the future and allocate resources properly, it is essential to make sure that the budgeting data is accurate.

Note

For example, suppose a payroll employee creates a ghost employee in the payroll system and pockets the payments. A budgeting process would help flag such data integrity issues and potentially fraudulent transactions. Since we can estimate the expected payroll cost for each month, having a different payroll employee compare the budget with the actual payouts will assist in identifying a ghost employee.
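A budget-versus-actual comparison of this kind can be sketched in a few lines of Python. The monthly figures, the 5% tolerance, and the function name below are assumptions chosen purely for illustration.

```python
from decimal import Decimal


def over_budget_months(budget_by_month, actual_by_month, tolerance=Decimal("0.05")):
    """Return {month: variance} for months where actual payroll exceeds
    the budget by more than the given fraction of the budget."""
    flagged = {}
    for month, budget in budget_by_month.items():
        variance = actual_by_month[month] - budget
        if variance > budget * tolerance:
            flagged[month] = variance
    return flagged


# Hypothetical payroll figures: payouts jump from February onward
payroll_budget = {"Jan": Decimal("50000"), "Feb": Decimal("50000"), "Mar": Decimal("50000")}
payroll_actual = {"Jan": Decimal("50400"), "Feb": Decimal("54200"), "Mar": Decimal("54200")}

flagged_months = over_budget_months(payroll_budget, payroll_actual)
for month, variance in flagged_months.items():
    print(f"{month}: actual payroll exceeds budget by {variance}")
```

A flagged month is only a prompt for review. The variance could come from legitimate overtime or new hires just as easily as from a ghost employee.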

Forecasting

Making projections about potential financial outcomes based on past performance and other relevant information is known as forecasting. Ensuring data integrity is critical to the forecasting process since accurate and reliable data is needed as a basis for the projections. One of the key activities done by companies while facing the challenges during COVID-19 was cash flow forecasting. Gaining visibility over cash flows allows a company to manage and run its operations effectively and have a buffer, especially when things do not necessarily go as planned. For the cash flow forecast to be effective and useful, the underlying data needs to be reliable and accurate. Thus, there is a need for data quality and integrity.

Say, for example, the business owners wanted to invest USD 200,000 to install building improvements that would make the business more profitable in the long run. However, they want to make sure that this is feasible and would not put the business in a dire situation in terms of cash flow. Assuming that the management expects to have the same level of operations for next year and pay the same level of dividends as 2023, we can create a simple statement of cash flow forecast, as seen in Figure 1.7:

Figure 1.7 – A simple cash flow statement forecast

We can see that, holding other assumptions constant, investing USD 200,000 in improvements will lead to a negative cash flow of USD 71,826 for the business. This means that the cash flow from operations will not be sufficient to fund the expenditure, and the business needs to take on a loan from a bank or issue additional stock to finance the spending.

But what if there was an error in the calculation of the USD 140,386 income caused by an oversight in recording the revenue for Product B? Suppose in 2022, we have a revenue of USD 85,000 for this product (as seen in Figure 1.5), but we should have USD 65,000 instead. This mistake will cause a ripple effect in the calculation of the gross profit as well as the net income before taxes and taxes on income. Given that we assumed that for 2023 we will have the same level of operations as 2022 for simplicity in our example, this will have an effect on how much cash needs to be borrowed or raised to fund the investment in the building improvements.

Note

With advancements in technology and artificial intelligence, companies have started using machine learning in forecasting. Similar to the cash flow forecast just seen, for the machine learning forecast to be useful, the training data needs to be reliable and accurate.

Depreciation

Companies can recognize and report the decline in the value of fixed assets over time on their financial statements through depreciation. It is computed using the asset’s cost, expected useful life, and any residual value. This way, depreciation allows businesses to spread the cost of an asset over its useful life rather than expensing it in full at the time of purchase. At the same time, it helps match the cost of utilizing the asset to the periods in which the related revenue was generated. Depreciation is shown as an expense on the income statement and as a decrease in the asset’s value on the balance sheet:

Figure 1.8 – Building depreciation schedule

Figure 1.8 details the building depreciation schedule, which allocates the cost of the asset over its useful life of 10 years. The schedule shows that the building was acquired in 2021 at a value of USD 260,000 and is expected to have a resale value of zero by 2031, when it will be fully depreciated.
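To see how such a schedule could be computed, here is a small sketch that assumes the straight-line method, where the same amount is expensed every year. The depreciation method is an assumption made for this example, and the function name is hypothetical; only the cost (USD 260,000), the residual value (zero), and the useful life (10 years) come from the discussion above.

```python
from decimal import Decimal


def straight_line_schedule(cost, residual_value, useful_life_years):
    """Return (year number, depreciation expense, ending book value) rows."""
    annual_expense = (cost - residual_value) / useful_life_years
    schedule = []
    book_value = cost
    for year_number in range(1, useful_life_years + 1):
        book_value -= annual_expense
        schedule.append((year_number, annual_expense, book_value))
    return schedule


building_schedule = straight_line_schedule(Decimal("260000"), Decimal("0"), 10)
for year_number, expense, book_value in building_schedule:
    print(f"Year {year_number}: expense {expense}, ending book value {book_value}")
```

Under these assumptions, the annual expense is USD 26,000 and the book value reaches zero at the end of the tenth year. Changing the useful life or the residual value changes every row, which is why errors in these estimates ripple through the financial statements.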

Newly constructed buildings usually last 40 years. Given that it will be used for 10 years, we can conclude that it is not a new one. The initial cost of the asset can differ depending on whether the business is following Generally Accepted Accounting Principles (GAAP) or International Financial Reporting Standards (IFRS). However, in this example, we will assume that there is no difference for simplicity. Also, estimates regarding how long an asset can be used, the depreciation method, and how much it can be sold for are approximations that could change depending on expectations.

Important note

It is critical to ensure the integrity and correctness of depreciation estimates. If, for example, there was an error in the useful life or depreciation method used to calculate the depreciation expense, this will have a flow-on effect (that is, a ripple effect) on the P&L statement values and calculations. Once these data integrity issues are detected, they could lead to restatements of prior periods' financial statements, particularly when they contain material inaccuracies.

For more information on this topic, feel free to check the following link: https://www.investopedia.com/terms/r/restatement.asp.

Variable cost

While fixed costs stay the same regardless of the amount of production or sales, variable costs fluctuate based on the quantity of goods or services produced or sold. Materials, direct labor costs, and packaging costs are examples of variable costs, sometimes referred to as direct expenses, which are directly associated with the production or sale of a certain good or service. Office rent, administrative staff pay, and real estate taxes are examples of fixed costs.

Important note

Understanding the difference between variable and fixed costs is critical for financial management since it enables a business to determine pricing and profitability while also understanding the entire cost of production or sales at various output levels. Knowing the trend of the variable cost for a product helps identify any irregularities early in the process. If, for example, the variable costs significantly increased, it can mean that the cost of producing the product has increased, or it could indicate errors, inconsistencies, or potentially even fraud.

Risk management

Risk management is the process of identifying, assessing, and minimizing the different risks and threats (including data integrity issues) that could potentially impact an organization’s business operations and financial performance. It is necessary that accurate and reliable data is available in order to identify and evaluate these threats. You can also use this data in the development of effective risk mitigation strategies. In Chapter 2, Avoiding Common Data Integrity Issues and Challenges in Finance Teams, we will discuss how one of the biggest banks globally failed to perform the appropriate risk assessments of possible data corruption for their intelligent deposit machines (IDMs), which led to significant financial penalties.

Insurance

Insurance is a legal contract between two parties in which the insurer provides financial coverage for unforeseen losses that the insured may suffer. In addition to this, filing an insurance claim can interrupt business operations and requires data that is current, comprehensive, and accurate.

Fraud in the insurance industry costs billions of dollars per year. One way to address this risk is to improve and manage data integrity by ensuring that the data can be trusted and that the records are up to date and correct.

Note

We will cover how fraud can negatively affect the integrity of financial reports under Chapter 7, Detecting Fraudulent Transactions Affecting Financial Report Integrity.

Transaction

A transaction is an exchange between two parties representing a transfer of resources. Each transaction is recorded as an entry in the organization’s financial records. Some examples are selling a product to customers, paying the rent on the company office, paying the wages and salaries of employees, or buying equipment needed for production. One way to help ensure data integrity at a transaction level is by using database locking techniques and features available in database applications. These techniques help make sure that updates performed during transactions are correctly reflected (that is, the numbers are adding up correctly!). We’ll cover these techniques in more detail in Chapter 8, Using Database Locking Techniques for Financial Transaction Integrity.

In addition to this, database transactions can be stored and audited using various solutions including ledger databases such as Amazon Quantum Ledger Database (QLDB). Using these special types of databases, even if transactions are deleted, they can still be validated and audited using the features provided by the ledger database. We will discuss this further in Chapter 9, Using Managed Ledger Databases for Finance Data Integrity.

Mutual exclusion

Another technique to help ensure data integrity in system transactions is to utilize a mutual exclusion lock (or mutex lock), which prevents simultaneous access to a common entity or resource. Imagine having a bank account that contains a total of $200. Suppose two separate deposits are initiated at the same time—one deposit of $50 and another of $100. In this scenario, a mutex would lock the account for one deposit transaction, preventing the other from accessing the account simultaneously. This ensures accurate updating of the account balance, as each deposit is processed in isolation. Once the first deposit ($50) is completed and the account balance is updated, the mutex unlocks, allowing the second deposit ($100) to proceed, thereby maintaining the integrity of the transaction process as well as the final account balance (which should be $350). Without a lock, you might end up having an incorrect final account balance of $250!

To help demonstrate how effective these types of locks are when building financial systems, let’s have a quick example where multiple running processes or threads are updating a shared resource (first without a mutex) using the Python programming language:

import threading
counter = 0

def increment_counter():
    global counter
    for _ in range(100000):
        counter += 1

threads = []
for i in range(10):
    thread = threading.Thread(target=increment_counter)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print(f"Final counter value without mutex: {counter}")

Here, we have a shared counter variable as well as multiple threads updating the counter at the same time. What happens if we run this code? Without a mutex lock, the concurrent threads will try to update the counter at the same time and accidentally overwrite the updates performed by other threads!

Note

The expected final value of the counter is 1,000,000. However, without a mutex lock, you will most likely get a value less than 1,000,000 (which is incorrect). You may also be surprised to find that running the code multiple times yields different results! Keep in mind that how often the race shows up depends on the interpreter: on some newer Python releases, this particular loop may print 1,000,000 even without a lock, but the code is still not thread-safe. In case you want to run this example yourself, make sure to (1) install Python 3.7.X on your laptop/local machine, (2) create a file named no_mutex.py, and (3) run python3 no_mutex.py in your terminal application. Alternatively, you can run the code on websites such as https://www.online-python.com/.

Now, let’s update the previous example and have it use a mutex lock:

import threading
counter = 0
counter_lock = threading.Lock()

def increment_counter():
    global counter
    for _ in range(100000):
        with counter_lock:
            counter += 1

threads = []
for i in range(10):
    thread = threading.Thread(target=increment_counter)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print(f"Final counter value with mutex: {counter}")

This time, we will get the correct final value of the counter variable (1,000,000) since the usage of a mutex lock helped ensure that an update to the same resource is completed first before the next update operation is performed. Feel free to run the code multiple times and you should see that the result should always be the same (that is, 1,000,000). Awesome, right?

Note

For more information about this topic, feel free to check the following page: https://en.wikipedia.org/wiki/Lock_(computer_science).

Now that we’ve established a good foundation with these concepts, let’s move on to the final section of this chapter.

Debunking the myths and misconceptions surrounding finance data integrity management

There are several myths and misconceptions that can negatively influence the financial practices and processes of various departments in an organization. In this section, we will cover the different beliefs within an organization that can lead to data quality issues and noncompliance, which could in turn lead to major financial consequences. Once we are able to debunk these myths and misconceptions, we can establish more effective strategies and practices that ensure the integrity and reliability of financial data in our own organizations.

Myth 1 – only large financial organizations are concerned about data integrity

Data integrity issues affect organizations of all sizes, from start-ups and small businesses to large organizations. As mentioned earlier in this chapter, poor financial data integrity management can result in serious financial consequences or regulatory offenses. For example, start-ups may end up deprioritizing financial data integrity management and avoiding strict processes that could slow down progress. This could lead to inconsistencies in their financial reporting, budget forecasts, and internal audits, potentially resulting in significant long-term financial and reputational damage.

Although small businesses may face less complexity and less regulation than bigger organizations, they still need to maintain data integrity throughout the data lifecycle. This enables business owners and management to confidently rely on the data and make informed business decisions.

Myth 2 – only finance professionals should be concerned about data integrity

While finance professionals play a crucial role in data integrity management, this responsibility needs to be shared across the entire organization. This involves promoting a culture of quality within the organization, increasing data literacy through training, and fostering an environment of openness.

For one thing, software engineers need to be mindful of data integrity issues and risks when building financial applications. Junior software engineers may not be aware that adding 0.1 and 0.2 in languages such as Python or JavaScript, without first converting these floating-point values to decimal, yields 0.30000000000000004 instead of 0.3! Where did that extra 0.00000000000000004 come from?! To solve this mystery, developers should know that a floating-point number (often referred to simply as a float) is stored in a binary format that cannot exactly represent every decimal number. Using float instead of decimal when developing financial applications is a bad idea, as it introduces approximation and rounding errors into calculations.
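The 0.1 + 0.2 surprise can be reproduced in a few lines. The sketch below uses only Python's built-in float type and the standard library decimal module; the variable names float_sum and decimal_sum are ours.

```python
from decimal import Decimal

# Floats: 0.1 and 0.2 are stored as binary approximations
float_sum = 0.1 + 0.2
print(float_sum)          # 0.30000000000000004
print(float_sum == 0.3)   # False

# Decimals built from strings keep the exact decimal digits
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum)                     # 0.3
print(decimal_sum == Decimal("0.3"))   # True

# Building a Decimal from a float exposes the value the float really holds
print(Decimal(0.1))
```

Note that the Decimal values are created from strings. Passing a float such as 0.1 to Decimal carries the binary approximation over into the decimal value, as the last line shows.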

The impact on a financial calculation is best demonstrated with a simple code example using the Python programming language:

principal = 1000
rate = 0.001 # 0.1%
time = 1/365
interest_float = principal * rate * time
interest_float

Running this block of code would result in 0.0027397260273972603, similar to what we have in Figure 1.9:

Figure 1.9 – Getting the results for interest_float

Let’s try performing the same calculation, but this time we’ll use the decimal data type:

from decimal import Decimal
principal = Decimal('1000')
rate = Decimal('0.001') # 0.1%
time = Decimal('1') / Decimal('365')
interest_decimal = principal * rate * time
interest_decimal

This will result in Decimal('0.002739726027397260273972602740'), as can be seen in Figure 1.10:

Figure 1.10 – Getting the results for interest_decimal

Once we subtract the interest values (one stored as a floating-point number and the other stored in decimal form), we will get a difference similar to what we have in Figure 1.11:

Figure 1.11 – Getting the difference between the interest calculations

This difference may seem small. However, what if the interest calculations are needed, for example, on overnight deposits worth hundreds of millions of dollars?
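One way to compute this difference is sketched below. It assumes that the float result is converted to Decimal before subtracting, since Python does not allow a float and a Decimal to be mixed in arithmetic; the exact expression behind Figure 1.11 may differ. The name difference is ours.

```python
from decimal import Decimal

# Same inputs as the two examples above
interest_float = 1000 * 0.001 * (1 / 365)
interest_decimal = (
    Decimal("1000") * Decimal("0.001") * (Decimal("1") / Decimal("365"))
)

# Decimal(float) converts the exact binary value stored in the float,
# so the subtraction shows how far the float is from the decimal result
difference = Decimal(interest_float) - interest_decimal
print(difference)

# Subtracting the two types directly is rejected by Python
try:
    interest_float - interest_decimal
except TypeError as error:
    print(f"TypeError: {error}")
```

The TypeError is a useful safeguard in its own right: it stops a float from slipping unnoticed into a calculation that is supposed to use decimal values.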

Note

For more information about this topic, feel free to check the following link: https://en.wikipedia.org/wiki/Floating-point_arithmetic.

Myth 3 – only internal financial reporting systems are affected by data integrity issues

Any type of financial system with a database can be affected by data integrity issues. This includes banking systems, accounting software, investment management platforms, and even payroll systems.

Note that data integrity issues can negatively impact machine learning-powered financial systems as well. ML-powered financial applications make use of machine learning models, which learn from data to identify patterns, make decisions, or predict outcomes. What if the data used to train these models have data integrity issues? In such cases, these models may produce inaccurate or biased results, as they rely on the quality of the training data to make predictions. This could lead to significant challenges in application performance and even potentially yield harmful results, especially in finance. As the saying goes: garbage in, garbage out.

As more organizations around the world build ML-powered financial applications, taking care of the integrity of data and addressing the phenomenon called machine learning drift are critical. While we won’t discuss machine learning drift in detail, it’s important to be aware that this drift leads to a decline in model accuracy, which reduces the effectiveness of a machine learning system. Data integrity issues, such as inconsistencies, missing values, or biases, significantly contribute to this drift.
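As a rough illustration of catching such issues before they reach a model, here is a minimal sketch that scans training records for missing values and duplicate identifiers. The records, the field names (txn_id, amount, label), and the find_integrity_issues function are all hypothetical; a real pipeline would typically rely on a data validation library and cover many more checks.

```python
from decimal import Decimal

# Hypothetical training records; the field names are illustrative only
records = [
    {"txn_id": "T1", "amount": Decimal("120.50"), "label": 0},
    {"txn_id": "T2", "amount": None, "label": 1},               # missing value
    {"txn_id": "T2", "amount": Decimal("75.00"), "label": 1},   # duplicate ID
    {"txn_id": "T3", "amount": Decimal("40.00"), "label": 0},
]


def find_integrity_issues(rows):
    """Return (row_index, issue) pairs for missing values and duplicate IDs."""
    issues = []
    seen_ids = set()
    for index, row in enumerate(rows):
        # Flag every field that has no value
        for field, value in row.items():
            if value is None:
                issues.append((index, f"missing {field}"))
        # Flag an ID that already appeared in an earlier row
        if row["txn_id"] in seen_ids:
            issues.append((index, "duplicate txn_id"))
        seen_ids.add(row["txn_id"])
    return issues


issues = find_integrity_issues(records)
for index, issue in issues:
    print(f"row {index}: {issue}")
```

Running checks like these before training, and again on new data as it arrives, gives an early signal that the data feeding a model no longer matches what is expected.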

Myth 4 – processes that improve data integrity are expensive and difficult to implement

Contrary to popular belief, improving the quality and integrity of data used by organizations doesn’t have to be expensive. Practical processes can be implemented using commonly available tools, such as Microsoft Excel, Google Sheets, and Power BI. These tools offer functionalities such as data validation, conditional formatting, and pivot tables, which can be leveraged to maintain accurate and consistent data. In addition, integrating basic data checks and regular audits into routine processes can go a long way toward preserving the integrity of financial data. Training staff in effective data management and the use of these tools can also be done with minimal expense.

Note

Using various solutions and features for data integrity management will be covered further in Chapter 4, Understanding the Data Integrity Management Capabilities of Business Intelligence Tools.

Myth 5 – only electronic data is affected by data integrity issues

All types of data can be affected by data integrity issues, whether stored digitally or on paper. It is important to be able to accurately store and retrieve data, whether from an electronic database or from hardcopy documents. One way of minimizing risks in the data collection process is to anticipate and manage potential human errors ahead of time. This can be done by performing data validation checks, double-checking the work, following a standardized process, or enabling automation.
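To make the idea of a data validation check concrete, here is a minimal sketch that validates entries keyed in by hand from paper documents. The validate_entry function, the field names, and the YYYY-MM-DD date format are assumptions made for this example; an actual process would follow the organization's own standards.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation


def validate_entry(entry):
    """Return a list of problems found in one manually encoded entry."""
    problems = []

    # The date must be a real calendar date in YYYY-MM-DD format
    try:
        datetime.strptime(entry["date"], "%Y-%m-%d")
    except ValueError:
        problems.append("date is not a valid YYYY-MM-DD date")

    # The amount must parse as a decimal number greater than zero
    try:
        amount = Decimal(entry["amount"])
        if not amount.is_finite() or amount <= 0:
            problems.append("amount must be greater than zero")
    except InvalidOperation:
        problems.append("amount is not a number")

    return problems


# Hypothetical entries typed in from hardcopy invoices
entries = [
    {"date": "2024-01-15", "amount": "1250.00"},
    {"date": "2024-02-30", "amount": "300.00"},   # impossible date
    {"date": "2024-03-01", "amount": "12O.50"},   # letter O typed instead of zero
]

for entry in entries:
    print(entry, validate_entry(entry))
```

Checks like these do not replace a second person reviewing the work, but they catch common keying mistakes at the moment of entry, when they are cheapest to fix.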

Note

While there are machine learning-powered tools to help automate the encoding process, it is crucial to remember that these tools also require regular monitoring and validation to ensure that they are functioning correctly and adapting to any changes in data formats or structures.

That’s pretty much it! At this point, we should have a better understanding of the myths and misconceptions and a deeper appreciation of the right mindset and approach toward ensuring the integrity of financial data.

