What is data governance?
Before we dive in, it’s important that we ground ourselves in basic definitions. During my first role in data management, we made the mistake of assuming that our stakeholders around the organization were aligned on what data we were referring to when we were discussing a particular domain of data. After several months of difficult conversations on scope (whether a particular data element, report, or system was in scope), we realized that we needed to go back and ground all stakeholders in a few very simple questions.
Data governance is the formal orchestration of people, processes, and technology by which an organization brings together the right data at the right time with the right controls to enable the company to drive efficient and effective business results. This formal orchestration should control, protect, deliver, and further enhance the value of data and create equity for an organization. Data governance is active and is delivered through capabilities, including the following:
- Metadata management
- Data lineage
- Data quality
- Data architecture
- Mastering data
- Data operations
We will explore these core capabilities, among other methods, in detail in subsequent chapters. The capabilities that make up a successful data governance program are defined slightly differently in just about every organization. Therefore, it is important that we define them here for the purposes of this book. Feel free to use the vocabulary in this text within your organization or the common language of your business.
Important note
Take the time to build a quick reference guide that defines the most basic terms used around your data governance program (e.g., data, governance, metadata, and so on). Make it accessible to the whole organization, and add to it as needed.
Data versus information
I want to point out that industry veterans are passionate about the distinction between the terms “data” and “information.” Some practitioners firmly believe that these terms are not the same and should not be used interchangeably. Others use them synonymously without much thought. In my humble opinion, either approach can be appropriate. The important point is to decide whether your organization distinguishes between the two and, if it does, to make sure everyone understands the definitions and how to use them. Personally, I do not believe either position is correct or incorrect. It is far more important that you meet your stakeholders where they are and that your organization agrees on the usage you choose. For the purpose of this book, I will use the term “data” primarily, and I will be sure to be specific about what that means.
Use case – financial services company
In my very first data governance position, we launched a robust, multi-million dollar transformation to comply with a regulatory requirement around data management and regulatory reporting. About six months into the effort, we found we were really struggling to define what was “in” vs. “out” of the scope of the program. After several circular and passionate conversations, we learned that we were not able to scope well because, ultimately, our stakeholders had differing views about what constituted “data” vs. “metrics.” We ended up building a full-blown methodology to ground the company and our regulators on how we decided whether a report was in scope. We built a full list of all reports and documented whether each one met the criteria or not, and we made this available for credible challenge by any person or group interested. Instead of debating it theoretically, we documented the criteria with specificity and then clearly articulated the justification.
What I learned from this experience was two-fold: you cannot make assumptions regarding what people know or don’t know when scoping a data program, and you must have grounding definitions that can be socialized, agreed to, and documented so that all involved can remain grounded.
I’ll ask us to do the same throughout this book. Please come back to these definitions as needed so we can be aligned.
What data governance is not
Too often, companies blame problems on the data, the data team, or both. Data governance (team or program) is not the solution to every problem. Data, like air, is everywhere in an organization, and it truly takes the entire organization to manage it well. Just as smoke from a fire degrades the air far beyond its source, poor data spreads through an organization. The strong management of data requires prevention, detection, and correction, and it requires the entire company to be on board. A single data team cannot unilaterally solve every data problem. It will take the involvement and action of the organization at large to drive change and manage data effectively.
Secondly, data will never be perfect. If you or your executive team is expecting perfection from data governance, I would urge you to adjust your expectations. To ensure we align on what the appropriate expectations and objectives of a successful data governance program are, we must define success. To do that, we must start with the objective of data governance.
The objective of data governance – create business value
To put it simply, companies exist to increase value for stakeholders. When it comes to data, there is one very important objective: to increase equity for stakeholders. Managing data effectively is one of the ways companies can increase value for their organization.
Figure 1.1 – A simple value equation
An asset is something of economic value that is owned by an organization. A liability is an obligation (either current or future) that decreases the overall value of the organization. Thus, when assets minus liabilities results in a positive value, the organization has an increase in value (i.e., has created equity), whereas when assets minus liabilities results in a negative value, the organization has a decrease in value (i.e., has reduced equity).
The same mindset can be applied to data. Data can impact equity in a number of ways. Equity can be created by minimizing operational risks, sustaining regulatory compliance, avoiding fines and penalties, and increasing or creating revenue. I break this concept down into two key subcomponents to manage data governance more specifically. These two subcomponents (assets and liabilities) are directly influenced by my formal training as an accountant and IT auditor, and this framing tends to resonate well with management because it translates data solutions into measurable value (ideally monetary value, although the time value of employees may also be considered).
Important note
Data is an asset when it creates value for the organization.
A few examples include:
- Curated datasets that are used for multiple purposes
- Customer health scoring
- An authorized provisioning point
- A data model used for predictive modeling
Important note
Data is a liability when it creates risk for the organization. Data can be both an asset and a liability at once; the two are not mutually exclusive (for example, a single data solution may both create value and reduce risk).
A few examples include:
- Non-cataloged data
- Data that has not been classified and, therefore, not appropriately secured
- Data leaks/breached data
Ideally, organizations should manage the liability of data while maximizing data as a strategic asset, such that data equity is created. Depending on your business and the maturity of your data governance practices, either asset management or liability management may be a bigger priority.
Data governance should create data equity by increasing the value of data as an asset and minimizing data liabilities. I encourage you to come back to this framing as you apply the principles in this book to your own organization. As you pitch data solutions, consider this:
How is this solution increasing the value of my data (increasing the asset) and/or decreasing the liability?
Both are of value. The momentum created by delivery should translate directly to an increase in data equity over time.
An example of a data asset might be a curated dataset that is reliable because it has clear ownership, is of high quality, and can be leveraged for multiple business purposes organization-wide. An example of a data liability might be as simple as an organization not knowing what data it has, where it lives, or what to do with it. This carries a risk to the company from a security perspective. In addition, the lack of accountability means that individuals may be using the data for decisions that it is not fit for, which increases the company’s risk of making a poor decision based on data that was never intended to be used for that particular purpose.
The measurement of the value of an asset is unique to each organization, but in short, being able to tie back the impact to the organization is a good guiding principle. The following are a few example questions to consider as you attempt to value the data asset:
- Does this asset enable additional revenue? How much?
- Does this asset save time? Can you multiply the hours saved by an hourly rate for an individual to calculate the value of the person-hours saved?
- Does this asset improve customer satisfaction? Can this satisfaction be translated or calculated into value for the organization in terms of additional spending or increased customer retention?
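The three questions above can be sketched as a simple calculation. This is a minimal illustration, not a prescribed method: the class name, field names, and every figure are hypothetical, and each input is an assumption that should be vetted with the business.

```python
from dataclasses import dataclass


@dataclass
class AssetValueInputs:
    """Hypothetical annual inputs for valuing one data asset."""
    additional_revenue: float  # revenue the asset enables
    hours_saved: float         # person-hours saved
    hourly_rate: float         # hourly rate for the individuals affected
    retention_value: float     # value attributed to improved customer satisfaction


def estimate_asset_value(inputs: AssetValueInputs) -> float:
    """Add up the revenue, time-saved, and satisfaction components."""
    time_value = inputs.hours_saved * inputs.hourly_rate
    return inputs.additional_revenue + time_value + inputs.retention_value


# Made-up figures for illustration only.
example = AssetValueInputs(
    additional_revenue=50_000.0,
    hours_saved=400.0,
    hourly_rate=75.0,
    retention_value=20_000.0,
)
print(estimate_asset_value(example))  # 100000.0
```

Keeping each component as a separate input makes it easier to show the business which assumption drives the total and to revise a single figure when they disagree.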
Figure 1.2 – Data assets, liabilities, and equity formula
Data assets may provide value across these components, and value should be calculated accordingly. The most important part of this valuation exercise is not the calculation itself; rather, it is the alignment and agreement with the business. Once you have calculated the value, it is important to go to the business and ask for their feedback. Do they agree with your assessment? If yes, then you have a fully vetted value for your data asset. If not, work with the business to iterate on your data asset valuation until you reach an agreement. Data teams that skip this important step (vetting the value) are often seen as overselling their value to the organization, which immediately undermines their credibility. Agreeing on the value with the business supports a strong business relationship and provides credible evidence of past success when seeking future investment in data solutions.
The measure of the liability portion of the equation is of equal importance. Like data assets, the measurement of the liability carried by an organization’s data will vary based on your organization.
Important note
It is not as simple as more data equals more liability.
Rather, the less the data is managed, the higher the liability. When data is unmanaged, the risk to the organization is higher.
A great example is security risk. When an organization does not understand where data is, it cannot effectively or adequately protect it. This comes at a high risk (liability) to the organization and could result in a data leak or, worse, a data breach. Here are a few questions to consider when calculating your organization’s data liability:
- Does this liability increase the risk to the organization? How much? Are there fines or regulatory penalties we could be subjected to as a result of this liability?
- Does this liability drive inefficiencies in our business? Can you multiply the hours incurred by an hourly rate for an individual to calculate the cost of the person-hours lost to the inefficiency (for example, a manual process vs. an automated one)?
- Does this liability impact customer satisfaction? Can this impact be translated or calculated into a decrease in value for the organization in terms of reduced spending or increased customer attrition?
Once you have assessed your data asset value and data liability value, you can use them to calculate data equity. The idea is to increase the equity over time. This initial calculation can serve as the baseline against which you measure progress. Organizations may also like to leverage a data maturity model to measure progress; however, these models can be interpreted widely in an organization and do not take into account the business value associated with data solutions. Instead, they focus on the development of data capabilities, which does not always translate well for executive management. I prefer to measure progress by business value rather than against a maturity model.
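The baseline idea above can be sketched in a few lines. This is an illustrative sketch only: the function name and all values are hypothetical, and it assumes asset and liability values have already been agreed with the business in the same monetary unit.

```python
def data_equity(asset_value: float, liability_value: float) -> float:
    """Data equity is data asset value minus data liability value."""
    return asset_value - liability_value


# Made-up figures: an initial assessment and a later reassessment.
baseline = data_equity(asset_value=100_000.0, liability_value=140_000.0)
current = data_equity(asset_value=180_000.0, liability_value=90_000.0)

# Progress is the change in equity relative to the baseline.
change = current - baseline

print(baseline)  # -40000.0
print(current)   # 90000.0
print(change)    # 130000.0
```

In this example the baseline is negative because unmanaged liabilities outweigh the assets. Equity improves both by growing the asset value and by reducing the liability value, which mirrors the two levers described in this section.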
We will not dive into data monetization efforts in this book. The economics of the monetization of data is expertly described in Doug Laney’s book, Infonomics, and I would highly recommend his book to anyone looking to dive into the monetization of data further.