Understanding Data Governance
Data governance, in an enterprise context, is a data management concept that aims to ensure a high level of data quality throughout the complete life cycle of the data.
The data governance concept extends to several focus areas. Enterprises typically focus on data usability, availability, security, and integrity. This includes the processes to be followed during the different stages of the data life cycle, such as data stewardship, which keeps the quality of the data at a high standard, and other activities that keep the data accessible and available to all consuming applications and entities.
Data governance aims to do the following:
- Increase consistency and confidence in data-driven decisions. This, in turn, enables better decision-making capabilities across the enterprise.
- Break down data silos.
- Ensure that the right data is used for the right purposes. This prevents potential misuse and reduces the risk of introducing data errors into the systems.
- Decrease the risk of failing to meet regulatory requirements, and so avoid fines.
- Continuously monitor and improve data security, as well as define and verify requirements for data distribution policies.
- Enable data monetization.
- Increase information quality by defining accountabilities.
- Enable modern, customer-centric user journeys based on high-quality, trusted data.
- Minimize the need for rework due to technical debt created by poorly governed activities.
Data governance bodies usually create and maintain the following artifacts:
- Data mapping and classification: This helps document the enterprise’s data assets and related data flows. Datasets can be classified based on specific criteria, such as whether they contain personal information or confidential data. This, in turn, influences how data governance policies are applied to each dataset.
- Business glossary: This contains definitions of the business terms used in an enterprise. A good example is the definition of what constitutes an active customer.
- Data catalog: This is normally created by collecting metadata from across systems. The metadata is then used to create an inventory of available data assets. Governance policies and information about topics such as automation mechanisms can also be built into the catalog.
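To make the link between classification, cataloging, and policy concrete, here is a minimal sketch of a catalog built from collected metadata, where each dataset's classification decides which controls apply to it. Everything in it is an assumption for illustration: the classification levels, the control names (restrict_access, mask_on_export), the column-name hints, and the sample datasets are hypothetical and do not come from any particular governance tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Classification(Enum):
    PUBLIC = "public"
    CONFIDENTIAL = "confidential"
    PERSONAL = "personal"


# Hypothetical policy table: the controls each classification triggers.
POLICIES = {
    Classification.PUBLIC: set(),
    Classification.CONFIDENTIAL: {"restrict_access"},
    Classification.PERSONAL: {"restrict_access", "mask_on_export"},
}

# Illustrative rule only: column names that hint at personal information.
PERSONAL_HINTS = ("email", "phone", "birth")


@dataclass
class CatalogEntry:
    name: str
    source_system: str
    columns: list[str]
    classifications: set[Classification] = field(default_factory=set)

    def required_controls(self) -> set[str]:
        """Union of the controls demanded by every classification on this dataset."""
        controls: set[str] = set()
        for classification in self.classifications:
            controls |= POLICIES[classification]
        return controls


def classify(entry: CatalogEntry) -> CatalogEntry:
    """Tag the entry as PERSONAL if a column name matches a hint, else PUBLIC."""
    if any(hint in col.lower() for col in entry.columns for hint in PERSONAL_HINTS):
        entry.classifications.add(Classification.PERSONAL)
    if not entry.classifications:
        entry.classifications.add(Classification.PUBLIC)
    return entry


def build_catalog(metadata: list[dict]) -> dict[str, CatalogEntry]:
    """Turn metadata records collected from source systems into an inventory."""
    catalog: dict[str, CatalogEntry] = {}
    for record in metadata:
        entry = CatalogEntry(record["name"], record["source_system"], record["columns"])
        catalog[entry.name] = classify(entry)
    return catalog


# Sample metadata, invented for this sketch.
catalog = build_catalog([
    {"name": "crm.customers", "source_system": "CRM",
     "columns": ["id", "email", "signup_date"]},
    {"name": "web.page_views", "source_system": "Web",
     "columns": ["page", "views"]},
])

for entry in catalog.values():
    print(entry.name, sorted(entry.required_controls()))
```

In practice, a data steward would usually review or override classifications rather than rely on name matching alone. The point of the sketch is only that the policy follows from the classification recorded in the catalog, not from ad hoc decisions made per dataset.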
A well-designed data governance program typically includes a combination of data stewards and a team that acts as a governing body. They work together to create the required standards and policies for governing the data. Implementing and executing the planned activities and procedures is mainly carried out by the data stewards.
A data steward is a role within the enterprise that is responsible for maintaining and using the organization’s data governance processes to ensure the availability and quality of both the data and metadata. The data steward is also responsible for applying policies, guidelines, and processes to administer the organization’s data in compliance with the applicable policy and regulatory obligations. The data steward and the data custodian may share some responsibilities.
A data custodian is a role within the enterprise that is responsible for transporting and storing data, rather than for deciding what data goes into the system and why. Data stewards are normally responsible for what is stored in datasets, while data custodians cover the technical details, such as environment and database structure. Data custodians are sometimes referred to as database administrators or extract, transform, load (ETL) developers.
Now that you understand the activities covered by the data governing body, data stewards, and data custodians, move on to one of the key topics they need to cover in their data strategy: data security.