Introducing the exam domains
The exam was designed by a group of subject matter experts with different specialties in the field of data science. Together, they decided on common ground that any early-career data analyst should know. They then categorized that knowledge into the following five domains:
- Data Concepts and Environments
- Data Mining
- Data Analysis
- Visualization
- Data Governance, Quality, and Control
Data Concepts and Environments
The domains move through the data pipeline chronologically. The first domain, Data Concepts and Environments, is largely about how data is stored. This covers multiple levels, from different database types, structures, and schemas, through file types for specific kinds of data, and even into different variable types. This domain is a broad view of storage concepts mixed with the ability to identify what type of data you can expect from different storage solutions.
Data Mining
The name of this domain is a bit of a misnomer. Data mining means taking a huge dataset you already have and searching through it for any insights that might be of interest, rather than answering specific questions. Data mining does require every concept in this domain, but so does regular data analysis. What this domain actually covers is every step after storing your data but before running an analysis: collecting, querying, cleaning, and wrangling data. Effectively, these are the steps you need to take to get your data into a useful shape so you can analyze it.
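To make the cleaning and wrangling steps a little more concrete, here is a minimal sketch in Python. The exam itself is vendor-neutral and does not require any particular language, so treat this purely as an illustration. The records, field names, and the clean function are all hypothetical, and the rules applied (keep the first copy of a duplicate, treat a blank value as missing) are assumptions chosen for the example.

```python
# Hypothetical raw records, as they might arrive from a query or file export.
raw_rows = [
    {"id": "1", "city": " Boston ", "sales": "120.50"},
    {"id": "2", "city": "boston", "sales": ""},
    {"id": "1", "city": " Boston ", "sales": "120.50"},  # duplicate of the first row
    {"id": "3", "city": "CHICAGO", "sales": "98"},
]


def clean(rows):
    """Drop duplicate ids, normalize text, and convert numeric strings."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        if row["id"] in seen_ids:
            continue  # skip duplicates, keeping the first occurrence
        seen_ids.add(row["id"])
        sales_text = row["sales"].strip()
        cleaned.append({
            "id": int(row["id"]),
            # Trim whitespace and standardize capitalization.
            "city": row["city"].strip().title(),
            # Assumption: a blank sales value is recorded as missing (None).
            "sales": float(sales_text) if sales_text else None,
        })
    return cleaned


cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```

Real projects usually lean on a dedicated tool for this work, but the ideas are the same: remove duplicates, standardize formats, convert types, and decide how missing values should be handled.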
Data Analysis
You have stored your data, you have pulled your data and made it pretty, and now it is time to do something with it. This domain is all about analyses. You will be expected to perform descriptive statistical analyses, understand the concepts behind inferential statistics, be able to pick appropriate types of analysis, and even know some common tools used in the field. You don’t need to be able to use any of these tools, because the test is vendor-neutral; you just need to be able to identify them.
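As a rough illustration of what a descriptive statistical analysis looks like, the sketch below summarizes a small set of numbers using Python's built-in statistics module. The scores are made up for the example, and the choice of Python is an assumption; any tool that can compute these measures would serve equally well.

```python
import statistics

# Hypothetical test scores, invented for illustration only.
scores = [72, 85, 85, 90, 68, 77, 85, 94]

summary = {
    "count": len(scores),
    "mean": statistics.mean(scores),      # arithmetic average
    "median": statistics.median(scores),  # middle value of the sorted data
    "mode": statistics.mode(scores),      # most frequent value
    "range": max(scores) - min(scores),   # spread between the extremes
    "sample_stdev": statistics.stdev(scores),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Measures of central tendency (mean, median, mode) describe where the data sits, while measures of dispersion (range, standard deviation) describe how spread out it is.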
Visualization
It doesn’t matter how perfect your analyses are if you can’t communicate the results. What’s the point in coming up with an equation that solves world hunger if you can’t explain it to anyone else? To that end, the next domain is all about visualizations and reporting. This covers what information a report should include, what type of report is most appropriate, who should get a report, when reports should be delivered, the basics of report design, types of visualizations, and even the process of developing a dashboard.
Data Governance, Quality, and Control
The final domain is made up of larger concepts that span the entire life cycle of data analytics. A large part of this is made up of policies. Some of the policies focus on protected data and how it can be handled legally, while other policies are more about how you can ensure the quality of your data. If your data has low quality, you can’t trust anything it says, and if you are mishandling protected information, you could face legal penalties, so these are important factors to know. This domain also includes a short section on the concept of master data management, as an example of an ideal state.
Now that you know what domains will be covered on the certification exam, let’s talk about how the exam is structured.