Understanding the role of data personas
Because data engineering is such a crucial field, you may be wondering who the main players are and what skill sets they possess. Building a data product involves several folks, all of whom need to work together with seamless handoffs for the end product or service to succeed. It would be a mistake to create silos, because silos increase both the number and the complexity of integration points, and each additional integration is a potential failure point. Data engineering has a fair overlap with software engineering and data science tasks:
All these roles require an understanding of data engineering:
- Data engineers focus on keeping the data pipelines that ingest and transform data running. The role has a lot in common with software engineering, applied to large volumes of data.
- BI analysts focus on SQL-based reporting and can be operational or domain-specific subject-matter experts (SMEs) such as financial or supply chain analysts.
- Data scientists and ML practitioners are statisticians who explore and analyze the data through exploratory data analysis (EDA) and use modeling techniques at various levels of sophistication.
- DevOps and MLOps focus on the infrastructure aspects of monitoring and automation. MLOps is DevOps coupled with the additional task of managing the life cycle of analytic models.
- ML engineers are folks who can span both the data engineer and data scientist roles.
- Data leaders, such as chief data officers, are the data stewards at the top of the food chain: the ultimate governors of data.
The following diagram shows the typical placement of the four main data personas working collaboratively on a data platform to produce business insights that give the company a competitive advantage in the industry:
Let's take a look at a few of these points in more detail:
- DevOps is responsible for all operational aspects of the data platform and traditionally does a lot of scripting and automation.
- Data/ML engineers are responsible for building the data pipeline and taking care of the extract, transform, load (ETL) aspects of the pipeline.
- Data scientists of varying skill levels build models.
- Business analysts create reporting dashboards from aggregated curated data.
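To make the handoff between these personas more concrete, the following is a minimal sketch of an extract, transform, load flow feeding an aggregated report. Everything in it is an assumption made for illustration: the `RAW_ORDERS` records, the function names, and the in-memory list standing in for a curated data store are hypothetical and do not come from any particular platform.

```python
from collections import defaultdict

# Hypothetical raw records, standing in for data landed by an ingestion job.
RAW_ORDERS = [
    {"order_id": 1, "region": "east", "amount": "120.50"},
    {"order_id": 2, "region": "west", "amount": "80.00"},
    {"order_id": 3, "region": "east", "amount": "not-a-number"},  # bad record
    {"order_id": 4, "region": "west", "amount": "19.50"},
]


def extract():
    """Extract: return the raw records (here, a copy of an in-memory list)."""
    return list(RAW_ORDERS)


def transform(records):
    """Transform: cast amounts to float and drop records that fail to parse."""
    clean = []
    for record in records:
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # skip records whose amount is not numeric
        clean.append({**record, "amount": amount})
    return clean


def load(records, store):
    """Load: append curated records to a target (here, a plain list)."""
    store.extend(records)
    return store


def revenue_by_region(curated):
    """Aggregate curated data the way a reporting dashboard might."""
    totals = defaultdict(float)
    for record in curated:
        totals[record["region"]] += record["amount"]
    return dict(totals)


# The data engineer's part: run the ETL steps into a curated store.
curated_store = load(transform(extract()), [])

# The business analyst's part: report on the aggregated, curated data.
report = revenue_by_region(curated_store)
print(report)
```

In this sketch, the three ETL functions correspond roughly to the data/ML engineer's responsibilities, while `revenue_by_region` stands in for the kind of aggregate a business analyst would put on a dashboard. A real platform would replace the in-memory list with managed storage and scheduled jobs.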