Until a couple of years ago, people barely knew the term 'data science' which has now evolved into an extremely popular career field. The Harvard Business Review dubbed data scientist within the data science team as the sexiest job of the 21st century and expert professionals jumped on the data is the new oil bandwagon.
As per the Figure Eight Report 2018, which takes the pulse of the data science community in the US, a lot has changed rapidly in the data science field over the years. For the 2018 report, they surveyed approximately 240 data scientists and found out that machine learning projects have multiplied and more and more data is required to power them. Data science and machine learning jobs are LinkedIn's fastest growing jobs. And the internet is creating 2.5 quintillion bytes of data to process and analyze each day.
With all these changes, it is evident for data science teams to evolve and change among various organizations. The data science team is responsible for delivering complex projects where system analysis, software engineering, data engineering, and data science is used to deliver the final solution.
To achieve all of this, the team does not only have a data scientist or a data analyst but also includes other roles like business analyst, data engineer or architect, and chief data officer. In this post, we will differentiate and discuss various job roles within a data science team, skill sets required and the compensation benefit for each one of them.
For an in-depth understanding of data science teams, read the book, Managing Data Science by Kirill Dubovikov, which has interesting case studies on building successful data science teams. He also explores how the team can efficiently manage data science projects through the use of DevOps and ModelOps.
Now let's get into understanding individual data science roles and functions, but before that we take a look at the structure of the team.There are three basic team structures to match different stages of AI/ML adoption:
At times for companies hiring a data science team is not an option, and they have to leverage in-house talent. During such situations, they take advantage of the fully functional in-house IT department. The IT team manages functions like data preparation, training models, creating user interfaces, and model deployment within the corporate IT infrastructure. This approach is fairly limited, but it is made practical by MLaaS solutions. Environments like Microsoft Azure or Amazon Web Services (AWS) are equipped with approachable user interfaces to clean datasets, train models, evaluate them, and deploy.
Microsoft Azure, for instance, supports its users with detailed documentation for a low entry threshold. The documentation helps in fast training and early deployment of models even without an expert data scientists on board.
Within the integrated structure, companies have a data science team which focuses on dataset preparation and model training, while IT specialists take charge of the interfaces and infrastructure for model deployment. Combining machine learning expertise with IT resource is the most viable option for constant and scalable machine learning operations.
Unlike the IT centric approach, the integrated method requires having an experienced data scientist within the team. This approach ensures better operational flexibility in terms of available techniques. Additionally, the team leverages deeper understanding of machine learning tools and libraries – like TensorFlow or Theano which are specifically for researchers and data science experts.
Companies can also have an independent data science department to build an all-encompassing machine learning applications and frameworks. This approach entails the highest cost. All operations, from data cleaning and model training to building front-end interfaces, are handled by a dedicated data science team. It doesn't necessarily mean that all team members should have a data science background, but they should have technology background with certain service management skills.
A specialized structure model aids in addressing complex data science tasks that include research, use of multiple ML models tailored to various aspects of decision-making, or multiple ML backed services.
Today's most successful Silicon Valley tech operates with specialized data science teams. Additionally they are custom-built and wired for specific tasks to achieve different business goals. For example, the team structure at Airbnb is one of the most interesting use cases. Martin Daniel, a data scientist at Airbnb in this talk explains how the team emphasizes on having an experimentation-centric culture and apply machine learning rigorously to address unique product challenges.
As discussed earlier, there are many roles within a data science team. As per Michael Hochster, Director of Data Science at Stitch Fix, there are two types of data scientists: Type A and Type B.
Type A stands for analysis. Individuals involved in Type A are statisticians that make sense of data without necessarily having strong programming knowledge. Type A data scientists perform data cleaning, forecasting, modeling, visualization, etc.
Type B stands for building. These individuals use data in production. They're good software engineers with strong programming knowledge and statistics background. They build recommendation systems, personalization use cases, etc.
Though it is rare that one expert will fit into a single category. But understanding these data science functions can help make sense of the roles described further.
The chief data officer (CDO) role has been taking organizations by storm. A recent NewVantage Partners' Big Data Executive Survey 2018 found that 62.5% of Fortune 1000 business and technology decision-makers said their organization appointed a chief data officer.
The role of chief data officer involves overseeing a range of data-related functions that may include data management, ensuring data quality and creating data strategy. He or she may also be responsible for data analytics and business intelligence, the process of drawing valuable insights from data.
Even though chief data officer and chief analytics officer (CAO) are two distinct roles, it is often handled by the same person. Expert professionals and leaders in analytics also own the data strategy and how a company should treat its data. It does make sense as analytics provide insights and value to the data. Hence, with a CDO+CAO combination companies can take advantage of a good data strategy and proper data management without losing on quality.
According to compensation analysis from PayScale, the median chief data officer salary is $177,405 per year, including bonuses and profit share, ranging from $118,427 to $313,791 annually.
Skill sets required: Data science and analytics, programming skills, domain expertise, leadership and visionary abilities are required.
The data analyst role implies proper data collection and interpretation activities. The person in this job role will ensure that collected data is relevant and exhaustive while also interpreting the results of the data analysis. Some companies also require data analysts to have visualization skills to convert alienating numbers into tangible insights through graphics. As per Indeed, the average salary for a data analyst is $68,195 per year in the United States.
Skill sets required: Programming languages like R, Python, JavaScript, C/C++, SQL. With this critical thinking, data visualization and presentation skills will be good to have.
Data scientists are data experts who have the technical skills to solve complex problems and the curiosity to explore what problems are needed to be solved. A data scientist is an individual who develops machine learning models to make predictions and is well versed in algorithm development and computer science. This person will also know the complete lifecycle of the model development.
A data scientist requires large amounts of data to develop hypotheses, make inferences, and analyze customer and market trends. Basic responsibilities include gathering and analyzing data, using various types of analytics and reporting tools to detect patterns, trends and relationships in data sets.
According to Glassdoor, the current U.S. average salary for a data scientist is $118,709.
Skills set required: A data scientist will require knowledge of big data platforms and tools like Seahorse powered by Apache Spark, JupyterLab, TensorFlow and MapReduce; and programming languages that include SQL, Python, Scala and Perl; and statistical computing languages, such as R. They should also have cloud computing capabilities and knowledge of various cloud platforms like AWS, Microsoft Azure etc.You can also read this post on how to ace a data science interview to know more.
At times a data scientist is confused with machine learning engineers, but a machine learning engineer is a distinct role that involves different responsibilities.
A machine learning engineer is someone who is responsible for combining software engineering and machine modeling skills. This person determines which model to use and what data should be used for each model. Probability and statistics are also their forte. Everything that goes into training, monitoring, and maintaining a model is the ML engineer's job. The average machine learning engineer's salary is $146,085 in the US, and is ranked No.1 on the Indeed's Best Jobs in 2019 list.
Skill sets required: Machine learning engineers will be required to have expertise in computer science and programming languages like R, Python, Scala, Java etc. They would also be required to have probability techniques, data modelling and evaluation techniques.
The data architects and data engineers work in tandem to conceptualize, visualize, and build an enterprise data management framework. The data architect visualizes the complete framework to create a blueprint, which the data engineer can use to build a digital framework. The data engineering role has recently evolved from the traditional software-engineering field. Recent enterprise data management experiments indicate that the data-focused software engineers are needed to work along with the data architects to build a strong data architecture.
Average salary for a data architect in the US ranges from $1,22,000 to $1,29, 000 annually as per a recent LinkedIn survey.
Skill sets required: A data architect or an engineer should have a keen interest and experience in programming languages frameworks like HTML5, RESTful services, Spark, Python, Hive, Kafka, and CSS etc. They should have the required knowledge and experience to handle database technologies such as PostgreSQL, MapReduce and MongoDB and visualization platforms such as; Tableau, Spotfire etc.
A business analyst (BA) basically handles Chief analytics officer's role but on the operational level. This implies converting business expectations into data analysis. If your core data scientist lacks domain expertise, a business analyst can bridge the gap. They are responsible for using data analytics to assess processes, determine requirements and deliver data-driven recommendations and reports to executives and stakeholders.
BAs engage with business leaders and users to understand how data-driven changes will be implemented to processes, products, services, software and hardware. They further articulate these ideas and balance them against technologically feasible and financially reasonable. The average salary for a business analyst is $75,078 per year in the United States, as per Indeed.
Skill sets required: Excellent domain and industry expertise will be required. With this good communication as well as data visualization skills and knowledge of business intelligence tools will be good to have.
This specific role is not present in each of the data science teams as some of the responsibilities are realized by either a data analyst or a data architect. Hence, this role is only necessary for a specialized data science model. The role of a data visualization engineer involves having a solid understanding of UI development to create custom data visualization elements for your stakeholders. Regardless of the technology, successful data visualization engineers have to understand principles of design, both graphical and more generally user-centered design. As per Payscale, the average salary for a data visualization engineer is $98,264.
Skill sets required: A data visualization engineer need to have rigorous knowledge of data visualization methods and be able to produce various charts and graphs to represent data. Additionally they must understand the fundamentals of design principles and visual display of information.
To sum it up, a data science team has evolved to create a number of job roles and opportunities, but companies still face challenges in building up the team from scratch and find it hard to figure where to start from.
If you are facing a similar dilemma, check out this book, Managing Data Science, written by Kirill Dubovikov. It covers concepts and methodologies to manage and deliver top-notch data science solutions, while also providing guidance on hiring, growing and sustaining a successful data science team.
How to learn data science: from data mining to machine learning
How to ace a data science interview
Data science vs. machine learning: understanding the difference and what it means today
30 common data science terms explained