What this book covers
This book is structured into three parts. First, we cover data science and its foundations in statistics. Then, we cover machine learning as it relates to data science, including core machine learning concepts, applications, and pitfalls to avoid. Finally, we cover how to lead successful data science projects and teams. If you are already familiar with the foundations of data science and the core statistical concepts covered in Part 1, you may wish to skip ahead to Part 2, or use Part 1 to refresh your knowledge.
Part 1: Understanding Data Science and Its Foundations
Chapter 1, Introducing Data Science, provides a foundational understanding of data science, its relationship to AI and machine learning, and key statistical concepts. It explores descriptive and inferential statistics, probability, and data distributions, establishing a common language for readers.
Chapter 2, Characterizing and Collecting Data, explains how to distinguish between different types of data, including first-, second-, and third-party data, as well as structured, unstructured, and semi-structured data. It explores technologies and methods for collecting, storing, and processing data, and provides guidance on navigating the landscape of data-focused solutions, including cloud, on-premises, and hybrid solutions.
Chapter 3, Exploratory Data Analysis, introduces the process of exploratory data analysis (EDA) and its importance in understanding data, developing hypotheses, and building better models. The chapter provides hands-on code examples in Python to reinforce the concepts, with step-by-step explanations suitable for readers with no prior experience in Python.
Chapter 4, The Significance of Significance, explores the concept of statistical significance and its importance in making data-driven decisions. It covers hypothesis testing, also known as significance testing, and provides practical examples to illustrate its application in business scenarios, such as reducing customer churn and evaluating machine learning model improvements.
Chapter 5, Understanding Regression, introduces regression as a powerful statistical tool for uncovering patterns and relationships within data. It explores various use cases for regression in a business context. The chapter begins with the foundational concept of trend lines before delving into the complexities of regression analysis.
Part 2: Machine Learning – Concepts, Applications, and Pitfalls
Chapter 6, Introducing Machine Learning, provides an overview of machine learning and its importance in data-driven decision-making. It covers the progression from traditional statistics to machine learning, the various types of machine learning techniques, and the process of training, validating, and testing models.
Chapter 7, Supervised Machine Learning, focuses on one of the most widely used and beneficial subfields of machine learning. It discusses the steps involved in training and deploying supervised machine learning models, the core supervised learning algorithms, the factors to consider when training and evaluating these models, and their applications.
Chapter 8, Unsupervised Machine Learning, explores the field of unsupervised learning, where algorithms discover hidden patterns and insights from unlabeled data. The chapter covers practical examples of unsupervised learning, the key steps involved, and techniques such as clustering, anomaly detection, dimensionality reduction, and association rule learning. It emphasizes how unsupervised learning differs from supervised learning and highlights its potential for uncovering valuable information in data without labeled examples.
Chapter 9, Interpreting and Evaluating Machine Learning Models, equips you with the skills needed to assess the accuracy and reliability of machine learning models. You will learn how to use evaluation metrics to measure model performance and understand the importance of using holdout (test) data for unbiased evaluation. The chapter also explains how evaluation metrics differ between regression and classification models, so that you can interpret and validate the quality of a model before it is implemented in real-world scenarios.
Chapter 10, Common Pitfalls in Machine Learning, provides readers with the knowledge to identify and address common challenges in developing and deploying machine learning models. It covers issues such as inadequate or poor-quality training data, overfitting and underfitting, training-serving skew, model drift, and bias and fairness. You will learn practical strategies to mitigate these pitfalls, ensuring your models are reliable, accurate, and equitable, ultimately leading to better business decisions and outcomes.
Part 3: Leading Successful Data Science Projects and Teams
Chapter 11, The Structure of a Data Science Project, provides a comprehensive framework for planning and executing data science projects, focusing on delivering impactful data products. You will learn how to identify, evaluate, and prioritize use cases that align with your organization’s goals and have the potential to drive real business value. The chapter covers the key stages of data product development, from data preparation to model design, evaluation, and deployment. You will also learn how to evaluate the business impact of your data products by selecting relevant metrics and KPIs, enabling you to demonstrate the tangible value and ROI of your initiatives and secure ongoing support for your projects.
Chapter 12, The Data Science Team, looks at the art and science of assembling a high-performing data science team. You will learn about the key roles that make up a successful team, including data scientists, machine learning engineers, and data engineers, along with the skills and expertise each role brings to the table. The chapter explores different operating models for structuring data science teams within larger organizations.
Chapter 13, Managing the Data Science Team, explores the unique challenges and best practices for leading data science teams effectively. It covers strategies for enabling rapid experimentation, managing uncertainty, balancing research and production work, communicating effectively, fostering continuous learning, and promoting collaboration. The chapter also discusses common challenges such as aligning projects with business goals, scaling and deploying models, ensuring fairness and ethics, and driving the adoption of data science solutions.
Chapter 14, Continuing Your Journey as a Data Science Leader, provides guidance on navigating the rapidly evolving landscape of data science, machine learning, and AI. It explores strategies for staying current with emerging technologies, specializing in specific industries or fields, and embracing continuous learning. The chapter also discusses the importance of staying informed about the latest trends and news and how data science leaders can promote data-driven thinking within their organizations.
To get the most out of this book, some familiarity with basic mathematical concepts such as algebra, probability, and statistics is helpful but not required. The real prerequisites are curiosity, a willingness to learn, and a drive to use data for the good of your organization. If you bring those qualities, this book will supply the knowledge and practical skills you need. Step by step, you’ll learn to wield the tools of data science and AI with clarity, confidence, and purpose.
Software/hardware covered in the book | Operating system requirements
Python (Google Colab) | Windows, macOS, or Linux; a Google account (to access Google Colab); a modern web browser (Google Chrome, Mozilla Firefox, Microsoft Edge, or Apple Safari)
Setup instructions will be provided in the chapters where there are code exercises.