The Machine Learning Development Process
In this chapter, we will define how the work for any successful machine learning (ML) software engineering project can be divided up. Basically, we will answer the question of how you actually organize the doing of a successful ML project. We will not only discuss the process and workflow but we will also set up the tools you will need for each stage of the process and highlight some important best practices with real ML code examples.
In this edition, there will be more details on an important data science and ML project management methodology: Cross-Industry Standard Process for Data Mining (CRISP-DM). This will include a discussion of how this methodology compares to traditional Agile and Waterfall methodologies and will provide some tips and tricks for applying it to your ML projects. There are also far more detailed examples to help you get up and running with continuous integration/continuous deployment (CI/CD) using GitHub Actions, including how to run ML-focused processes such as automated model validation. The advice on getting up and running in an Interactive Development Environment (IDE) has also been made more tool-agnostic, to allow for those using any appropriate IDE. As before, the chapter will focus heavily on a “four-step” methodology I propose that encompasses a discover, play, develop, deploy workflow for your ML projects. This project workflow will be compared with the CRISP-DM methodology, which is very popular in data science circles. We will also discuss the appropriate development tooling and its configuration and integration for a successful project. We will also cover version control strategies and their basic implementation, and setting up CI/CD for your ML project. Then, we will introduce some potential execution environments as the target destinations for your ML solutions. By the end of this chapter, you will be set up for success in your Python ML engineering project. This is the foundation on which we will build everything in subsequent chapters.
As usual, we will conclude the chapter by summarizing the main points and highlighting what this means as we work through the rest of the book.
Finally, it is also important to note that although we will frame the discussion here in terms of ML challenges, most of what you will learn in this chapter can also be applied to other Python software engineering projects. My hope is that the investment in building out these foundational concepts in detail will be something you can leverage again and again in all of your work.
We will explore all of this in the following sections and subsections:
- Setting up our tools
- Concept to solution in four steps:
- Discover
- Play
- Develop
- Deploy
There is plenty of exciting stuff to get through and lots to learn – so let’s get started!