Preface
”Software is eating the world, but AI is going to eat software.”
— Jensen Huang, CEO of Nvidia
Machine learning (ML), part of the wider field of Artificial Intelligence (AI), is rightfully recognized as one of the most powerful tools available for organizations to extract value from their data. As the capabilities of ML algorithms have grown over the years, it has become increasingly obvious that implementing such algorithms in a scalable, fault-tolerant, and automated way requires the creation of new disciplines. These disciplines, ML Engineering (MLE) and ML Operations (MLOps), are the focus of this book.
The book covers a wide variety of topics in order to help you understand the tools, techniques, and processes you can apply to engineer your ML solutions, with an emphasis on introducing the key concepts so that you can build on them in your future work. The aim is to develop fundamentals and a broad understanding that will stand the test of time, rather than just provide a series of introductions to the latest tools, although we do cover a lot of the latest tools!
All of the code examples are given in Python, the most popular programming language in the world (at the time of writing) and the lingua franca for data applications. Python is a high-level and object-oriented language with a rich ecosystem of tools focused on data science and ML. For example, packages such as scikit-learn and pandas have have become part of the standard lexicon for data science teams across the world. The central tenet of this book is that knowing how to use packages like these is not enough. In this book, we will use these tools, and many, many more, but focus on how to wrap them up in production-grade pipelines and deploy them using appropriate cloud and open-source tools.
We will cover everything from how to organize your ML team, to software development methodologies and best practices, to automating model building through to packaging your ML code, how to deploy your ML pipelines to a variety of different targets and then on to how to scale your workloads for large batch runs. We will also discuss, in an entirely new chapter for this second edition, the exciting world of applying ML engineering and MLOps to deep learning and generative AI, including how to start building solutions using Large Language Models (LLMs) and the new field of LLM Operations (LLMOps).
The second edition of Machine Learning Engineering with Python goes into a lot more depth than the first edition in almost every chapter, with updated examples and more discussion of core concepts. There is also a far wider selection of tools covered and a lot more focus on open-source tools and development. The ethos of focusing on core concepts remains, but it is my hope that this wider view means that the second edition will be an excellent resource for those looking to gain practical knowledge of machine learning engineering.
Although a far greater emphasis has been placed on using open-source tooling, many examples will also leverage services and solutions from Amazon Web Services (AWS). I believe that the accompanying explanations and discussions will, however, mean that you can still apply everything you learn here to any cloud provider or even in an on-premises setting.
Machine Learning Engineering with Python, Second Edition will help you to navigate the challenges of taking ML to production and give you the confidence to start applying MLOps in your projects. I hope you enjoy it!