Machine Learning Engineering with Python: Manage the lifecycle of machine learning models using MLOps with practical examples

Product type Paperback
Published in Aug 2023
Publisher Packt
ISBN-13 9781837631964
Length 462 pages
Edition 2nd Edition
Author: Andrew P. McMahon

Setting up our tools

To prepare for the work in the rest of this chapter, and indeed the rest of the book, it will be helpful to set up some tools. At a high level, we need the following:

  • Somewhere to code
  • Something to track our code changes
  • Something to help manage our tasks
  • Somewhere to provision infrastructure and deploy our solution

Let’s look at how to approach each of these in turn:

  • Somewhere to code: First, although the weapon of choice for coding by data scientists is of course Jupyter Notebook, once you begin to make the move toward ML engineering, it will be important to have an integrated development environment (IDE) to hand. An IDE is an application that bundles built-in tools and capabilities, such as a debugger, code completion, and version control integration, to help you develop the best software that you can. PyCharm is an excellent example for Python developers and comes with a wide variety of plugins, add-ons, and integrations useful to ML engineers. You can download the Community Edition from JetBrains at https://www.jetbrains.com/pycharm/. Another popular development tool is the lightweight but powerful source code editor VS Code. Once you have successfully installed PyCharm, you can create a new project or open an existing one from the Welcome to PyCharm window, as shown in Figure 2.1:

    Figure 2.1: Opening or creating your PyCharm project.

  • Something to track our code changes: Next on the list is a code version control system. In this book, we will use GitHub, but there are a variety of solutions, all freely available, that are based on the same underlying open-source Git technology. Later sections will discuss how to use these as part of your development workflow, but first, if you do not have a version control system set up, you can navigate to github.com and create a free account. Follow the instructions on the site to create your first repository, and you will be shown a screen that looks something like Figure 2.2. To make your life easier later, you should select Add a README file and Add .gitignore (then select Python). The README file provides an initial Markdown file for you to get started with and somewhere to describe your project. The .gitignore file tells Git to ignore certain types of files that generally do not belong in version control, such as compiled bytecode and local virtual environments. It is up to you whether you want the repository to be public or private and what license you wish to use. The repository for this book uses the MIT license:

    Figure 2.2: Setting up your GitHub repository.

    Once you have set up your IDE and version control system, you need to make them talk to each other by using the Git plugins provided with PyCharm. This is as simple as navigating to VCS | Enable Version Control Integration and selecting Git. You can edit the version control settings by navigating to File | Settings | Version Control; see Figure 2.3:

    Figure 2.3: Configuring version control with PyCharm.

  • Something to help manage our tasks: You are now ready to write Python and track your code changes, but are you ready to manage or participate in a complex project with other team members? For this, it is often useful to have a solution where you can track tasks, issues, bugs, user stories, and other documentation and items of work. It also helps if this has good integration points with the other tools you will use. In this book, we will use Jira as an example of this. If you navigate to https://www.atlassian.com/software/jira, you can create a free cloud Jira account and then follow the interactive tutorial within the solution to set up your first board and create some tasks. Figure 2.4 shows the task board for this book project, called Machine Learning Engineering in Python (MEIP):

    Figure 2.4: The task board for this book in Jira.

  • Somewhere to provision infrastructure and deploy our solution: Everything that you have just installed and set up is tooling that will really help take your workflow and software development practices to the next level. The last piece of the puzzle is having the tools, technologies, and infrastructure available for deploying the end solution. The management of computing infrastructure for applications was (and often still is) the province of dedicated infrastructure teams, but with the advent of public clouds, there has been real democratization of this capability for people working across the spectrum of software roles. In particular, modern ML engineering is very dependent on the successful implementation of cloud technologies, usually through the main public cloud providers such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP). This book will utilize tools found in the AWS ecosystem, but all of the tools and techniques you will find here have equivalents in the other clouds.
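
Returning briefly to the version control step, it can help to see what a .gitignore file does in practice. The following is a minimal sketch in Python, not Git's actual matching algorithm: the pattern list is an illustrative subset of the kind of entries a Python .gitignore typically contains, and the helper name is_ignored is hypothetical. It treats patterns ending in "/" as directory names and ignores features such as negation ("!") and "**".

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Illustrative subset of entries of the kind found in a Python .gitignore
# (an assumption for this sketch, not the exact contents of any template).
IGNORE_PATTERNS = ["__pycache__/", "*.py[cod]", ".ipynb_checkpoints/", ".venv/"]


def is_ignored(path: str, patterns=IGNORE_PATTERNS) -> bool:
    """Approximate whether a repository-relative path would be ignored.

    Simplification (not Git's real algorithm): a pattern ending in "/" is
    compared against each parent directory name; any other pattern is
    compared against every path component, including the file name.
    """
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: match any directory the file lives under.
            directory_name = pattern.rstrip("/")
            if any(fnmatch(part, directory_name) for part in parts[:-1]):
                return True
        elif any(fnmatch(part, pattern) for part in parts):
            return True
    return False


for candidate in [
    "src/train.py",
    "src/__pycache__/train.cpython-310.pyc",
    "notebooks/.ipynb_checkpoints/eda-checkpoint.ipynb",
    ".venv/bin/python",
]:
    print(candidate, "->", "ignored" if is_ignored(candidate) else "tracked")
```

Only the source file src/train.py is reported as tracked here; the bytecode cache, notebook checkpoints, and virtual environment are generated artifacts that would add noise to the repository history.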

The flip side of the democratization of capabilities that the cloud brings is that teams who own the deployment of their solutions have to gain new skills and understanding. I am a strong believer in the principle that “you build it, you own it, you run it” as far as possible, but this means that as an ML engineer, you will have to be comfortable with a host of potential new tools and principles, as well as owning the performance of your deployed solution. With great power comes great responsibility and all that. In Chapter 5, Deployment Patterns and Tools, we will dive into this topic in detail.

Let’s talk through setting this up.

Setting up an AWS account

As previously stated, you don’t have to use AWS, but that’s what we’re going to use throughout this book. Once it’s set up here, you can use it for everything we’ll do:

  1. To set up an AWS account, navigate to aws.amazon.com and select Create Account. You will have to add some payment details but everything we mention in this book can be explored through the free tier of AWS, where you do not incur a cost below a certain threshold of consumption.
  2. Once you have created your account, you can navigate to the AWS Management Console, where you can see all the services that are available to you (see Figure 2.5):

Figure 2.5: The AWS Management Console.
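
Beyond the console, you may later want to confirm from Python which account your code is talking to. The sketch below is one possible way to do this and rests on assumptions that are not covered in this section: that the third-party boto3 library is installed and that AWS credentials have already been configured on your machine. The helper name describe_caller is hypothetical; it accepts the client as an argument so that it can be exercised without real credentials.

```python
def describe_caller(sts_client) -> dict:
    """Return the account ID and ARN that the given STS client authenticates as.

    `sts_client` is expected to behave like boto3.client("sts"), whose
    get_caller_identity() call returns a dict containing "Account" and "Arn".
    """
    identity = sts_client.get_caller_identity()
    return {"account": identity["Account"], "arn": identity["Arn"]}


# Real usage (assumes `pip install boto3` and configured credentials):
#   import boto3
#   print(describe_caller(boto3.client("sts")))
```

If the call succeeds, the returned account ID should match the one shown in the AWS Management Console; if it fails, the credentials on your machine are most likely missing or misconfigured.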

With our AWS account ready to go, let’s look at the four steps that cover the whole process.
