Adopting the DevOps mindset

DevOps is a team mindset that breaks down the silos between developers and system operators to shorten a product's development life cycle. Developers constantly change a product to introduce new features and modify existing behaviors. On the other hand, system operators need to keep the production systems stable and running. In the past, these two groups were isolated from each other, and developers would throw a new piece of software over to the operations team, who would then try to deploy it in production. As you can imagine, things didn't always work that well, causing friction between the two groups. A fundamental DevOps practice is that a team needs to be autonomous and should contain all the required disciplines, both developers and operators.

In data science, some people refer to this practice as MLOps, but the fundamental ideas remain the same. A team should be self-sufficient, capable of developing all the required components of the overall solution, from the data engineering parts that bring in the data and the training of the models all the way to operationalizing the models in production. These teams usually work in an agile manner, embracing an iterative approach that seeks constant improvement based on feedback, as seen in Figure 1.7:

Figure 1.7 – The feedback flow in an agile MLOps team

The MLOps team operates on its backlog and performs the iterative steps you saw in the Working on a data science project section. Once the model is ready, the system administrators, who are part of the team, are aware of what needs to be done to take the model into production. The model is monitored closely, and if a defect or performance degradation is observed, a backlog item is created for the MLOps team to address in their next sprint.

To shorten the development and deployment life cycle of new features, automation needs to be embraced. The goal of a DevOps team is to minimize the number of human interventions in the deployment process and to automate as many repeatable tasks as possible.

Figure 1.8 shows the components most frequently used when developing real-time models with the MLOps mindset:

Figure 1.8 – Components usually seen in MLOps-driven data science projects

Let's analyze those components:

  • ARM templates allow you to automate the deployment of Azure resources. This enables the team to spin up and tear down development, testing, or even production environments in no time. These artifacts are stored within Azure DevOps in a Git version-control repository, and the deployment of multiple environments is automated using Azure DevOps pipelines (see the first sketch after this list). You will read about ARM templates in Chapter 2, Deploying Azure Machine Learning Workspace Resources.
  • Using Azure Data Factory, the data science team orchestrates pulling the data from the source systems and cleansing it. The data is copied into a data lake, which is accessible from the AzureML workspace. Azure Data Factory uses ARM templates to define its orchestration pipelines; these templates are stored in the Git repository to track changes and to allow deployment to multiple environments.
  • Within the AzureML workspace, data scientists work on their code. Initially, they start with Jupyter notebooks, which are a great way to prototype ideas, as you will see in Chapter 7, The AzureML Python SDK. As the project progresses, the code is exported from the notebooks and organized into proper scripts. All these code artifacts are version-controlled in Git, using the terminal and commands such as the ones seen in Figure 1.9:
Figure 1.9 – Versioning a notebook and a script file using Git within AzureML

  • When a model is trained, if it performs better than the model currently in production, it is registered within AzureML and an event is emitted (see the second sketch after this list). This event is captured by the AzureML DevOps plugin, which triggers the automatic deployment of the model to the test environment. The model is tested within that environment, and if all tests pass and no errors have been logged in Application Insights, which monitors the deployment, the artifacts can be automatically deployed to the next environment, all the way to production.
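To make the ARM template step more concrete, the following is a minimal sketch of what the underlying deployment call could look like from Python, using the azure-mgmt-resource package. The subscription ID, resource group, deployment name, template path, and workspaceName parameter are all illustrative assumptions; in an MLOps setup, a step like this would typically be executed by an Azure DevOps pipeline rather than run by hand:

```python
# A minimal sketch of deploying an ARM template programmatically.
# All resource names and paths below are hypothetical placeholders.
import json

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "<subscription-id>"
client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

# The ARM template is a JSON artifact tracked in the Git repository.
with open("templates/azuredeploy.json") as f:
    template = json.load(f)

# Incremental mode only adds or updates the resources described in the
# template, leaving the rest of the resource group untouched.
poller = client.deployments.begin_create_or_update(
    "my-dev-rg",          # hypothetical resource group
    "mlops-environment",  # deployment name
    {
        "properties": {
            "template": template,
            "parameters": {"workspaceName": {"value": "aml-dev"}},
            "mode": "Incremental",
        }
    },
)
print(poller.result().properties.provisioning_state)
```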
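Similarly, here is a minimal sketch of the registration step from the last bullet, using the AzureML Python SDK covered later in this book. The model path, model name, and metric tag are assumptions for illustration; registering the model is what creates the new model version and the event that the downstream automation reacts to:

```python
# A minimal sketch of registering a newly trained model in AzureML.
from azureml.core import Workspace
from azureml.core.model import Model

# Assumes a config.json describing the workspace is present locally.
ws = Workspace.from_config()

# Registering creates a new version of the model and emits the event
# that the Azure DevOps automation can pick up.
model = Model.register(
    workspace=ws,
    model_path="outputs/model.pkl",  # local file or folder to upload
    model_name="churn-classifier",   # hypothetical model name
    tags={"val_accuracy": "0.91"},   # metric kept for traceability
)
print(f"Registered {model.name}, version {model.version}")
```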

The ability to ensure both code and model quality plays a crucial role in this automation process. In Python, you can use various tools, such as Flake8, Bandit, and Black, to ensure code quality, check for common security issues, and consistently format your code base. You can also use the pytest framework to write functional tests, where you test the model results against a golden dataset. With pytest, you can even perform integration testing to verify that the end-to-end system works as expected.
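As a rough illustration, a golden-dataset check written with pytest could look like the following sketch. The file paths, the pickled scikit-learn model, and the 0.85 accuracy threshold are all assumptions made for the example:

```python
# test_model_quality.py: a sketch of a golden-dataset test with pytest.
import pickle

import pandas as pd
import pytest
from sklearn.metrics import accuracy_score


@pytest.fixture(scope="module")
def model():
    # Load the candidate model produced by the training step.
    with open("outputs/model.pkl", "rb") as f:
        return pickle.load(f)


@pytest.fixture(scope="module")
def golden_dataset():
    # A curated dataset with known, verified labels.
    df = pd.read_csv("tests/data/golden.csv")
    return df.drop(columns=["label"]), df["label"]


def test_model_meets_accuracy_threshold(model, golden_dataset):
    features, labels = golden_dataset
    predictions = model.predict(features)
    # Fail the automated deployment if the model regresses below
    # the agreed quality bar.
    assert accuracy_score(labels, predictions) >= 0.85
```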

Adopting DevOps is a never-ending journey; the team will get better every time you repeat the process. The trick is to build trust in the end-to-end development and deployment process so that everyone is confident making changes and deploying them to production. When the process fails, understand why it failed and learn from your mistakes. Create the mechanisms that will prevent future failures and move on.
