Databricks ML in Action

Learn how Databricks supports the entire ML lifecycle end to end, from data ingestion to model deployment

Product type Paperback
Published in May 2024
Publisher Packt
ISBN-13 9781800564893
Length 280 pages
Edition 1st Edition
Authors (4):
Hayley Horn
Amanda Baker
Anastasia Prokaieva
Stephanie Rivera

Table of Contents (13 chapters)

Preface
1. Part 1: Overview of the Databricks Unified Data Intelligence Platform
2. Chapter 1: Getting Started and Lakehouse Concepts
3. Chapter 2: Designing Databricks: Day One
4. Chapter 3: Building the Bronze Layer
5. Part 2: Heavily Project Focused
6. Chapter 4: Getting to Know Your Data
7. Chapter 5: Feature Engineering on Databricks
8. Chapter 6: Tools for Model Training and Experimenting
9. Chapter 7: Productionizing ML on Databricks
10. Chapter 8: Monitoring, Evaluating, and More
11. Index
12. Other Books You May Enjoy

What this book covers

Chapter 1, Getting Started and Lakehouse Concepts, covers the different techniques and methods for data engineering and machine learning. The goal is not to unveil never-before-seen insights into data; if that were the case, this would be an academic paper. Instead, the goal of this chapter is to use open and free data to demonstrate advanced technology and best practices. The chapter also lists and describes each dataset used in the book.

Chapter 2, Designing Databricks: Day One, covers workspace design, model life cycle practices, naming conventions, what not to put in DBFS, and other preparatory topics. The Databricks platform is simple to use. However, there are many options available to cater to the different needs of different organizations. During my years as a contractor and my time at Databricks, I have seen teams succeed and fail. I will share with you the successful dynamics as well as any configurations that accompany those insights in this chapter.

Chapter 3, Building the Bronze Layer, begins your data journey in the Databricks Data Intelligence (DI) Platform by exploring the fundamentals of the Bronze layer of the Medallion architecture. The Bronze layer is the first step in transforming your data for downstream projects, and this chapter focuses on the Databricks features and techniques available for the necessary transformations. We will start by introducing you to Auto Loader, a tool to automate data ingestion, which you can implement with or without Delta Live Tables (DLT) to insert and transform your data.

Chapter 4, Getting to Know Your Data, explores the features within the Databricks DI Platform that help improve and monitor data quality and facilitate data exploration. There are numerous approaches to getting to know your data better with Databricks. First, we cover how to oversee data quality with DLT to catch quality issues early and prevent the contamination of entire pipelines. We will take our first close look at Lakehouse Monitoring, which helps us analyze data changes over time and can alert us to changes that concern us.

Chapter 5, Feature Engineering on Databricks, builds on Chapter 4, where we harnessed the power of Databricks to explore and refine our datasets, and delves into the components of Databricks that enable the next step: feature engineering. We will start by covering Databricks Feature Engineering (DFE) in Unity Catalog (UC) to show you how to manage engineered features efficiently. Understanding how to leverage DFE in UC is crucial for creating reusable and consistent features across training and inference. Then, you will learn how to leverage Structured Streaming to calculate features on a stream, which allows you to create the stateful features that models need to make quick decisions.

Chapter 6, Tools for Model Training and Experimenting, examines how to use data science to search for a signal hidden in the noise of data. We will leverage the features we created within the Databricks platform during the previous chapter. We will start by using AutoML in a basic modeling approach, which provides auto-generated code and quickly enables data scientists to establish a baseline model to beat. When searching for a signal, we experiment with different features, hyperparameters, and models. Historically, tracking these configurations and their corresponding evaluation metrics has been a time-consuming project in and of itself. A low-overhead tracking mechanism, such as the tracking provided by MLflow, an open source platform for managing data science projects and supporting MLOps, reduces the burden of manually capturing configurations. More specifically, we'll introduce MLflow Tracking, an MLflow component that makes it much easier to track each permutation's many outputs. However, that is only the beginning.

Chapter 7, Productionizing ML on Databricks, explores productionizing a machine learning model using Databricks products, which makes the journey more straightforward and cohesive by incorporating functionality such as the Unity Catalog Registry, Databricks Workflows, Databricks Asset Bundles, and Model Serving capabilities. This chapter will cover the tools and practices to take your models from development to production.

Chapter 8, Monitoring, Evaluating, and More, covers how to create visualizations for dashboards in both the new Lakeview dashboards and the standard DBSQL dashboards. Deployed models can be shared via a web application. Therefore, we will not only introduce Hugging Face Spaces but also deploy the RAG chatbot using a Gradio app to apply what we have learned.
