Distributed Data Systems with Azure Databricks

Create, deploy, and manage enterprise data pipelines

Product type: Paperback
Published: May 2021
Publisher: Packt
ISBN-13: 9781838647216
Length: 414 pages
Edition: 1st Edition
Author: Alan Bernardo Palacio
Table of Contents

Preface
Section 1: Introducing Databricks
  Chapter 1: Introduction to Azure Databricks
  Chapter 2: Creating an Azure Databricks Workspace
Section 2: Data Pipelines with Databricks
  Chapter 3: Creating ETL Operations with Azure Databricks
  Chapter 4: Delta Lake with Azure Databricks
  Chapter 5: Introducing Delta Engine
  Chapter 6: Introducing Structured Streaming
Section 3: Machine and Deep Learning with Databricks
  Chapter 7: Using Python Libraries in Azure Databricks
  Chapter 8: Databricks Runtime for Machine Learning
  Chapter 9: Databricks Runtime for Deep Learning
  Chapter 10: Model Tracking and Tuning in Azure Databricks
  Chapter 11: Managing and Serving Models with MLflow and MLeap
  Chapter 12: Distributed Deep Learning in Azure Databricks
Other Books You May Enjoy

What this book covers

Chapter 1, Introduction to Azure Databricks, takes you through the core functionality of Databricks, including how to interact with the workspace environment, a quick look at the main applications, and how Python users will work with the tool. It covers topics such as the workspace, the interface, computation management, and Databricks notebooks.

Chapter 2, Creating an Azure Databricks Workspace, teaches you how to apply the concepts from the previous chapter using the tools that Azure provides for interacting with the workspace. This includes using PowerShell and the Azure CLI to manage Databricks resources.

Chapter 3, Creating ETL Operations with Azure Databricks, shows you how to manage different data sources, transform them, and build a complete event-driven ETL pipeline.

Chapter 4, Delta Lake with Azure Databricks, explores Delta Lake and how to implement it for various operations.

Chapter 5, Introducing Delta Engine, explores Delta Engine and shows you how to use it together with Delta Lake to create efficient ETL pipelines in Databricks.

Chapter 6, Introducing Structured Streaming, explains how to use specific types of streaming sources and sinks and how to put streaming into production, and provides notebooks that demonstrate example use cases.

Chapter 7, Using Python Libraries in Azure Databricks, explores the nuances of working with Python and introduces core concepts about models and data that are studied in more detail in later chapters.

Chapter 8, Databricks Runtime for Machine Learning, is a deep dive into developing classic ML algorithms to train and deploy models based on tabular data, exploring the relevant libraries and algorithms along the way. The examples focus on the particularities and advantages of using Databricks for ML.

Chapter 9, Databricks Runtime for Deep Learning, is a deep dive into developing classic DL algorithms to train and deploy models based on unstructured data, exploring the relevant libraries and algorithms along the way. The examples focus on the particularities and advantages of using Databricks for DL.

Chapter 10, Model Tracking and Tuning in Azure Databricks, focuses on model tuning, deployment, and control using Databricks functionality such as AutoML and Delta Lake, in conjunction with popular libraries such as TensorFlow.

Chapter 11, Managing and Serving Models with MLflow and MLeap, explores the MLflow library in more detail. MLflow is an open source platform for managing the end-to-end ML life cycle that allows the user to track experiments, record and compare parameters, centralize model storage, and more. You will learn how to use it in combination with what you learned in the previous chapters.

Chapter 12, Distributed Deep Learning in Azure Databricks, demonstrates how to use Horovod to make distributed DL faster by taking single-GPU training scripts and scaling them to train across many GPUs in parallel.
