Chapter 1: Introduction to Azure Databricks
Modern information systems handle massive amounts of data, arriving in a constant flow that grows larger every day. This data comes from many different sources, including sales records, transactional data, social media, and more. Organizations must transform and aggregate this information to build applications that extract value from it.
Apache Spark was developed to process data at this scale. Azure Databricks is built on top of Apache Spark, abstracting away most of the complexity of deploying it and adding the benefits of tight integration with other Azure services. This book introduces Azure Databricks and explores how it fits into modern data pipelines to transform, visualize, and extract insights from large amounts of data in a distributed computing environment.
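As a brief taste of what this looks like in practice, the following minimal sketch shows the kind of PySpark code that runs in an Azure Databricks notebook, where a SparkSession is already provided as spark. The file path and column names used here are hypothetical placeholders, not part of any real dataset referenced in this book.

```python
from pyspark.sql import SparkSession, functions as F

# In an Azure Databricks notebook, `spark` is created for you; this line
# only matters if you run the same code outside Databricks.
spark = SparkSession.builder.appName("sales-intro").getOrCreate()

# Hypothetical sales data: read a CSV file into a distributed DataFrame.
sales = spark.read.csv("/mnt/raw/sales.csv", header=True, inferSchema=True)

# Transform and aggregate: compute total revenue per day.
daily_revenue = (
    sales
    .withColumn("revenue", F.col("quantity") * F.col("unit_price"))
    .groupBy("order_date")
    .agg(F.sum("revenue").alias("total_revenue"))
    .orderBy("order_date")
)

# Spark distributes this computation across the cluster's worker nodes.
daily_revenue.show(5)
```

The same transformation and aggregation pattern scales from a few rows to billions, because Spark splits the work across the nodes of the cluster that Azure Databricks manages for us.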
In this introductory chapter, we will explore these topics:
- Introducing Apache Spark
- Introducing Azure Databricks
- Discovering core concepts and terminology
- Interacting with the Azure Databricks workspace
- Using Azure Databricks notebooks
- Exploring data management
- Exploring computation management
- Exploring authentication and authorization
These concepts will help us later understand every aspect of how our jobs execute in Azure Databricks and move easily between all of its assets.