Processing Data Using Azure Databricks
Databricks is a data engineering platform built on top of Apache Spark. It provides a unified, cloud-optimized environment in which you can perform Extract, Transform, and Load (ETL), Machine Learning (ML), and Artificial Intelligence (AI) tasks on large volumes of data.
Azure Databricks, as its name suggests, is the integration of Databricks with Azure. It also provides fully managed Spark clusters, an interactive workspace for data exploration and visualization, and integration with data sources such as Azure Blob Storage, Azure Data Lake Storage, Azure Cosmos DB, and Azure SQL Data Warehouse.
Azure Databricks can process data from many diverse sources, including SQL and NoSQL databases, structured and unstructured data, and streaming sources. It can also scale out to as many cluster nodes as required to handle data growth.
By the end of the chapter, you will have learned how to configure Databricks, work with storage accounts, process data...