Chapter 7: Azure Databricks
Azure Databricks has become the de facto ETL tool in the cloud. It is a Unified Data Analytics Platform, which means it is more than an ETL tool: it gives data engineers and data scientists access to many libraries, lets them work collaboratively, and supports a wider range of tasks, from ETL to machine learning and AI analysis.
Like Azure Data Factory, it is a Platform as a Service (PaaS) development and deployment environment. Unlike Data Factory, however, Databricks is not Azure-specific: besides Microsoft Azure, it is also available on Amazon Web Services (AWS) and Google Cloud. From an ETL point of view, the main difference between the two is that Data Factory is a no-code environment, in which we build our ETL by chaining activities.
Databricks, by contrast, uses code and notebooks to achieve the same result. As we will see later, it is the ETL tool of choice for big data, whether that data is measured in petabytes or even exabytes. It is usually used...