Chapter 2: Batch and Real-Time Processing in Databricks
Azure Databricks can process both batch and real-time big data workloads using Apache Spark™. As data engineers, we need to master both kinds of workload to build real-world use cases. A batch load generally refers to an ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) process in which large chunks of data are copied from a source to a sink. This type of workload can take anywhere from minutes to hours to process, whereas real-time processing works with much lower latency (that is, seconds or even milliseconds).
Databricks offers several ways to process batch and real-time workloads. In this chapter, we will discuss the approaches to building and running these workloads. The topics covered in this chapter are as follows:
- Differentiating batch versus real-time processing
- Mounting Azure Data Lake in Databricks
- Working with batch processing
- Batch ETL...