Exploring data analytics
Once data has been ingested, transformed, and aggregated, the next step is to analyze and explore it. Many tools on the market can do this, and one of the most popular is Databricks.
Databricks uses the Apache Spark engine, which is well suited to dealing with massive amounts of data due to its internal architecture. Whereas a traditional database server would typically run workloads on a single machine, Databricks uses Spark clusters built from multiple nodes. Data analytics processes are then distributed across those nodes and processed in parallel, as shown in the following diagram:
Figure 13.6 – Example Spark cluster architecture
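The idea of splitting work across nodes and combining the partial results can be illustrated with a small sketch. This is not Spark code and does not use any Databricks API; it assumes a thread pool on a single machine as a stand-in for worker nodes, and the function names (`split_into_partitions`, `partial_sum`, `distributed_sum`) are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Split records into roughly equal partitions, one per worker."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def partial_sum(partition):
    """Work each worker performs independently on its own partition."""
    return sum(partition)


def distributed_sum(records, num_workers=4):
    """Sum records by fanning partitions out to workers, then combining."""
    partitions = split_into_partitions(records, num_workers)
    # Threads stand in for cluster nodes here (an assumption for illustration).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    # The coordinating process combines the partial results into one answer.
    return sum(partials)


total = distributed_sum(list(range(1, 101)))
print(total)  # 5050
```

In a real Spark cluster the partitions would live on separate machines and the engine would handle scheduling and data movement; the sketch only mirrors the partition, process, and combine pattern described above.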
Azure Databricks is a managed Databricks service that provides excellent flexibility for creating and using Spark clusters as and when needed.
Azure Databricks
Azure Databricks provides workspaces that multiple users can use to build and run analytics jobs collaboratively. A Databricks workspace contains...