Orchestrating Your Data Workflows
We have covered a wealth of techniques for building our data platforms. However, a few components are still missing before we can fully orchestrate everything. We’ve mentioned Databricks Workflows, but we didn’t dive deep into how it works, and we haven’t yet covered logging or secrets management. Workflows is the orchestration tool used to manage data pipelines in Databricks. Orchestration tools typically support common data tasks and keep a per-pipeline history of every run. Having a central place to manage all your pipelines is a critical step toward reliable, scalable data pipelines. This chapter discusses these topics in detail so that we can bring more stability to our data platform.
In this chapter, we’re going to cover the following main topics:
- Logging and monitoring with Datadog
- Secrets management
- Databricks Workflows
- Databricks REST APIs...