Integrating Continuous Integration into Your Workflow
As they grow, many data projects go from being a scattering of notebooks to a continuous integration (CI)-driven application. In this chapter, we will go through some of the tooling and concepts for stringing together your Python scripts and notebooks into a working data application. We will use Jenkins for CI, GitHub for source control, Databricks workflows for orchestration, and Terraform for Infrastructure as Code (IaC). Each of these tools can be swapped out for your preferred alternative without much effort.
In this chapter, we’re going to cover the following main topics:
- Python wheels and creating a Python package
- CI with Jenkins
- Working with source control using GitHub
- Creating Databricks jobs and controlling several jobs using workflows
- Creating IaC using Terraform