Understanding how ODH provisions Apache Spark clusters on demand
We have discussed how ODH lets you create a dynamic, flexible development environment for writing code, such as data pipelines, in Jupyter Notebooks. We have also noted that data developers typically need to go through IT to get time on data processing clusters such as Apache Spark. These interactions reduce the team's agility, and removing them is one of the problems the ML platform solves. To address this scenario, ODH provides the following components:
- A Spark operator that spawns Apache Spark clusters on demand. For this book, we have forked the original Spark operator provided by ODH and radanalytics to accommodate the latest changes to the Kubernetes API. A sample cluster manifest follows this list.
- A capability in JupyterHub to request a new Spark cluster from the Spark operator when the user creates certain notebook environments.
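To make the operator's role concrete, here is a minimal sketch of the kind of custom resource such a Spark operator watches. The `radanalytics.io/v1` API group and `SparkCluster` kind match the upstream radanalytics operator; the cluster name and instance counts are illustrative, and the forked operator used in this book may expose slightly different fields.

```yaml
# A minimal SparkCluster custom resource. When this object is created,
# the Spark operator provisions a cluster with one master and two workers.
apiVersion: radanalytics.io/v1
kind: SparkCluster
metadata:
  name: demo-spark-cluster    # illustrative name
spec:
  master:
    instances: "1"            # number of Spark master pods
  worker:
    instances: "2"            # number of Spark worker pods
```

In the workflow described above, JupyterHub submits an object like this to the Kubernetes API on your behalf, so you never have to file a request with IT to get a cluster.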
As a data engineer, when you spin up a new notebook environment using certain notebook images, JupyterHub...