Integrating Jupyter/Python Notebooks into a Data Pipeline
Integrating Jupyter/Python notebooks into a data pipeline provides flexibility, transparency, and efficiency throughout the data-processing life cycle. It bridges the gap between exploration, development, and production, making it an essential practice in data-engineering workflows. In Azure Data Factory (ADF), this integration is achieved with the Spark activity, which requires an Azure HDInsight Spark cluster for the example in this section.
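As a rough sketch of what this looks like in practice, the pipeline JSON below defines an HDInsightSpark activity that runs a Python script (for example, one exported from a notebook) stored in Azure Storage. The linked-service names and paths here are illustrative, not prescribed by the exam or this section:

```json
{
  "name": "RunNotebookScript",
  "type": "HDInsightSpark",
  "linkedServiceName": {
    "referenceName": "HDInsightSparkLinkedService",
    "type": "LinkedServiceReference"
  },
  "typeProperties": {
    "rootPath": "adfspark",
    "entryFilePath": "notebook_script.py",
    "sparkJobLinkedService": {
      "referenceName": "AzureStorageLinkedService",
      "type": "LinkedServiceReference"
    }
  }
}
```

`rootPath` is the container/folder in the storage account referenced by `sparkJobLinkedService`, and `entryFilePath` is the Python file the Spark cluster executes.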
Note
This section primarily focuses on the "Integrate Jupyter or Python notebooks into a data pipeline" concept of the DP-203: Data Engineering on Microsoft Azure exam.
The prerequisites for integrating Jupyter notebooks are as follows:
- Create a linked service to Azure Storage
- Create an HDInsight linked service from ADF
- Have an HDInsight Spark cluster running
Note
You have already learned how linked services are created in the Data Ingestion section,...
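Because the Spark activity executes a plain Python script rather than a notebook file, the notebook is typically exported to a `.py` script first (the standard tool for this is `jupyter nbconvert --to script`). As a minimal stdlib-only sketch of what that conversion does, the helper below (a hypothetical function, not part of any Azure SDK) reads the notebook's JSON and keeps only its code cells:

```python
import json


def notebook_to_script(ipynb_path: str, py_path: str) -> None:
    """Extract the code cells of a Jupyter notebook (.ipynb) into a plain
    Python script that an ADF Spark activity can use as its entry file."""
    with open(ipynb_path, encoding="utf-8") as f:
        nb = json.load(f)

    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            # In the notebook format, "source" is a list of lines
            # (older notebooks may store a single string).
            src = cell["source"]
            chunks.append("".join(src) if isinstance(src, list) else src)

    with open(py_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(chunks) + "\n")
```

The resulting script can then be uploaded to the storage path that the Spark activity's `entryFilePath` points to.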