Integrating Jupyter/Python notebooks into a data pipeline
We can integrate Jupyter/Python notebooks into our ADF data pipeline by using the Spark activity. You will need an Azure HDInsight Spark cluster for this exercise.
The prerequisites for integrating Jupyter notebooks are to create linked services from ADF to Azure Storage and HDInsight, and to have the HDInsight Spark cluster running.
You have already seen how to create linked services in the Developing batch processing solutions by using Data Factory, Data Lake, Spark, Azure Synapse Pipelines, PolyBase, and Azure Databricks section earlier in this chapter, so I won't repeat the steps here.
Select the Spark activity in ADF and, on the HDI Cluster tab, set the HDInsight linked service field to the linked service you created earlier, as shown in the following screenshot.
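If you prefer to author the same step in code, here is a minimal sketch using the azure-mgmt-datafactory Python SDK. The linked service names (HDInsightLinkedService, StorageLinkedService), the resource group, factory name, and storage paths are all placeholders for the resources you created earlier; note that the Spark activity runs a Python script file, so a notebook is typically exported as a .py file first.

    # A minimal sketch of the same step authored in code with the
    # azure-mgmt-datafactory SDK. All names in angle brackets and the
    # linked service names are placeholders -- substitute your own.
    from azure.identity import ClientSecretCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        HDInsightSparkActivity,
        LinkedServiceReference,
        PipelineResource,
    )

    credential = ClientSecretCredential(
        tenant_id="<tenant-id>",
        client_id="<client-id>",
        client_secret="<client-secret>",
    )
    adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

    # linked_service_name is what the HDI Cluster tab sets in the UI;
    # root_path/entry_file_path point at the exported notebook script
    # in the storage account.
    spark_activity = HDInsightSparkActivity(
        name="RunNotebookScript",
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference",
            reference_name="HDInsightLinkedService",  # placeholder name
        ),
        root_path="adfspark",  # container/folder holding the script
        entry_file_path="pyFiles/notebook_script.py",  # notebook exported as .py
        spark_job_linked_service=LinkedServiceReference(
            type="LinkedServiceReference",
            reference_name="StorageLinkedService",  # placeholder name
        ),
    )

    pipeline = PipelineResource(activities=[spark_activity])
    adf_client.pipelines.create_or_update(
        "<resource-group>", "<data-factory>", "SparkNotebookPipeline", pipeline
    )

Triggering the pipeline (for example, with adf_client.pipelines.create_run) then submits the script to the HDInsight cluster, just as triggering it from the ADF portal would.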
Now, start the Jupyter notebook by going to...
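Once the notebook is open, a short PySpark cell like the following sketch can verify that the cluster executes jobs. The wasbs:// path is a placeholder for any file in the storage account attached to the cluster; on HDInsight, Jupyter's PySpark kernels typically pre-create the Spark session, in which case getOrCreate simply returns it.

    # A quick sanity check to run in the notebook: read a file from the
    # cluster's storage account and count its lines. The wasbs:// path is
    # a placeholder -- point it at any file in your linked storage account.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("NotebookSmokeTest").getOrCreate()

    df = spark.read.text(
        "wasbs://<container>@<account>.blob.core.windows.net/example/data.txt"
    )
    print(df.count())  # a simple action that forces a Spark job to run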