Setting up an HDInsight cluster
HDInsight is a comprehensive solution based on a diverse set of open source platforms, including Apache Hadoop, Apache Spark, Apache Kafka, Apache HBase, Apache Hive, and Apache Storm. Solutions based on HDInsight can be integrated with Azure Data Factory (ADF), Azure Data Lake, Cosmos DB, and other services.
In this section, we will set up the HDInsight service, build a basic pipeline, and deploy it to ADF.
Getting ready
Before getting started with the recipe, log in to your Microsoft Azure account.
We assume you have a pre-configured resource group and a storage account with Azure Data Lake Gen2 enabled.
How to do it…
We will go through the process of creating an HDInsight cluster using the Azure portal and its web interface. Follow these instructions:
- Create a user-assigned managed identity. We will need it in the next step to give the HDInsight cluster access to Azure Data Lake Gen2. In the Azure portal, find Managed Identities and click +Add.
- Fill in the appropriate...
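The managed identity from the first step can also be created from a script instead of the portal. The following is a minimal sketch, assuming the Azure CLI is installed and you have already run az login. It only assembles the az identity create command and prints it. The identity name, resource group name, and region are hypothetical placeholders, not values from this recipe.

```python
import shlex


def build_identity_command(name, resource_group, location=None):
    """Return the argument list for `az identity create`.

    `name` is the name of the user-assigned managed identity and
    `resource_group` is an existing resource group. `location` is
    optional; when omitted, the CLI falls back to its own default.
    """
    if not name or not resource_group:
        raise ValueError("name and resource_group are required")
    args = [
        "az", "identity", "create",
        "--name", name,
        "--resource-group", resource_group,
    ]
    if location:
        args += ["--location", location]
    return args


# Placeholder values for illustration only; replace them with your own.
cmd = build_identity_command("hdi-identity", "my-resource-group", "eastus")

# Print the command so it can be reviewed before it is run.
print(shlex.join(cmd))
```

To execute the command from Python, pass the list to subprocess.run(cmd, check=True). Keeping the command as a list rather than a single string avoids shell quoting problems if a name contains spaces.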