Understanding Azure Data Lake
A data lake is a storage repository that lets you store data at any scale in its native format, without having to structure it first.
Azure Data Lake Storage provides secure, scalable, cost-effective storage for big data analytics. There are two generations of Azure Data Lake, Gen1 and Gen2; this chapter focuses on Gen2 only. Azure Data Lake Gen2 combines the capabilities of Azure Data Lake Gen1 with those of Azure Blob Storage by adding a hierarchical namespace to Blob Storage. Because it is built on Azure Blob Storage, you also get high availability and disaster recovery options for your data lake at a low cost.
The new Azure Blob File System (ABFS) driver is available within Azure HDInsight, Azure Databricks, and Azure Synapse Analytics, and lets you access the data in much the same way as you would with the Hadoop Distributed File System (HDFS).
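To make the HDFS comparison concrete, the ABFS driver addresses data through URIs of the form abfss://<container>@<account>.dfs.core.windows.net/<path>. The following is a minimal sketch of building such a URI; the helper name abfs_uri and the account, container, and path values are hypothetical and used only for illustration.

```python
def abfs_uri(account: str, container: str, path: str = "", secure: bool = True) -> str:
    """Build an ABFS URI for a file or directory in a Data Lake Gen2 account.

    The account, container, and path passed in are illustrative values,
    not resources created in this chapter.
    """
    # "abfss" uses TLS; "abfs" is the non-encrypted variant of the scheme.
    scheme = "abfss" if secure else "abfs"
    # Drop any leading slash so the path joins cleanly after the host name.
    clean_path = path.lstrip("/")
    return f"{scheme}://{container}@{account}.dfs.core.windows.net/{clean_path}"


# Hypothetical account "mydatalake" with a container named "raw".
print(abfs_uri("mydatalake", "raw", "sales/2021/orders.csv"))
```

A URI built this way is what you would typically pass to a Spark reader in Azure Databricks or Azure Synapse Analytics, in the same position where an hdfs:// path would otherwise appear.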
To use Data Lake Storage Gen2's capabilities, you need to create a storage account that has a hierarchical namespace. You can go through the following steps to create your Azure Data Lake Storage Gen2 account:
- Log in to the Azure portal: https://portal.azure.com.
- Click on the + Create a Resource link and select Storage account from the list of all available resources.
- Select the Resource group where you want to create your storage account. If you don't have a Resource group created, click on the Create new link below the drop-down list.
- Fill in the fields for Storage account name and Location.
- Select Standard or Premium Performance as per your business need. If you are new to Data Lake, then it would be better to begin with Standard.
- Select appropriate values for Account kind and Replication as per your business need. Again, if you are performing this operation only for learning purposes, we recommend leaving the default values selected in these fields.
- For now, we can skip the Networking and Data protection tabs and move directly to the Advanced tab.
- Click on the Enabled radio button for the Hierarchical namespace property under the Advanced tab.
- Leave the default values for all other fields and click on Review + create.
- After reviewing all the details, click on Create and your Azure Data Lake Gen2 account will be created in a couple of minutes.
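The portal steps above correspond to a small set of settings sent to Azure Resource Manager. As a rough sketch only, the snippet below checks a candidate account name against the storage account naming rule (3 to 24 characters, lowercase letters and digits) and assembles a request body with the hierarchical namespace enabled. The function names, the example account name, the eastus location, and the Standard_LRS SKU are assumptions chosen for illustration; the snippet does not call Azure and creates nothing.

```python
import json
import re


def is_valid_account_name(name: str) -> bool:
    """Return True if the name is 3-24 characters of lowercase letters and digits."""
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None


def gen2_account_request(location: str, sku: str = "Standard_LRS") -> dict:
    """Assemble an illustrative request body for a Gen2-enabled storage account.

    The SKU default here is an assumption for the example, not the portal default.
    """
    return {
        "location": location,
        "kind": "StorageV2",
        "sku": {"name": sku},
        # Equivalent to selecting Enabled for Hierarchical namespace on the Advanced tab.
        "properties": {"isHnsEnabled": True},
    }


account_name = "mydatalake01"  # hypothetical name
if is_valid_account_name(account_name):
    print(json.dumps(gen2_account_request("eastus"), indent=2))
```

The isHnsEnabled property is the setting that distinguishes a Data Lake Storage Gen2 account from an ordinary general-purpose v2 storage account.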
Now that you have created your Azure Data Lake Gen2 account, you can use it with Azure Synapse Analytics. We will learn how to read data from Data Lake in later chapters, but for now, we will learn about Azure Synapse Studio and how it provides a unified experience when working with various resources under one roof.