Azure Data Engineering Cookbook: Get well versed in various data engineering techniques in Azure using this recipe-based guide, Second Edition

By Nagaraj Venkatesan and Ahmad Osama
4.6 (13 Ratings)
Paperback | €39.99 | Sep 2022 | 608 pages | 2nd Edition


Securing and Monitoring Data in Azure Data Lake

A Data Lake forms the key storage layer for data engineering pipelines. Securing and monitoring Data Lake accounts are key aspects of Data Lake maintenance. This chapter focuses on configuring security controls such as firewalls, encryption, and private links for a Data Lake account. By the end of this chapter, you will have learned how to configure a firewall, a virtual network, and a private link to secure a Data Lake account, encrypt a Data Lake account using Azure Key Vault, and monitor key user actions in a Data Lake account.

We will be covering the following recipes in this chapter:

  • Configuring a firewall for an Azure Data Lake account using the Azure portal
  • Configuring virtual networks for an Azure Data Lake account using the Azure portal
  • Configuring private links for an Azure Data Lake account
  • Configuring encryption using Azure Key Vault for Azure Data Lake
  • Accessing Blob storage accounts using managed identities
  • Creating an alert to monitor an Azure Data Lake account
  • Securing an Azure Data Lake account with an SAS using PowerShell

Configuring a firewall for an Azure Data Lake account using the Azure portal

Data Lake account access can be restricted to an IP or a range of IPs by whitelisting the allowed IPs in the storage account firewall. In this recipe, we'll learn to restrict access to a Data Lake account using a firewall.

Getting ready

Before you start, perform the following steps:

  1. Open a web browser and go to the Azure portal at https://portal.azure.com.
  2. Make sure you have an existing storage account. If not, create one using the Provisioning an Azure storage account using the Azure portal recipe in Chapter 1, Creating and Managing Data in Azure Data Lake.

How to do it…

To provide access to an IP or range of IPs, follow these steps:

  1. In the Azure portal, locate and open the Azure storage account. In our case, the storage account is packtadestoragev2, created in the Provisioning an Azure storage account using the Azure portal recipe of Chapter 1, Creating and Managing Data in Azure Data Lake.
  2. On the storage account page, in the Security + Networking section, locate and select Firewalls and virtual networks.

As the packtadestoragev2 account was created with public access, it can be accessed from all networks.

  3. To allow access from an IP or an IP range, select the Selected networks option on the Firewalls and virtual networks page:
Figure 2.1 – Azure Storage – Firewalls and virtual networks

  4. In the Selected networks option, scroll down to the Firewall section. To give access to your machine only, select the Add your client IP address option. To give access to a different IP or range of IPs, type in the IPs in the Address range section:
Figure 2.2 – The whitelist IPs in the Azure Storage Firewall section

  5. To access storage accounts from Azure services such as Azure Data Factory and Azure Functions, check Allow Azure services on the trusted services list to access this storage account under the Exceptions heading.
  6. Click Save to save the configuration changes.

How it works…

Firewall settings are used to restrict access to an Azure storage account to an IP or range of IPs. Even if a storage account is public, it will only be accessible to the whitelisted IPs defined in the firewall configuration.
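
If you prefer to script this setting, the same firewall configuration can be applied with Azure PowerShell. The following is a minimal sketch using the Az.Storage module, assuming the packtadestorage resource group and packtadestoragev2 account from this recipe and an illustrative IP range:

    # Switch the default action from Allow to Deny (equivalent to selecting Selected networks)
    Update-AzStorageAccountNetworkRuleSet -ResourceGroupName "packtadestorage" -Name "packtadestoragev2" -DefaultAction Deny -Bypass AzureServices
    # Whitelist a single IP or a CIDR range (illustrative range)
    Add-AzStorageAccountNetworkRule -ResourceGroupName "packtadestorage" -Name "packtadestoragev2" -IPAddressOrRange "40.74.28.0/23"
    # Review the effective firewall rules
    Get-AzStorageAccountNetworkRuleSet -ResourceGroupName "packtadestorage" -Name "packtadestoragev2"

The -Bypass AzureServices switch corresponds to the trusted services exception configured in step 5.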

Configuring virtual networks for an Azure Data Lake account using the Azure portal

A storage account can be public (accessible to everyone), public with access restricted to an IP or range of IPs, or private with access restricted to selected virtual networks. In this recipe, we'll learn how to restrict access to an Azure storage account to a virtual network.

Getting ready

Before you start, perform the following steps:

  1. Open a web browser and go to the Azure portal at https://portal.azure.com.
  2. Make sure you have an existing storage account. If not, create one using the Provisioning an Azure storage account using the Azure portal recipe in Chapter 1, Creating and Managing Data in Azure Data Lake.

How to do it…

To restrict access to a virtual network, follow the given steps:

  1. In the Azure portal, locate and open the storage account. In our case, it's packtadestoragev2. On the storage account page, in the Security + Networking section, locate and select Firewalls and virtual networks | Selected networks:
Figure 2.3 – Azure Storage – Selected networks

  2. In the Virtual networks section, select + Add new virtual network:
Figure 2.4 – Adding a virtual network

  3. In the Create virtual network blade, provide the virtual network name, Address space details, and Subnet address range. The remaining configuration values are pre-filled, as shown in the following screenshot:
Figure 2.5 – Creating a new virtual network

  4. Click on Create to create the virtual network. It is created and listed in the Virtual networks section, as shown in the following screenshot:
Figure 2.6 – Saving a virtual network configuration

  5. Click Save to save the configuration changes.

How it works…

We first created an Azure virtual network and then added it to the Azure storage account. Creating the Azure virtual network from the storage account page automatically fills in the resource group, location, and subscription information. The virtual network and the storage account should be in the same location.

The address space specifies the range of IP addresses (in CIDR notation) available in the virtual network.

We also need to define the subnet within the virtual network that the storage account will belong to. A custom subnet can be created, but for the sake of simplicity, we have used the default subnet.

This allows the storage account to only be accessed by resources that belong to the given virtual network. The storage account is inaccessible to any network other than the specified virtual network.
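
The same network rule can be scripted with Azure PowerShell. The following is a minimal sketch, assuming a virtual network named packtvnet with a default subnet (illustrative names); the subnet must have the Microsoft.Storage service endpoint enabled before it can be added to the storage account firewall:

    # Get the virtual network and its default subnet (illustrative names)
    $vnet = Get-AzVirtualNetwork -ResourceGroupName "packtadestorage" -Name "packtvnet"
    $subnet = Get-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name "default"
    # Enable the Microsoft.Storage service endpoint on the subnet and persist the change
    Set-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name "default" -AddressPrefix $subnet.AddressPrefix -ServiceEndpoint "Microsoft.Storage" | Set-AzVirtualNetwork
    # Add the subnet to the storage account firewall
    Add-AzStorageAccountNetworkRule -ResourceGroupName "packtadestorage" -Name "packtadestoragev2" -VirtualNetworkResourceId $subnet.Id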

Configuring private links for an Azure Data Lake account

In this recipe, we will be creating a private link to a storage account and using private endpoints to connect to it.

Private links and private endpoints ensure that all communication to the storage account goes through the Azure backbone network. Communication with the storage account doesn't traverse the public internet, which makes it more secure.

Getting ready

Before you start, perform the following steps:

  1. Open a web browser and go to the Azure portal at https://portal.azure.com.
  2. Make sure you have an existing storage account. If not, create one using the Provisioning an Azure storage account using the Azure portal recipe in Chapter 1, Creating and Managing Data in Azure Data Lake.
  3. Make sure you have an existing virtual network configured for the storage account. If not, create one using the Configuring virtual networks for an Azure Data Lake account using the Azure portal recipe in this chapter.

How to do it…

Perform the following steps to configure private links to a Data Lake account:

  1. Log in to the Azure portal and click on the storage account.
  2. Click on Networking | the Private Endpoints tab.
  3. Click on the + Private endpoint button, as shown here:
Figure 2.7 – Creating a private endpoint to a storage account

  4. Provide an endpoint name, as shown in the following screenshot:
Figure 2.8 – Providing an endpoint name

  5. In the Resource tab, set Target sub-resource to dfs. Distributed File System (DFS) is the sub-resource to use when connecting to Data Lake Storage Gen2. The rest of the fields are auto-populated. Proceed to the Configuration section:
Figure 2.9 – Setting the target resource type to dfs

  6. Create a private Domain Name System (DNS) zone by picking the same resource group where you created the storage account, as shown in the following screenshot:
Figure 2.10 – Creating a private DNS

  7. Hit the Create button to create the private endpoint along with the private DNS zone.
  8. After the private endpoint is created, open it in the Azure portal. Click on DNS configuration:
Figure 2.11 – Copy the FQDN

  • Make a note of the FQDN and IP address details. The FQDN (Fully Qualified Domain Name) will resolve to the private IP address if, and only if, you are connected to the virtual network.

With the preceding steps, we have created a private endpoint that will use private links to connect to a storage account.
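
For automation, the same private endpoint can be created with the Az.Network cmdlets. The following is a minimal sketch, assuming the virtual network from the previous recipe (packtvnet, an illustrative name) and omitting the private DNS zone wiring (privatelink.dfs.core.windows.net) for brevity:

    $storage = Get-AzStorageAccount -ResourceGroupName "packtadestorage" -Name "packtadestoragev2"
    $vnet = Get-AzVirtualNetwork -ResourceGroupName "packtadestorage" -Name "packtvnet"
    $subnet = Get-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name "default"
    # Connection targeting the dfs sub-resource of the Data Lake Storage Gen2 account
    $connection = New-AzPrivateLinkServiceConnection -Name "packtadeplconnection" -PrivateLinkServiceId $storage.Id -GroupId "dfs"
    # Create the private endpoint in the chosen subnet
    New-AzPrivateEndpoint -ResourceGroupName "packtadestorage" -Name "packtadeprivateendpoint" -Location "eastus" -Subnet $subnet -PrivateLinkServiceConnection $connection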

How it works…

We have created a private link to a storage account and ensured that traffic goes through the Microsoft backbone network (and not the public internet), as we will be accessing the storage account via a private endpoint. To show how it works, let's resolve the private link URL from the following locations:

  • Use nslookup to look up the private link URL from your local machine.
  • Use nslookup to look up the private link URL from a virtual machine inside the virtual network.

On your machine, open Command Prompt and type nslookup <FQDN of private link>, as shown in the following screenshot:

Figure 2.12 – Testing a private endpoint connection outside of the virtual network

nslookup resolves the private link to a public IP address rather than the private one, as your machine is not part of the virtual network. To see it working, perform the following instructions:

  1. Create a new virtual machine in the Azure portal. Ensure that you allow a remote desktop connection to the virtual machine, as shown in the following screenshot:
Figure 2.13 – Creating a new virtual machine and allowing a remote desktop

  2. Under Networking, select the virtual network in which the storage account resides:
Figure 2.14 – Configuring the virtual machine to use the virtual network

Once the virtual machine is created, log in to it using Remote Desktop and run nslookup on the private link URL again to resolve its IP address. nslookup is a command that resolves a hostname to an IP address. We will use it to verify whether the private link FQDN resolves to a private IP address (10.x.x.x) rather than a public IP address.

nslookup from a virtual machine inside the virtual network resolves correctly to the private IP address of the private link, as shown in the following screenshot. This shows that the connection goes through the virtual network only and doesn't use the public internet:

Figure 2.15 – nslookup from the virtual network

With this recipe, we have successfully created a private link to a storage account, configured a private endpoint connection, and accessed the account from a virtual machine to verify the connectivity. This recipe shows how you can securely connect to a storage account through virtual networks only, bypassing the public internet.

Configuring encryption using Azure Key Vault for Azure Data Lake

In this recipe, we will create a key vault and use it to encrypt an Azure Data Lake account.

Azure Data Lake accounts are encrypted at rest by default using Azure managed keys. However, you have the option of bringing your own key to encrypt an Azure Data Lake account. Using your own key gives better control over encryption.

Getting ready

Before you start, perform the following steps:

  1. Open a web browser and go to the Azure portal at https://portal.azure.com.
  2. Make sure that you have an existing storage account. If not, create one using the Provisioning an Azure storage account using the Azure portal recipe in Chapter 1, Creating and Managing Data in Azure Data Lake.

How to do it…

Perform the following steps to add encryption to a Data Lake account using Azure Key Vault:

  1. Log in to portal.azure.com, click on Create a resource, search for Key Vault, and click on Create. Provide the key vault details, as shown in the following screenshot. Click on Review + Create:
Figure 2.16 – Creating an Azure key vault

  2. Go to the storage account to be encrypted. Search for Encryption on the left. Click on Encryption and select Customer-managed keys as the Encryption type. Click on Select a key vault and key at the bottom:
Figure 2.17 – Encrypting using customer-managed keys

  3. On the new Select a key screen, select Key vault as the Key store type and select the newly created PacktAdeKeyVault as the Key vault. Click on Create new key, as shown in the following screenshot:
Figure 2.18 – Selecting Key Vault

  4. Provide a name for the key to be used for encryption of the storage account. The default option, Generate, ensures that the key is generated automatically. Click on Create:
Figure 2.19 – Creating a key

  5. Once the key is created, the screen automatically moves back to the key vault selection page of the storage account, and the newly created key is selected. Click on Select:
Figure 2.20 – Selecting the key

  6. The screen returns to the storage account's encryption page. Click on Save to complete the encryption configuration.

How it works…

As the newly created key vault has been set for encryption on an Azure Data Lake account, all Data Lake operations (read, write, and metadata) will use the key from Key Vault to encrypt and decrypt the data in Data Lake. The encryption and decryption operations are fully transparent and have no impact on users' operations.

The Data Lake account automatically gets permissions on the key vault to extract the key and perform encryption on data. You can verify this by opening the key vault in the Azure portal and clicking on Access Policies. Note that the storage account has been granted Get, wrap, and unwrap permissions on the keys, as shown in the next screenshot:

Figure 2.21 – Storage account permissions in Key Vault
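
If you prefer to script the same configuration, the following is a minimal PowerShell sketch, assuming the PacktAdeKeyVault vault from this recipe and an illustrative key name (packtadekey). The key vault must have purge protection enabled for customer-managed keys, and here the key permissions are granted explicitly rather than automatically by the portal:

    $vault = Get-AzKeyVault -VaultName "PacktAdeKeyVault"
    $key = Add-AzKeyVaultKey -VaultName "PacktAdeKeyVault" -Name "packtadekey" -Destination Software
    # Enable the storage account's system-assigned identity and grant it access to the key
    $storage = Set-AzStorageAccount -ResourceGroupName "packtadestorage" -Name "packtadestoragev2" -AssignIdentity
    Set-AzKeyVaultAccessPolicy -VaultName "PacktAdeKeyVault" -ObjectId $storage.Identity.PrincipalId -PermissionsToKeys get,wrapkey,unwrapkey
    # Point the storage account encryption at the key vault key
    Set-AzStorageAccount -ResourceGroupName "packtadestorage" -Name "packtadestoragev2" -KeyvaultEncryption -KeyName $key.Name -KeyVersion $key.Version -KeyVaultUri $vault.VaultUri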

Accessing Blob storage accounts using managed identities

In this recipe, we will grant permissions to managed identities on a storage account and showcase how you can use managed identities to connect to Azure Data Lake.

Managed identities are password-less service accounts used by Azure services such as Data Factory and Azure VMs to access other Azure services, such as Blob storage. In this recipe, we will show you how Azure Data Factory's managed identity can be granted permission on an Azure Blob storage account.

Getting ready

Before you start, perform the following steps:

  1. Open a web browser and go to the Azure portal at https://portal.azure.com.
  2. Make sure you have an existing storage account. If not, create one using the Provisioning an Azure storage account using the Azure portal recipe in Chapter 1, Creating and Managing Data in Azure Data Lake.

How to do it…

We will be testing accessing a Data Lake account using managed identities. To achieve this, we will create a Data Factory account and use Data Factory's managed identity to access the Data Lake account. Perform the following steps to test this:

  1. Create an Azure Data Factory by using the following PowerShell command:
    $resourceGroupName = "packtadestorage"
    $location = "east us"
    $dataFactoryName = "ADFPacktADE2"
    $DataFactory = Set-AzDataFactoryV2 -ResourceGroupName $resourceGroupName -Location $location -Name $dataFactoryName
  2. Go to the storage account in the Azure portal. Click on Access Control (IAM) and then Add, as shown in the following screenshot:
Figure 2.22 – Adding a role to a managed identity

  3. Select Add role assignment and search for the Storage Blob Data Contributor role. Select the role and click Next. Select Managed identity under Assign access to and click on + Select members, as shown in the following screenshot:
Figure 2.23 – Selecting the Data Factory managed identity

  4. Your subscription should be selected by default. From the Managed identity dropdown, select Data Factory (V2) (1). Select the recently created ADFPacktADE2 Data Factory and click on the Select button:
Figure 2.24 – Assigning a role to a managed identity

  5. Click on Review + Assign to complete the assignment. To test whether it's working, open the ADFPacktADE2 Data Factory that was created in step 1. Click on Open Azure Data Factory Studio, as shown in the next screenshot:
Figure 2.25 – Opening Azure Data Factory Studio

  6. Click on the Manage button on the left and then Linked services. Click on + New, as shown in the following screenshot:
Figure 2.26 – Creating a linked service in Data Factory

  7. Search for Data Lake and select Azure Data Lake Storage Gen2 as the data store. Select Managed Identity for Authentication method. Select the storage account (packtadestoragev2) for Storage account name. Click on Test connection:
Figure 2.27 – Testing a managed identity connection in Data Factory

A successful test connection indicates that we can successfully connect to a storage account using a managed identity.

How it works…

A managed identity for the data factory was automatically created when the Data Factory account was created. We provided the Storage Blob Data Contributor permission on the Azure Data Lake storage account to the managed identity of Data Factory. Hence, Data Factory was successfully able to connect to the storage account in a secure way without using a key/password.
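
The role assignment performed in steps 2 to 5 can also be scripted. The following is a minimal sketch that reuses the $DataFactory object from step 1 and assumes the Az.Resources module is installed:

    $storage = Get-AzStorageAccount -ResourceGroupName "packtadestorage" -Name "packtadestoragev2"
    # Grant the data factory's system-assigned managed identity access to the Data Lake account
    New-AzRoleAssignment -ObjectId $DataFactory.Identity.PrincipalId -RoleDefinitionName "Storage Blob Data Contributor" -Scope $storage.Id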

Creating an alert to monitor an Azure storage account

We can create an alert on multiple available metrics to monitor an Azure storage account. To create an alert, we need to define the trigger condition and the action to be performed when the alert is triggered. In this recipe, we'll create an alert to send an email if the used capacity metric for an Azure storage account exceeds 5 MB. The 5 MB threshold is not a standard; it is deliberately kept low to demonstrate the alert functionality.

Getting ready

Before you start, perform the following steps:

  1. Open a web browser and log in to the Azure portal at https://portal.azure.com.
  2. Make sure you have an existing storage account. If not, create one using the Provisioning an Azure storage account using the Azure portal recipe in Chapter 1, Creating and Managing Data in Azure Data Lake.

How to do it…

Follow these steps to create an alert:

  1. In the Azure portal, locate and open the storage account. In our case, the storage account is packtadestoragev2. On the storage account page, search for alert and open Alerts in the Monitoring section:
Figure 2.28 – Selecting Alerts

  2. On the Alerts page, click on + New alert rule:
Figure 2.29 – Adding a new alert

  3. On the Alerts | Create alert rule page, observe that the storage account is listed by default in the Resource section. You can add multiple storage accounts in the same alert. Under the Condition section, click Add condition:
Figure 2.30 – Adding a new alert condition

  4. On the Configure signal logic page, select Used capacity under Signal name:
Figure 2.31 – Configuring the signal logic

  5. On the Configure signal logic page, under Alert logic, set Operator to Greater than, Aggregation type to Average, and configure the threshold to 5 MiB. We need to provide the value in bytes (5 MiB = 5,242,880 bytes):
Figure 2.32 – Configuring alert logic

Click Done to configure the trigger. The condition is added, and we'll be taken back to the Create alert rule page:

Figure 2.33 – Viewing a new alert condition

  6. The next step is to add an action to perform when the alert condition is reached. On the Create alert rule page, in the ACTIONS GROUPS section, click Create:
Figure 2.34 – Creating a new alert action group

  7. On the Add action group page, provide the Action group name, Display name, and Resource group details:
Figure 2.35 – Adding a new alert action group

  8. In Notifications, provide an email address. Click on Review + Create:
Figure 2.36 – Selecting the alert action

  9. Click on Create to create the action group. We are then taken back to the Create alert rule page. The Email action is listed in the Action Groups section.
  10. The next step is to define the Severity, Alert rule name, and Alert rule description details:
Figure 2.37 – Creating an alert rule

  11. Click the Create alert rule button to create the alert.
  12. The next step is to trigger the alert. To do that, download BigFile.csv from https://github.com/PacktPublishing/Azure-Data-Engineering-Cookbook-2nd-edition/blob/main/Chapter2/BigFile.csv and upload it to the Azure storage account by following the steps mentioned in the Creating containers and uploading files to Azure Blob storage using PowerShell recipe of Chapter 1, Creating and Managing Data in Azure Data Lake. The triggered alerts are listed on the Alerts page, as shown in the following screenshot:
Figure 2.38 – Viewing alerts

  13. An email is sent to the email address specified in the email action group. The email appears as shown in the following screenshot:
Figure 2.39 – The alert email

How it works…

Setting up an alert is easy. First, we need to define the alert condition (a trigger or signal). An alert condition defines the metric and threshold that, when breached, trigger the alert. We can define more than one condition on multiple metrics for one alert.

We then need to define the action to be performed when the alert condition is reached. We can define more than one action for an alert. In our example, in addition to sending an email when the used capacity is more than 5 MB, we can configure Azure Automation to delete the old blobs/files in order to maintain the Azure storage capacity within 5 MB.

There are other signals, such as transactions, ingress, egress, availability, Success Server Latency, and Success E2E Latency, on which alerts can be defined. Detailed information on monitoring Azure storage is available at https://docs.microsoft.com/en-us/azure/storage/common/storage-monitoring-diagnosing-troubleshooting.
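
The same alert can be created with the Az.Monitor cmdlets; parameter names may vary slightly between module versions. The following is a minimal sketch, assuming the action group created in this recipe is named PacktAdeActionGroup (an illustrative name):

    $storage = Get-AzStorageAccount -ResourceGroupName "packtadestorage" -Name "packtadestoragev2"
    $actionGroup = Get-AzActionGroup -ResourceGroupName "packtadestorage" -Name "PacktAdeActionGroup"
    # Used capacity greater than 5 MiB (5,242,880 bytes), averaged over the evaluation window
    $condition = New-AzMetricAlertRuleV2Criteria -MetricName "UsedCapacity" -TimeAggregation Average -Operator GreaterThan -Threshold 5242880
    Add-AzMetricAlertRuleV2 -Name "UsedCapacityOver5MiB" -ResourceGroupName "packtadestorage" -TargetResourceId $storage.Id -Condition $condition -ActionGroupId $actionGroup.Id -WindowSize (New-TimeSpan -Hours 1) -Frequency (New-TimeSpan -Minutes 15) -Severity 3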

Securing an Azure storage account with SAS using PowerShell

A Shared Access Signature (SAS) provides more granular access to blobs by specifying an expiry limit, specific permissions, and IPs.

Using an SAS, we can specify different permissions to users or applications on different blobs, based on the requirement. For example, if an application needs to read one file/blob from a container, instead of providing access to all the files in the container, we can use an SAS to provide read access on the required blob.

In this recipe, we'll learn to create and use an SAS to access blobs.

Getting ready

Before you start, go through the following steps:

  1. Make sure you have an existing Azure storage account. If not, create one by following the Provisioning an Azure storage account using PowerShell recipe in Chapter 1, Creating and Managing Data in Azure Data Lake.
  2. Make sure you have an existing Azure storage container. If not, create one by following the Creating containers and uploading files to Azure Blob storage using PowerShell recipe.
  3. Make sure you have existing blobs/files in an Azure storage container. If not, you can upload blobs by following the previous recipe.
  4. Log in to your Azure subscription in PowerShell. To log in, run the Connect-AzAccount command in a new PowerShell window and follow the instructions.

How to do it…

Let's begin by securing blobs using an SAS.

Securing blobs using an SAS

Perform the following steps:

  1. Execute the following command in the PowerShell window to get the storage context:
    $resourcegroup = "packtadestorage"
    $storageaccount = "packtadestoragev2"
    #get storage context
    $storagecontext = (Get-AzStorageAccount -ResourceGroupName $resourcegroup -Name $storageaccount).Context
  2. Execute the following commands to get the SAS token for the logfile1.txt blob in the logfiles container with list and read permissions:
    #set the token expiry time
    $starttime = Get-Date
    $endtime = $starttime.AddDays(1)
    # get the SAS token into a variable
    $sastoken = New-AzStorageBlobSASToken -Container "logfiles" -Blob "logfile1.txt" -Permission lr -StartTime $starttime -ExpiryTime $endtime -Context $storagecontext
    # view the SAS token.
     $sastoken
  3. Execute the following commands to list the blob using the SAS token:
    #get storage account context using the SAS token
    $ctx = New-AzStorageContext -StorageAccountName $storageaccount -SasToken $sastoken
    #list the blob details
    Get-AzStorageBlob -blob "logfile1.txt" -Container "logfiles" -Context $ctx

You should get output as shown in the following screenshot:

Figure 2.40 – Listing blobs using an SAS

  4. Execute the following command to write data to logfile1.txt. Ensure you have the Logfile1.txt file in the C:\ADECookbook\Chapter1\Logfiles\ folder on the machine you are running the script from:
    Set-AzStorageBlobContent -File C:\ADECookbook\Chapter1\Logfiles\Logfile1.txt -Container logfiles -Context $ctx

You should get output as shown in the following screenshot:

Figure 2.41 – Uploading a blob using an SAS

The write fails, as the SAS token was created with list and read access.

Securing a container with an SAS

Perform the following steps:

  1. Execute the following command to create a container stored access policy:
    $resourcegroup = "packtadestorage"
    $storageaccount = "packtadestoragev2"
    #get storage context 
    $storagecontext = (Get-AzStorageAccount -ResourceGroupName $resourcegroup -Name $storageaccount).Context
    $starttime = Get-Date
    $endtime = $starttime.AddDays(1)
    New-AzStorageContainerStoredAccessPolicy -Container logfiles -Policy writepolicy -Permission lw -StartTime $starttime -ExpiryTime $endtime -Context $storagecontext
  2. Execute the following command to create the SAS token:
    #get the SAS token
    $sastoken = New-AzStorageContainerSASToken -Name logfiles -Policy writepolicy -Context $storagecontext
  3. Execute the following commands to list all the blobs in the container using the SAS token:
    #get the storage context with SAS token
    $ctx = New-AzStorageContext -StorageAccountName $storageaccount -SasToken $sastoken
    #list blobs using SAS token
    Get-AzStorageBlob -Container logfiles -Context $ctx

How it works…

To generate a shared access token for a blob, use the New-AzStorageBlobSASToken command. We need to provide the blob name, container name, permission (l = list, r = read, and w = write), and storage context to generate an SAS token. We can additionally secure the token by providing IPs that can access the blob.
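
As an example of the IP restriction mentioned above, New-AzStorageBlobSASToken accepts an IPAddressOrRange parameter (and a Protocol parameter to enforce HTTPS). The following sketch reuses the variables from the Securing blobs using an SAS section with an illustrative IP range:

    # Token valid only when presented from the given (illustrative) IP range, and only over HTTPS
    $sastoken = New-AzStorageBlobSASToken -Container "logfiles" -Blob "logfile1.txt" -Permission lr -StartTime $starttime -ExpiryTime $endtime -IPAddressOrRange "20.50.0.0-20.50.255.255" -Protocol HttpsOnly -Context $storagecontext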

We then use the SAS token to get the storage context using the New-AzStorageContext command. We use the storage context to access the blobs using the Get-AzStorageBlob command. Note that we can only list and read blobs and can't write to them, as the SAS token doesn't have write permissions.

To generate a shared access token for a container, we first create an access policy for the container using the New-AzStorageContainerStoredAccessPolicy command. The access policy specifies the start and expiry time, permission, and IPs. We then generate the SAS token by passing the access policy name to the New-AzStorageContainerSASToken command.

We can now access the container and the blobs using the SAS token.


Key benefits

  • Build data pipelines from scratch and find solutions to common data engineering problems
  • Learn how to work with Azure Data Factory, Data Lake, Databricks, and Synapse Analytics
  • Monitor and maintain your data engineering pipelines using Log Analytics, Azure Monitor, and Azure Purview

Description

The famous quote 'Data is the new oil' seems more true every day as the key to most organizations' long-term success lies in extracting insights from raw data. One of the major challenges organizations face in leveraging value out of data is building performant data engineering pipelines for data visualization, ingestion, storage, and processing. This second edition of the immensely successful book by Ahmad Osama brings to you several recent enhancements in Azure data engineering and shares approximately 80 useful recipes covering common scenarios in building data engineering pipelines in Microsoft Azure. You’ll explore recipes from Azure Synapse Analytics workspaces Gen 2 and get to grips with Synapse Spark pools, SQL Serverless pools, Synapse integration pipelines, and Synapse data flows. You’ll also understand Synapse SQL Pool optimization techniques in this second edition. Besides Synapse enhancements, you’ll discover helpful tips on managing Azure SQL Database and learn about security, high availability, and performance monitoring. Finally, the book takes you through overall data engineering pipeline management, focusing on monitoring using Log Analytics and tracking data lineage using Azure Purview. By the end of this book, you’ll be able to build superior data engineering pipelines along with having an invaluable go-to guide.

Who is this book for?

This book is for data engineers, data architects, database administrators, and data professionals who want to get well versed with the Azure data services for building data pipelines. Basic understanding of cloud and data engineering concepts will help in getting the most out of this book.

What you will learn

  • Process data using Azure Databricks and Azure Synapse Analytics
  • Perform data transformation using Azure Synapse data flows
  • Perform common administrative tasks in Azure SQL Database
  • Build effective Synapse SQL pools which can be consumed by Power BI
  • Monitor Synapse SQL and Spark pools using Log Analytics
  • Track data lineage using Microsoft Purview integration with pipelines

Product Details

Publication date : Sep 26, 2022
Length : 608 pages
Edition : 2nd
Language : English
ISBN-13 : 9781803246789
Vendor : Microsoft


Table of Contents

Chapter 1: Creating and Managing Data in Azure Data Lake
Chapter 2: Securing and Monitoring Data in Azure Data Lake
Chapter 3: Building Data Ingestion Pipelines Using Azure Data Factory
Chapter 4: Azure Data Factory Integration Runtime
Chapter 5: Configuring and Securing Azure SQL Database
Chapter 6: Implementing High Availability and Monitoring in Azure SQL Database
Chapter 7: Processing Data Using Azure Databricks
Chapter 8: Processing Data Using Azure Synapse Analytics
Chapter 9: Transforming Data Using Azure Synapse Dataflows
Chapter 10: Building the Serving Layer in Azure Synapse SQL Pool
Chapter 11: Monitoring Synapse SQL and Spark Pools
Chapter 12: Optimizing and Maintaining Synapse SQL and Spark Pools
Chapter 13: Monitoring and Maintaining Azure Data Engineering Pipelines
Index
Other Books You May Enjoy

Customer reviews

Rating distribution: 4.6 (13 Ratings)
5 star: 84.6% | 4 star: 7.7% | 3 star: 0% | 2 star: 0% | 1 star: 7.7%

Ayman Dec 13, 2022
Rated 5 stars
This book is very similar to a cookbook indeed. You look up the "recipe" for what you're trying to accomplish in Azure regarding Data Engineering workloads and you'll find the step by step instructions. This book is not meant to be exhaustive on theory but rather how to implement solutions. It's on my shelf as a quick reference and review for something I need to get done. Many of the examples are both shown through the portal and via some sort of code (PowerShell, SQLCMD, KQL, etc). It's a great way to learn how to do the tasks multiple ways. If you're looking to get hands on and practice Data Engineering tasks and setup for the supporting environment in Azure, this is a great reference.
Amazon Verified review
Jagjeet S Makhija Nov 07, 2022
Rated 5 stars
I just finished the book Azure Data Engineering by Example: Practical Implementation for Data Engineers by both the authors. The book was exactly what I was expecting. The book itself is great and an introduction to Azure Data Engineering and it covers almost anything you need to know about Batch and Streaming Analytics. Like the excellent Azure Data book: Problem, Design, Solution. But this was only not the case, it’s a self-explanatory book about ADF, Azure Synapse, Power Shell, Query Optimization and it starts from scratch. The example part in the title is because all of the concepts are very well explained through examples. If you want to follow along, you need to read the book from front to cover, as they build upon each other. It’s a great introduction and although I was already familiar with most of the concepts, I definitely picked up a few things along the way. Especially the later chapters, about Migration SSIS packages, Configuring Azure Data Bricks environment, Delta Lake, transform data using Python and Scala were very interesting to me.This book is Impressive in my point of view!!
Amazon Verified review
Sivashankar G Oct 28, 2022
Rated 5 stars
*Sharing your experience is much better than teaching just the theory*This is something I always believe, showing the way of doing it rather limiting the explanation only to the concept or theory is the better way of teaching. I love the way this book explains how a data engineer can use Azure data services, but more than that, how this shows the way of doing it using multiple ways.Another thing I noticed, and helped me to keep what I learnt in my head permanently, is “how it works” sections in this book. Absolutely a brilliant idea, not just limiting “how to do it”, extending it with “how it works” that helps us to understand how my implementation works, is something I really appreciate.I highly recommend this book for all data engineers who work with Azure or who plan to work with Azure. The Azure Data Engineering Cookbook is a one you must read.
Amazon Verified review
Gregory Smith Aug 19, 2023
Rated 5 stars
I've been meaning to write this book review for a while now – I got my hands on this book about five months back. To sum it up, going through the recipe examples has been a really enjoyable experience.For those of you aiming to rock the "Exam DP-203: Data Engineering on Microsoft Azure" and also wanting to get a good grip on exploring data, as well as setting up and managing secure and compliant data processing pipelines using all sorts of tools and tricks, this book is like your trusty sidekick. It'll help you build that strong foundation you need for the tough exam!"Azure Data Engineering Cookbook" is strategically tailored to cater to a diverse audience, ranging from database administrators, developers, to ETL practitioners. Through a pragmatic and recipe-centered approach, the book delves deep into the realm of Azure Data Engineering. Those versed in technical and database architecture, with a background in crafting data and ETL solutions within diverse environments including on-premise and other cloud platforms, will discover valuable insights within these pages.Just a heads-up, though: they assume you've got the basics of Azure and data engineering down, so you can dive right into the deep end of Azure Data Engineering without any hiccups.Grab a copy today!!
Amazon Verified review
Om S Dec 07, 2022
Rated 5 stars
Azure data engineering cookbook is a very practical book updated as the second edition that covers many dimensions of azure the data engineering process so called "reader with recipes" or hands-on chapters one step after another by beginning with Azure Data Lake and covering data ingestion using Azure Data Factory into Azure Data Lake and Azure SQL Database, management of common storage layers such as Azure Data Lake and Azure SQL Database, focusing on topics such as security, high availability, and performance monitoring.The center of the book is on data processing using Azure Databricks, Azure Synapse Analytics Spark pools, Synapse dataflows, and data exploration using Synapse serverless SQL pools this also focuses on the consumption of the data using Synapse dedicated SQL pool and Synapse Spark lake databases, covering the tips and tricks to optimize and maintain Synapse dedicated SQL pool databases and lake databases. Eventually, the bibliophile also has a reward chapter on managing the overall data engineering pipeline, which covers pipeline monitoring using Azure Log Analytics and following data lineage using Microsoft Purview.The book contains many helpful PowerShell scripts/Codes and distinctive screenshots, best way to get the most out of this is to go through one chapter at a time but even before that download code and files it starts from "Chapter04".For those who want to pass "Exam DP-203: Data Engineering on Microsoft Azure" and want to understand the data through exploration, build and maintain secure and compliant data processing pipelines by using different tools and techniques this book is their best friend. Make your foundation for the tough exam!
Amazon Verified review

FAQs

What is the delivery time and cost of the print book?

Shipping Details

USA:


Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-Bissau
  9. Iran
  10. Lebanon
  11. Libyan Arab Jamahiriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is a customs duty/charge?

A customs duty is a charge levied on goods when they cross international borders. It is a tax imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order?

The orders shipped to the countries that are listed under EU27 will not bear customs charges; they are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea

A customs duty or localized taxes may be applicable on shipments to recipient countries outside of the EU27. These are payable by the customer and are not included in the shipping charges on the order.

How do I know my customs duty charges?

The amount of duty payable varies greatly depending on the imported goods, the country of origin, and several other factors, such as the total invoice amount, weight and dimensions, and other criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order?

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except for the cases described in our Return Policy (i.e., where Packt Publishing agrees to replace your printed book because it arrives damaged or has a material defect). Otherwise, Packt Publishing will not accept returns.

What is your returns and refunds policy?

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items (damaged, defective, or incorrect).
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged or with a material defect, contact our Customer Relations Team on customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage, and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner on a print-on-demand basis.

What tax is charged?

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use?

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal