Azure Data and AI Architect Handbook: Adopt a structured approach to designing data and AI solutions at scale on Microsoft Azure

By Olivier Mertens and Breght Van Baelen
eBook Jul 2023 284 pages 1st Edition

Introduction to Data Architectures

With data quickly becoming an essential asset of any business, the need for cloud data architects has never been higher. The key role these professionals fulfill is to provide the technical blueprints of any cloud data project and expertise on data architectures as a whole. A skilled data architect is proficient in many steps of the end-to-end data processes, such as data ingestion, data warehouses, data transformations, and visualization.

It is of utmost importance that data architects are familiar with the benefits and drawbacks of individual resources as well as platform-wide design patterns. Typically, aspiring data architects have a background as business intelligence (BI) developers, data engineers, or data scientists. They are often specialized in one or more tools but lack experience in architecting solutions according to best practices.

Compared to a developer profile, an architect is more focused on the long term and the bigger picture. The architect must keep in mind the overarching business strategy and prioritize certain aspects of the architecture accordingly. To equip you with the necessary skills to do so, you will be introduced to methods of getting business value from your data, to solidify any long-term data strategy.

This chapter will also introduce you to a wide-purpose referential data architecture. This architecture will be used as a guideline throughout this entire book and will become more and more defined as the chapters go on.

Finally, on-premises data architectures nowadays face a variety of challenges. You will explore these challenges and look at how a business can benefit from either a cloud or a hybrid cloud solution.

In this chapter, we’re going to cover the following main topics:

  • Understanding the value of data
  • A data architecture reference diagram
  • Challenges of on-premises architectures

Understanding the value of data

Data generation is growing at an exponential rate. An estimated 90 percent of data in the world was generated in the last 2 years, and global data creation is expected to reach 181 zettabytes in 2025.

Just to put this number in perspective, 1 zettabyte is equal to 1 million petabytes. This scale requires data architects to deal with the complexity of big data, but it also introduces an opportunity. The expert data analyst, Doug Laney, defines big data with the popular three Vs framework: Volume, Variety, and Velocity. In this section, we would like to explore a fourth one called Value.

Types of analytics

Data empowers businesses to look back into the past, giving insights into established and emerging patterns, and making informed decisions for the future. Gartner splits analytical solutions that support decision-making into four categories: descriptive, diagnostic, predictive, and prescriptive analytics. Each category is potentially more complex to analyze but can also add more value to your business.

Let’s go through each of these categories next:

  • Descriptive analytics is concerned with answering the question, “What is happening in my business?” It describes the past and current state of the business by creating static reports on top of data. The data used to answer this question is often modeled in a data warehouse, which models historical data in dimension and fact tables for reporting purposes.
  • Diagnostic analytics tries to answer the question, “Why is it happening?” It drills down into the historical data with interactive reports and diagnoses the root cause. Interactive reports are still built on top of a data warehouse, but additional data may be added to support this type of analysis. A broader view of your data estate allows for more root causes to be found.
  • Predictive analytics learns from historical trends and patterns to make predictions for the future. It deals with answering the question, “What will happen in the future?” This is where machine learning (ML) and artificial intelligence (AI) come into play, drawing data from the data warehouse or raw data sources to learn from.
  • Prescriptive analytics answers the question, “What should I do?” and prescribes the next best action. When we know what will happen in the future, we can act on it. This can be done by using different ML methods such as recommendation systems or explainable AI. Recommendation systems recommend the next best product to customers based on similar products or what similar customers bought. Think, for instance, about Netflix recommending new series or movies you might like. Explainable AI will identify which factors were most important to output a certain prediction, which allows you to act on those factors to change the predicted outcome.

The following diagram shows the value-extracting process, going from data to analytics, decisions, and actions:

Figure 1.1 – Extracting value from data


Just as with humans, ML models need to learn from their mistakes, which can be done with the help of a feedback loop. A feedback loop allows a teacher to correct the outcomes of the ML model and add them as training labels for the next learning cycle. Learning cycles allow the ML model to improve over time and combat data drift. Data drift occurs when the data on which the model was trained isn’t representative anymore of the data the model predicts. This will lead to inaccurate predictions.

Because ML models only improve gradually, it is best practice to have humans confirm predictions before the decision-making process is automated. Even when an ML model has matured, we can’t rely on the model being right 100 percent of the time. This is why ML models often work with confidence scores, stating how confident they are in the prediction. If the confidence score is below a certain threshold, human intervention is required.
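
The confidence-score check described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production pattern: the `Prediction` class, the 0.8 threshold, and the function names are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    record_id: int
    label: str
    confidence: float  # model confidence score in [0, 1]


def needs_human_review(pred: Prediction, threshold: float = 0.8) -> bool:
    """True when the confidence score falls below the (assumed) threshold."""
    return pred.confidence < threshold


def record_feedback(pred: Prediction, corrected_label: str) -> Tuple[int, str]:
    """Turn a reviewer's correction into a training label for the next cycle."""
    return (pred.record_id, corrected_label)


predictions = [
    Prediction(1, "churn", 0.95),     # confident enough to act on automatically
    Prediction(2, "no_churn", 0.55),  # below the threshold, so a human reviews it
]

# Low-confidence predictions are queued for human intervention.
review_queue: List[Prediction] = [p for p in predictions if needs_human_review(p)]

# The reviewer's corrections feed back into the next learning cycle.
training_labels = [record_feedback(p, "churn") for p in review_queue]
```

In practice, the threshold would be tuned per use case, since a stricter threshold sends more predictions to human reviewers.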

To get continuous value out of data, it is necessary to build a data roadmap and strategy. A complexity-value matrix is a mapping tool to help prioritize which data projects need to be addressed first. This matrix will be described more in detail in the following section.

A complexity-value matrix

A complexity-value matrix has four quadrants to plot future data projects on. These go from high- to low-value and low- to high-complexity. Projects that are considered high-value and have a low complexity are called “quick wins” or “low-hanging fruit” and should be prioritized first. These are often Software-as-a-Service (SaaS) applications or third-party APIs that can quickly be integrated into your data platform to get immediate value. Data projects with high complexity and low value should not be pursued as they have a low Return on Investment (ROI). In general, the more difficult our analytical questions become, the more complex the projects may be, but also, the more value we may get out of them.

A visualization of the four quadrants of the matrix can be seen as follows:

Figure 1.2 – The four quadrants of a complexity-value matrix
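
As a rough sketch, the quadrant assignment can be expressed as a small Python function. The 0-to-1 scoring scale, the 0.5 midpoint, and the labels for the two quadrants the text does not name ("strategic project" and "low priority") are assumptions made for this example.

```python
def classify_project(value: float, complexity: float, midpoint: float = 0.5) -> str:
    """Place a project in one of the four quadrants.

    Both scores are assumed to be normalized to the range [0, 1].
    """
    high_value = value >= midpoint
    high_complexity = complexity >= midpoint
    if high_value and not high_complexity:
        return "quick win"          # prioritize first
    if high_value and high_complexity:
        return "strategic project"  # hypothetical label
    if not high_value and not high_complexity:
        return "low priority"       # hypothetical label
    return "avoid"                  # high complexity, low value: low ROI


# Hypothetical projects scored as (value, complexity).
projects = {
    "SaaS connector": (0.8, 0.2),
    "Custom forecasting model": (0.9, 0.9),
    "Legacy report migration": (0.2, 0.8),
}
quadrants = {name: classify_project(v, c) for name, (v, c) in projects.items()}
```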


We often think only of the direct value that data projects bring, but the indirect value should be considered as well. Data engineering projects often do not have a direct value as they move data from one system to another, but this may indirectly open up a world of new opportunities.

To extract value from data, a solid data architecture needs to be in place. In the following section, we’ll define an abstract data architecture diagram that will be referenced throughout this book to explain data architecture principles.

A data architecture reference diagram

The reference architecture diagram in Figure 1.3, which is defined only abstractly for now, shows the typical structure of an end-to-end data platform in a (hybrid) cloud:

Figure 1.3 – A typical structure of an end-to-end data platform in a (hybrid) cloud


This reference diagram shows the key components of most modern cloud data platforms. There are limitless possible adaptations, such as accommodating streaming data, but the diagram in Figure 1.3 serves as the basis for more advanced data architectures. It’s like the Pizza Margherita of data architectures! The architecture diagram in Figure 1.3 already shows four distinct layers in the end-to-end architecture, as follows:

  • The ingestion layer
  • The storage layer
  • The serving layer
  • The consumption layer

Next to these layers, there are a couple of other key aspects of the data platform that span across multiple layers, as follows:

  • Data orchestration and processing
  • Advanced analytics
  • Data governance and compliance
  • Security
  • Monitoring

Let’s cover the first layer next.

The ingestion layer

The ingestion layer serves as the data entrance to the cloud environment. Here, data from various sources is pulled into the cloud. These sources include on-premises databases, SaaS applications, other cloud environments, Internet of Things (IoT) devices, and many more. Let’s look at this layer in more detail:

  • First, the number of data sources can vary greatly between businesses and could already bring a variety of challenges to overcome. In enterprise-scale organizations, where the number of data sources can reach extraordinary levels, it is of exceptional importance to maintain a clear overview and management of these sources.
  • Secondly, the sheer variety of sources is another common issue to deal with. Different data sources can have distinct methods of ingesting data into the cloud and, in some cases, require architectural changes to accommodate.
  • Thirdly, managing authentication for data sources can be cumbersome. Authentication, which happens in a multitude of ways, is often unique to the data source. Every source requires its own tokens, keys, or other types of credentials that must be managed and seamlessly refreshed to optimize security.

From a design perspective, there are a few other aspects to keep in mind. The architect should consider the following:

  • Data speed: Will incoming data from the source be ingested periodically (that is, batch ingestion) or continuously (that is, data streaming)?
  • Level of the structure of the data: Will the incoming data be unstructured, semi-structured, or structured?

Regarding data speed, data will be ingested in batches in the vast majority of cases. This translates to periodical requests made to an application programming interface (API) to pull data from the data source. For the more uncommon cases of streaming data, architectural changes are required to provide an environment to store and process the continuous flow of data. In later chapters, you will discover how the platform architecture will differ to accommodate the streaming data.
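
To make the batch pattern more concrete, here is a minimal Python sketch of one periodic pull. It assumes the source exposes a "give me everything modified since X" call; the in-memory `SOURCE_ROWS` list and the `fetch_since` function stand in for a real API and are purely hypothetical.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]

# Stand-in for a source system; a real pipeline would call the source's API.
SOURCE_ROWS: List[Record] = [
    {"id": 1, "modified_at": datetime(2023, 1, 1)},
    {"id": 2, "modified_at": datetime(2023, 1, 2)},
    {"id": 3, "modified_at": datetime(2023, 1, 3)},
]


def fetch_since(since: datetime) -> List[Record]:
    """Hypothetical source call: return rows modified after `since`."""
    return [r for r in SOURCE_ROWS if r["modified_at"] > since]


def ingest_batch(
    fetch: Callable[[datetime], List[Record]], watermark: datetime
) -> Tuple[List[Record], datetime]:
    """Pull one batch and advance the watermark to the newest row seen."""
    records = fetch(watermark)
    new_watermark = max((r["modified_at"] for r in records), default=watermark)
    return records, new_watermark


# Two scheduled runs: the second finds nothing new and keeps the watermark.
batch_1, wm_1 = ingest_batch(fetch_since, datetime(2023, 1, 1))
batch_2, wm_2 = ingest_batch(fetch_since, wm_1)
```

Tracking a watermark is one common way to avoid re-ingesting rows on every run; a streaming source would instead need a continuously running consumer.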

Finally, the level of structure of the data will determine the amount of required data transformations, the methods of storing the data, or the destination of data movements. Unstructured data, such as images and audio files, will require different processing compared to semi-structured key-value pairs or structured tabular files.


The storage layer

The definitions of the following layers can vary. Over the course of this book, the storage layer refers to the central (often large-scale) storage of data. Data lakes are the most common method for massive storage of data, due to their capacity and relatively low cost. Alternatives are graph-based databases, relational databases, NoSQL databases, flat file-based databases, and so on. The data warehouse, which holds business-ready data and is optimized for querying and analytics, does not belong to the storage layer but will fall under the serving layer instead.

Decisions made by the architect in the storage layer can have a great effect on costs, performance, and the data platform in its entirety. Here, the architect will have to consider redundancy, access tiers, and security. In the case of a data lake, a tier system needs to be considered for raw, curated, and enriched data, as well as a robust and scalable folder structure.
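
One possible folder convention for such a tiered data lake is sketched below in Python. The `tier/source/dataset/year=/month=/day=` layout is an assumption chosen for this example, not a structure prescribed by the book.

```python
from datetime import date
from pathlib import PurePosixPath

TIERS = ("raw", "curated", "enriched")


def lake_path(tier: str, source: str, dataset: str, day: date) -> str:
    """Build a date-partitioned folder path for a dataset in a given tier."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return str(
        PurePosixPath(
            tier,
            source,
            dataset,
            f"year={day.year:04d}",
            f"month={day.month:02d}",
            f"day={day.day:02d}",
        )
    )


# The same dataset keeps the same sub-path as it moves up a tier.
raw_path = lake_path("raw", "crm", "customers", date(2023, 7, 31))
curated_path = lake_path("curated", "crm", "customers", date(2023, 7, 31))
```

Keeping the sub-path identical across tiers makes it easy to trace a curated dataset back to its raw origin.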


The serving layer

In the serving layer, preprocessed and cleansed data is stored in a data warehouse, often regarded as the flagship of the data platform. This is a type of structured storage that is optimized for large-scale queries and analytics. The data warehouse forms one of the core components of BI.

The major difference between a data warehouse and the aforementioned data lake is the level of structure. A data warehouse is defined by schemas and enforces data types and structures. Conversely, a data lake can be seen as a massive dump of all kinds of data, with little to no regard for the enforcement of specific rules. The strong level of enforcement makes a data warehouse significantly more homogeneous, which results in far better performance for analytics.
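
The difference in enforcement can be illustrated with a small Python sketch. The `SALES_SCHEMA` definition and the `validate_row` helper are hypothetical; a real data warehouse enforces its schema inside the database engine rather than in application code.

```python
from typing import Dict

# Hypothetical warehouse table definition: column name -> required type.
SALES_SCHEMA: Dict[str, type] = {"order_id": int, "amount": float, "country": str}


def validate_row(row: dict, schema: Dict[str, type]) -> dict:
    """Accept a row only if it has exactly the schema's columns and types."""
    if set(row) != set(schema):
        raise ValueError(f"columns {sorted(row)} do not match schema {sorted(schema)}")
    for column, expected in schema.items():
        if not isinstance(row[column], expected):
            raise TypeError(f"{column} must be {expected.__name__}")
    return row


# A data lake accepts anything: both records can simply be stored as files.
lake = [
    {"order_id": 1, "amount": 19.99, "country": "BE"},
    {"order_id": "two", "note": "free text, no amount"},
]

# The warehouse only accepts rows that pass schema validation.
warehouse = []
rejected = []
for record in lake:
    try:
        warehouse.append(validate_row(record, SALES_SCHEMA))
    except (ValueError, TypeError):
        rejected.append(record)
```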

The cloud data architect has various decisions to make in the serving layer. There are quite a few options for data warehousing on the Azure cloud, as follows:

  1. First, the architect should think about whether they want an Infrastructure-as-a-Service (IaaS), a Platform-as-a-Service (PaaS), or a SaaS solution. In short, this results in a trade-off between management responsibilities, development efforts, and flexibility. This will be discussed more in later chapters.
  2. Next, different services on Azure come with their own advantages and disadvantages. The architect could, for example, opt for a very cost-effective serverless SQL solution or leverage massive processing power in highly performant dedicated SQL pools, among numerous other options.

After deciding on the most fitting service, there are still decisions to be made within the data warehouse. The architect will have to determine structures to organize the data in the data warehouse, also known as schemas. Common schemas are star and snowflake schemas, which also come with their own benefits and drawbacks.
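
A star schema can be sketched with Python's built-in `sqlite3` module: one fact table surrounded by dimension tables. SQLite is used here only because it runs anywhere; the table names, columns, and sample rows are made up for illustration.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create an in-memory star schema with one fact and two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            category    TEXT NOT NULL
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            year     INTEGER NOT NULL,
            month    INTEGER NOT NULL
        );
        CREATE TABLE fact_sales (
            product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
            date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
            quantity    INTEGER NOT NULL,
            revenue     REAL NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Laptop", "Hardware"), (2, "Mouse", "Hardware"), (3, "Antivirus", "Software")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20230701, 2023, 7), (20230801, 2023, 8)],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 20230701, 2, 2000.0), (2, 20230701, 5, 100.0),
         (3, 20230801, 10, 300.0), (1, 20230801, 1, 1000.0)],
    )
    return conn


def revenue_by_category(conn: sqlite3.Connection) -> dict:
    """Typical star-schema query: join the fact table to a dimension and aggregate."""
    rows = conn.execute(
        """
        SELECT p.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()
    return dict(rows)


conn = build_star_schema()
totals = revenue_by_category(conn)
```

A snowflake schema would normalize the dimensions further, for example by moving `category` into its own table referenced from `dim_product`.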

Chapter 6, Data Warehousing, will teach you all the necessary skills to confidently decide on the right solution. Chapter 7, The Semantic Layer, will introduce you to the concept of data marts, subsets of a data warehouse ready for business consumption.

The consumption layer

The consumption layer is the final layer of an end-to-end data architecture and typically follows the serving layer by extracting data from the data warehouse. There are numerous ways of consuming the data, which has been prepared and centralized in earlier stages.

The most common manner of consumption is through data visualization. This can happen through dashboarding and building reports. The combination of a data warehouse and a visualization service is often referred to as BI. Many modern dashboarding tools allow for interactivity and drill-down functionality within the dashboard itself. Although technically it is not a part of the Azure stack, Power BI is the preferred service for data visualization for Azure data platforms. However, Microsoft allows other visualization services to connect conveniently as well.

Another way to consume data is by making the data available to other applications or platforms using APIs.

Chapter 8, Visualizing Data Using Power BI, will teach you how to extract data from the data warehouse in various ways and visualize it using interactive dashboarding. In this chapter, you will also discover methods to perform self-service BI, allowing end users to create their own ad hoc dashboards and reports to quickly perform data analysis.

Data orchestration and processing

Contrary to the four layers mentioned previously, there are a couple of other core components of the data platform that span across the entire end-to-end process.

Data orchestration refers to moving data from one place to another, often using data pipelines. This process is often done by data engineers. When data is moved from one stage to the next, data undergoes transformations in the form of joining data, deriving new columns, computing aggregations, and so on. For example, when data is moved from a data lake to a data warehouse, it must be transformed to match the data model, which is enforced by the data warehouse. Another example is when moving data between tiers (raw, curated, and enriched tiers) in the data lake, where the data becomes more and more ready for business use whenever it moves up a tier.

Data pipelines allow data engineers to automate and scale the orchestration and processing of data. These components are critical to the performance and health of the data platform and must be monitored accordingly.

Here are two common methods of performing orchestration and processing:

  • Extract-Transform-Load (ETL)
  • Extract-Load-Transform (ELT)

In both cases, data is extracted from a source and loaded to a destination. The main difference between both methods is the location where the transformations take place. These will be further discussed in Chapter 4, Transforming Data on Azure. This chapter will also teach you how to create and monitor data pipelines according to best practices.
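
The difference in where the transformation runs can be shown in a short Python sketch, again using the built-in `sqlite3` module as a stand-in destination. The sample rows, table names, and cleaning rules are assumptions for this example.

```python
import sqlite3

# Raw extract: untrimmed country codes and amounts stored as text.
RAW_ORDERS = [("A-1", " be ", "19.99"), ("A-2", "NL", "5.00"), ("A-3", "be", "10.01")]


def run_etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in the pipeline, then load the cleaned rows."""
    cleaned = [(oid, country.strip().upper(), float(amount))
               for oid, country, amount in RAW_ORDERS]
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows first, then transform inside the destination."""
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, country TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id,
               UPPER(TRIM(country)) AS country,
               CAST(amount AS REAL) AS amount
        FROM orders_raw
        """
    )


def totals(conn: sqlite3.Connection, table: str) -> dict:
    """Revenue per country; `table` is a trusted name from this script only."""
    rows = conn.execute(
        f"SELECT country, ROUND(SUM(amount), 2) FROM {table} "
        "GROUP BY country ORDER BY country"
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_totals = totals(conn, "orders_etl")
elt_totals = totals(conn, "orders_elt")
```

Both approaches end with the same cleaned data. With ELT, the raw table also remains available in the destination, which can help when transformation rules change later.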

Advanced analytics

For analyses that may be too complex to perform in the serving layer, an analytics suite or data science environment can be added to the architecture to perform advanced analytics and unlock ML capabilities. This component can often be added in a later stage of platform development, as it will mostly not influence the core working of the other layers. A data platform in an early phase of development can perfectly exist without this component.

One option for the advanced analytics suite is an ML workspace where data scientists can preprocess data, perform feature engineering, and train and deploy ML models. The latter may require additional components such as a container registry for storing and managing model deployments. The Azure Machine Learning workspace allows users to create and run ML pipelines to scale their data science processes. It also enables citizen data scientists to train models using no-code and low-code features.

Apart from an environment for data scientists and ML engineers to build and deploy custom models, the Azure cloud also provides users with a wide array of pre-trained ML models. Azure Cognitive Services encompass many models for computer vision (CV), speech recognition, text analytics, search capabilities, and so on. These models are available through ready-to-use API endpoints. They often involve niche cases but, when used correctly, bring a lot of value to the solution and are exceptionally fast to implement.

Chapter 9, Advanced Analytics Using AI, will go deeper into end-to-end ML workflows, such as the connection to data storages, performing preprocessing, model training, and model deployments. This chapter will also introduce the concepts of ML operations, often referred to as MLOps. This encompasses continuous integration and continuous delivery (CI/CD) for ML workflows.

Data governance and compliance

The more a data platform scales, the harder it becomes to maintain a clear overview of existing data sources, data assets, transformations, data access control, and compliance. To avoid a build-up of technical debt, it is strongly recommended to set up governance and compliance processes from an early stage of development and have them scale with the platform.

To govern Azure data platforms, Microsoft developed Microsoft Purview, formerly known as Azure Purview. This tool, which is covered in Chapter 10, Enterprise-Level Data Governance and Compliance, allows users to gain clear insights into the governance and compliance of the platform. Therefore, it is essential to the skill set of any aspiring Azure data architect. In this chapter, you will learn how to do the following:

  • Create a data map by performing scans on data assets
  • Construct a data catalog to provide an overview of the metadata of data assets
  • Build a business glossary to establish clear definitions of possibly ambiguous business terms
  • Gain executive insights on the entire data estate

Security

With the rise of harmful cyber-attacks, security is another indispensable component of a data platform. Improper security or configurations may lead to tremendous costs for the business. Investing in robust security to prevent attacks from happening will typically be vastly cheaper than dealing with the damage afterward.

Cybersecurity can be very complex and therefore should be configured and managed using the help of a cybersecurity architect. However, certain aspects of security should fall into the responsibilities of the data architect as well. The data architect should have the appropriate skill set to establish data security. Examples are working with row- or column-level security, data encryption at rest and in transit, masking sensitive data, and so on.
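
The ideas behind row-level security, column-level security, and data masking can be illustrated with plain Python. This is only a conceptual sketch: in practice these controls are configured in the database or platform itself, and the helper names and sample data here are hypothetical.

```python
from typing import Dict, List, Set


def mask_email(email: str) -> str:
    """Data masking: keep the first character and the domain, hide the rest."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "***"
    return local[0] + "***@" + domain


def visible_rows(rows: List[Dict], user_region: str) -> List[Dict]:
    """Row-level security: a user only sees rows for their own region."""
    return [row for row in rows if row["region"] == user_region]


def select_columns(row: Dict, allowed: Set[str]) -> Dict:
    """Column-level security: drop the columns a user may not read."""
    return {key: value for key, value in row.items() if key in allowed}


customers = [
    {"name": "Ann", "email": "ann@example.com", "region": "EU", "salary": 50000},
    {"name": "Bob", "email": "bob@example.com", "region": "US", "salary": 60000},
]

# An EU analyst sees only EU rows, without the salary column, with masked emails.
eu_view = [
    {**select_columns(row, {"name", "email", "region"}), "email": mask_email(row["email"])}
    for row in visible_rows(customers, "EU")
]
```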

Chapter 11, Introduction to Data Security, will teach you all that is necessary to ensure data is always well protected and access is always limited to a minimum.

Monitoring

Disruptions such as failing data pipelines, breaking transformations, and unhealthy deployments can shut down the workings of an entire data platform. To limit the downtime to an absolute minimum, these processes and deployments should be monitored continuously.

Azure provides monitoring and health reports on pipeline runs, Spark and SQL jobs, ML model deployments, data asset scans, and more. The monitoring of these resources will be further discussed in their own respective chapters.

Challenges of on-premises architectures

Cloud computing has seen a steep rise in adoption during the last decade. Nevertheless, a significant share of businesses still keep their servers and data on-premises. There are certain reasons why a business may prefer on-premises over the cloud. Some businesses have the perception of increased security when keeping data on their own servers. Others, generally smaller businesses, may not feel the need to optimize their IT landscape or simply are not keen on change. Organizations in strictly regulated industries can be bound to on-premises for compliance. Whatever the reason, on-premises architectures nowadays come with certain challenges.

These challenges include, among other things, the following:

  • Scalability
  • Cost optimization
  • Agility
  • Flexibility

Let’s go through these challenges in detail.

Scalability

Organizations with a rapidly enlarging technological landscape will struggle the most to overcome the challenge of scalability. As the total business data volume keeps growing continually, an organization faces the constant need of having to find new ways to expand the on-premises server farm. It is not always as simple as just adding extra servers. After a while, extra building infrastructure is needed, new personnel must be hired, energy consumption soars, and so on.

Here, the benefit of cloud computing is the enormous pool of available servers and computing resources. For the business, this means it can provision any additional capacity without having to worry about the intricate organization and planning of its own servers.

Cost optimization

Businesses that completely rely on on-premises servers are never fully cost-effective. Why is this so?

Let’s take a look at two scenarios:

  • When usage increases: When the usage increases, the need for extra capacity arises. A business is not going to wait until its servers are used to their limits, risking heavy throttling and bottleneck issues, before starting to expand its capacity. Although the risk of full saturation of its servers is hereby avoided, the computing and storage capacity is never fully made use of. While usage can grow linearly or exponentially, costs will rise in discrete increments, referring to distinct expansions of server capacity.
  • When usage decreases: When the usage decreases, the additional capacity is simply standing there, unused. Even if the decrease in usage lasts for longer periods of time, it is not that simple to just sell the hardware, free up the physical space, and get rid of the extra maintenance personnel. In most situations, this results in costs remaining unchanged despite the usage.

Cloud computing usually follows a pay-as-you-go (PAYG) business model. This solves the two challenges of cost optimization during variable usage. PAYG allows businesses to match their costs to their usage, avoiding disparities, as can be seen in the following diagram:

Figure 1.4 – Cost patterns depending on usage for on-premises and cloud infrastructure
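
The two cost patterns can also be simulated with a few lines of Python. All numbers below (server capacity, cost per server, 20 percent headroom, PAYG unit price) are made-up assumptions, chosen only to show the shape of the curves.

```python
import math
from typing import List, Tuple


def simulate_costs(
    usages: List[float],
    server_capacity: float = 100.0,
    cost_per_server: float = 1000.0,
    headroom: float = 0.2,
    payg_unit_price: float = 10.0,
) -> List[Tuple[float, float]]:
    """Return (on_premises_cost, payg_cost) for each period of usage."""
    servers_owned = 0
    costs = []
    for usage in usages:
        # On-premises: capacity is expanded early (headroom) and in whole servers.
        needed = math.ceil(usage * (1 + headroom) / server_capacity)
        # Purchased hardware is not sold off when usage drops again.
        servers_owned = max(servers_owned, needed)
        on_prem = servers_owned * cost_per_server
        # PAYG: cost follows usage directly.
        payg = usage * payg_unit_price
        costs.append((on_prem, payg))
    return costs


# Usage grows for three periods and then falls back.
costs = simulate_costs([50, 150, 240, 100])
on_prem_costs = [c[0] for c in costs]
payg_costs = [c[1] for c in costs]
```

The on-premises cost rises in steps and stays at its peak after usage drops, while the PAYG cost tracks usage up and down.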


Let’s cover the next challenge now.

Agility

Agility is not about whether a certain change is possible but about the speed at which a business can implement it. Expanding or reducing capacity, changing the types of processing power, and so on takes time in an on-premises environment. In most cases, this involves the acquisition of new hardware, installing the new compute, and configuring security, all of which can be extremely time-consuming in a business context.

Here, cloud architectures benefit from far superior agility over on-premises architectures. Scaling capacity up or down, changing memory-optimized processors for compute-optimized processors: all of this is performed in a matter of seconds or minutes.

Flexibility

The challenge of flexibility can be interpreted very broadly and has some intersections with the other challenges. Difficulties with scalability and agility can be defined as types of flexibility issues.

Apart from difficulties regarding scalability and agility, on-premises servers face the issue of constant hardware modernization. In this case, we could compare on-premises and cloud infrastructure to a purchased car or a rental car respectively. There is not always the need to make use of cutting-edge technology, but if the need is present, think about which option will result in having a more modern car in most situations.

In other cases, specialized hardware such as field-programmable gate arrays (FPGAs) might be required for a short period of time—for example, during the training of an extraordinarily complex ML model. To revisit the car example, would you rather purchase a van when you occasionally have to move furniture or rent a van for a day while moving?

Let’s summarize the chapter next.

Summary

In this chapter, we first discussed how to extract value from your data by asking the right analytical questions. Questions may increase in complexity from descriptive, diagnostic, and predictive to prescriptive but may also hold more value. A complexity-value matrix helps to prioritize data projects and build a data roadmap. A crucial thing to remember is to capture data as soon as possible, even if you don’t have a data strategy or roadmap yet. Data that you do not capture now cannot be used to extract value in the future. Next, we introduced a reference architecture diagram. Over time, you will get familiar with every component of the diagram and how they interact with each other.

Four layers of cloud architectures were explained. The ingestion layer is used to pull data into the central cloud data platform. The storage layer is capable of holding massive amounts of data, often in a tiered system, where data gets more business-ready as it moves through the tiers. In the serving layer, the data warehouse is located, which holds data with a strictly enforced schema and is optimized for analytical workloads. Lastly, the consumption layer allows end users and external systems to consume the data in reports and dashboards or to be used in other applications.

Some components of the data platform span across multiple layers. Data orchestration and processing refers to data pipelines that ingest data into the cloud, move data from one place to another, and orchestrate data transformations. Advanced analytics leverages Azure’s many pre-trained ML models and a data science environment to perform complex calculations and provide meaningful predictions. Data governance tools bring data asset compliance, flexible access control, data lineage, and overall insights into the entire data estate. Impeccable security of individual components as well as the integrations between them takes away many of the worries regarding harmful actions being made by third parties. Finally, the extensive monitoring capabilities in Azure allow us to get insights into the health and performance of the processes and data storage in the platform.

Finally, we discussed the drawbacks that on-premises architectures face, such as scalability, cost optimization, agility, and flexibility. These challenges are often conveniently dealt with by leveraging the benefits of cloud-based approaches.

In the next chapter, we will look at two Microsoft frameworks that ease the move to the cloud.


Key benefits

  • Translate and implement conceptual architectures with the right Azure services
  • Inject artificial intelligence into data solutions for advanced analytics
  • Leverage cloud computing and frameworks to drive data science workloads

Description

With data’s growing importance in businesses, the need for cloud data and AI architects has never been higher. The Azure Data and AI Architect Handbook is designed to assist any data professional or academic looking to advance their cloud data platform designing skills. This book will help you understand all the individual components of an end-to-end data architecture and how to piece them together into a scalable and robust solution. You’ll begin by getting to grips with core data architecture design concepts and Azure Data & AI services, before exploring cloud landing zones and best practices for building up an enterprise-scale data platform from scratch. Next, you’ll take a deep dive into various data domains such as data engineering, business intelligence, data science, and data governance. As you advance, you’ll cover topics ranging from learning different methods of ingesting data into the cloud to designing the right data warehousing solution, managing large-scale data transformations, extracting valuable insights, and learning how to leverage cloud computing to drive advanced analytical workloads. Finally, you’ll discover how to add data governance, compliance, and security to solutions. By the end of this book, you’ll have gained the expertise needed to become a well-rounded Azure Data & AI architect.

Who is this book for?

This book is for anyone looking to elevate their skill set to the level of an architect. Data engineers, data scientists, business intelligence developers, and database administrators who want to learn how to design end-to-end data solutions and get a bird’s-eye view of the entire data platform will find this book useful. Although not required, basic knowledge of databases and data engineering workloads is recommended.

What you will learn

  • Design scalable and cost-effective cloud data platforms on Microsoft Azure
  • Explore architectural design patterns with various use cases
  • Determine the right data stores and data warehouse solutions
  • Discover best practices for data orchestration and transformation
  • Help end users to visualize data using interactive dashboarding
  • Leverage OpenAI and custom ML models for advanced analytics
  • Manage security, compliance, and governance for the data estate

Product Details

Publication date: Jul 31, 2023
Length: 284 pages
Edition: 1st
Language: English
ISBN-13: 9781803230733

Packt Subscriptions

See our plans and pricing

€18.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

€189.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

€264.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

Frequently bought together

  • Azure Data and AI Architect Handbook: €37.99
  • Modern Generative AI with ChatGPT and OpenAI Models: €37.99
  • Azure Architecture Explained: €37.99

Total: €113.97

Table of Contents

17 Chapters
Part 1: Introduction to Azure Data Architect
Chapter 1: Introduction to Data Architectures
Chapter 2: Preparing for Cloud Adoption
Part 2: Data Engineering on Azure
Chapter 3: Ingesting Data into the Cloud
Chapter 4: Transforming Data on Azure
Chapter 5: Storing Data for Consumption
Part 3: Data Warehousing and Analytics
Chapter 6: Data Warehousing
Chapter 7: The Semantic Layer
Chapter 8: Visualizing Data Using Power BI
Chapter 9: Advanced Analytics Using AI
Part 4: Data Security, Governance, and Compliance
Chapter 10: Enterprise-Level Data Governance and Compliance
Chapter 11: Introduction to Data Security
Index
Other Books You May Enjoy

Customer reviews

Rating distribution
4.5 out of 5 (13 Ratings)
  • 5 star: 61.5%
  • 4 star: 30.8%
  • 3 star: 7.7%
  • 2 star: 0%
  • 1 star: 0%

Top Reviews

Tanya Silva Aug 08, 2023
Rating: 5 out of 5
I have received this book for review purposes from the publisher.

Review (8/8/23): I have spent the last few weeks reading Azure Data and AI Architect Handbook, wishing something like this had existed many years ago when I was starting in the field of AI/ML, Data Engineering, Data Services, Data Security, Data Science, Data Governance, and other topics which, with the rise of interest in GenAI, are coming into the light as well. I highly recommend this book to a new college graduate just starting their career in this fascinating space as well as to a seasoned professional who might have worked with Microsoft offerings for many years and would like to understand the latest technology developments.

The book covers Introduction to Data Architecture, Data Engineering, Data Warehousing and Analytics, and Data Security, Governance, and Compliance.

In the Data Architecture section, topics such as reference diagrams with the various layers are covered, as well as data orchestration, processing, security, monitoring, etc. Once the reader understands the data architecture, the authors provide insight into the challenges with on-prem solutions and what is needed to prepare for Cloud Adoption.

The Data Engineering section covers data ingestion, transformations, and storage. Each topic in itself is a vast area requiring deep subject matter expertise. However, for an AI architect, I suggest having at least a high-level understanding of how this area is done, what the challenges are, what the bottlenecks are, and what architectural decisions might be made at this step that might impact AI products down the line.

The Data Warehousing and Analytics section covers DW (the speed of your visualizations/downstream apps will be extremely dependent on this step). The semantic layer (okay, I admit, I am biased when I have to do data analysis. I like it in a multidimensional format, but over the years, I have found it is difficult for humans to grasp high-level dimensionality). Power BI - even if you do not use Power BI, I highly recommend installing a free desktop version to play with it - you might be pleasantly surprised. The last topic in this section covers Advanced Analytics using AI - Azure Cognitive Services, Azure OpenAI, LLMs, and such. Depending on when you are reading this review, the digital version of the book may be modified to include new developments in this space.

The last part covers Data Security, Governance, and Compliance - this topic sometimes gets put on the back burner when solutions are being architected, but I suggest including it in the first rounds of your product development roadmap. I forecast a lot of development in the legal, security, and privacy space coming up in the AI field.

Overall, I highly recommend AI Architects have this book in their library as a reference so that they can envision end-to-end solutions on Azure and be aware of all pieces of the puzzle.
Amazon Verified review
Morph360Tech Jan 06, 2024
Rating: 5 out of 5
Data Architecture, Cloud Adoption, and how you can utilise the tools that Microsoft provides for Data Ingestion, Transformation, and Consumption are clearly defined in the book, which I will explore below.

Ingestion, batch or streaming, is explained in a simple manner that most non-technical readers would understand, and the ingestion architectures are explored and described with Event Hubs and even the IoT hub.

Transforming Data covers data flows, data lakes, and pipelines, along with the Bronze to silver and silver to gold transformations, all within Azure, giving people the heads up on which direction they wish to go forward with for data CI/CD on Azure.

Storing Data for Consumption is explained with the significance of the data types (Structured, Semi-Structured, and Unstructured data).

Data Warehousing and Analytics: I loved the reference to good old normalisation and how this feeds into Data Marts and what they are. I loved the Design methods and SCDs, then building a data warehouse in the cloud using Azure SQL, Synapse serverless SQL Pools, or dedicated pools. It all sounds complicated, but they do explain it in simple terms that most people would get and understand.

Visualisation and Power BI: starting out with getting your data and the star schema, then enriching it with DAX, then on to Low code or Code first, explored with some reference to Cognitive Services and OpenAI. Nicely explained, with thought-provoking activities.

Data Governance and Compliance/Data Security: from RBAC to Threat Protection, which is great to see alongside everything here rather than it being a side panel or something to look at later.

Overall I am very impressed, and you always find something new in the world of IT and especially in the world of data and analytics.
Amazon Verified review
Steven Fernandes Sep 10, 2023
Rating: 5 out of 5
This book is a goldmine for anyone looking to build or optimize data solutions on Azure. It walks you through designing scalable, cost-effective cloud architectures, and offers best practices for data storage, ETL processes, and visualization. The inclusion of real-world use cases and advanced topics like OpenAI and custom ML models makes it an indispensable resource. Highly recommended for data professionals at all levels!
Amazon Verified review
S.Kundu Aug 23, 2023
Rating: 5 out of 5
The book will start with explaining different Data Architectures and go through the different layers along with challenges of on-premises architectures. Then it will slowly move into details of different Batch and Streaming ingestion architectures.

It explains how you can transform your data using different options such as mapping data flows, Spark notebooks, SQL scripts, SSIS, Azure Stream Analytics, and Azure Databricks.

It will teach how to schedule and monitor your data pipelines on Azure and also will help you understand how to deploy using CI/CD. Then it will deep dive into different Data Warehousing concepts.

You will also learn concepts about Azure Cognitive Services, Azure OpenAI Service, and Azure Machine Learning and MLOps.

The book will also cover different options of implementing security through access controls and authentication mechanisms along with how to use Microsoft Purview for data governance.
Amazon Verified review
Rohan Desai Aug 16, 2023
Rating: 5 out of 5
The book "Azure Data and AI Architect Handbook" by Olivier and Breght is an insightful guide that skillfully navigates the intricate intersection of data architecture and artificial intelligence. It offers a comprehensive exploration of the vital role data architecture plays in the realm of AI, making it a must-read for both beginners and seasoned professionals in the field.

The book delves into the foundational concepts of data architecture, gradually intertwining them with cutting-edge AI principles. The authors' ability to explain complex concepts in an accessible manner is commendable, making it a suitable resource for readers with varying levels of expertise. From discussing data modeling techniques to elucidating the intricacies of neural networks and machine learning algorithms, the book covers a wide spectrum of topics.

This book starts with a glimpse of data architecture and preparation for cloud adoption. Further, this book focuses more on data massaging concepts from Azure's perspective.

This book provides an in-depth process of transforming data using multiple methods like SQL scripts, SSIS, and Spark notebooks, with a detailed outline of batch and streaming ingestion.

It also explains data warehousing concepts from scratch with an explanation of SCD concepts and their implementation.

This book also covers the data security, governance, and compliance topics that would enlighten you on enterprise-level data governance, its importance, data protection, access control, and threat protection.

At the academic or professional level, this book serves as a good source of learning for someone who would start their Azure journey as a data modeler or data engineer.
Amazon Verified review

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, eBook, or Bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment can be made using Credit Card, Debit Card, or PayPal).

Where can I access support around an eBook?

  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?

  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower priced than print
  • They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.