Azure Data and AI Architect Handbook: Adopt a structured approach to designing data and AI solutions at scale on Microsoft Azure

Olivier Mertens, Breght Van Baelen
eBook | Jul 2023 | 284 pages | 1st Edition
Azure Data and AI Architect Handbook

Introduction to Data Architectures

With data quickly becoming an essential asset of any business, the need for cloud data architects has never been higher. The key role these professionals fulfill is to provide the technical blueprint of any cloud data project, along with expertise on data architectures as a whole. A skilled data architect is proficient in many steps of the end-to-end data process, such as data ingestion, data warehousing, data transformation, and visualization.

It is of utmost importance that data architects are familiar with the benefits and drawbacks of individual resources as well as platform-wide design patterns. Typically, aspiring data architects have a background as business intelligence (BI) developers, data engineers, or data scientists. They are often specialized in one or more tools but lack experience in architecting solutions according to best practices.

Compared to a developer profile, an architect is more focused on the long term and the bigger picture. The architect must keep in mind the overarching business strategy and prioritize certain aspects of the architecture accordingly. To equip you with the necessary skills to do so, you will be introduced to methods of extracting business value from your data, which underpin any long-term data strategy.

This chapter will also introduce you to a general-purpose reference data architecture. This architecture will be used as a guideline throughout this entire book and will become progressively more detailed as the chapters go on.

Finally, on-premises data architectures nowadays face a variety of challenges. You will explore these challenges and look at how a business can benefit from either a cloud or a hybrid cloud solution.

In this chapter, we’re going to cover the following main topics:

  • Understanding the value of data
  • A data architecture reference diagram
  • Challenges of on-premises architectures

Understanding the value of data

Data generation is growing at an exponential rate. An estimated 90 percent of the world’s data was generated in the last two years, and global data creation is expected to reach 181 zettabytes by 2025.

Just to put this number in perspective, 1 zettabyte is equal to 1 million petabytes. This scale requires data architects to deal with the complexity of big data, but it also introduces an opportunity. Industry analyst Doug Laney defined big data with the popular three Vs framework: Volume, Variety, and Velocity. In this section, we will explore a fourth V: Value.

Types of analytics

Data empowers businesses to look back into the past, gaining insights into established and emerging patterns and making informed decisions for the future. Gartner splits analytical solutions that support decision-making into four categories: descriptive, diagnostic, predictive, and prescriptive analytics. Each successive category is more complex to implement but can also add more value to your business.

Let’s go through each of these categories next:

  • Descriptive analytics is concerned with answering the question, “What is happening in my business?” It describes the past and current state of the business by creating static reports on top of data. The data used to answer this question is often modeled in a data warehouse, which models historical data in dimension and fact tables for reporting purposes.
  • Diagnostic analytics tries to answer the question, “Why is it happening?” It drills down into the historical data with interactive reports and diagnoses the root cause. Interactive reports are still built on top of a data warehouse, but additional data may be added to support this type of analysis. A broader view of your data estate allows for more root causes to be found.
  • Predictive analytics learns from historical trends and patterns to make predictions for the future. It deals with answering the question, “What will happen in the future?” This is where machine learning (ML) and artificial intelligence (AI) come into play, drawing data from the data warehouse or raw data sources to learn from.
  • Prescriptive analytics answers the question, “What should I do?” and prescribes the next best action. When we know what will happen in the future, we can act on it. This can be done by using different ML methods such as recommendation systems or explainable AI. Recommendation systems recommend the next best product to customers based on similar products or what similar customers bought. Think, for instance, about Netflix recommending new series or movies you might like. Explainable AI will identify which factors were most important to output a certain prediction, which allows you to act on those factors to change the predicted outcome.
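
As a quick illustration, the four categories can be sketched in a few lines of Python. The sales figures and the naive trend forecast below are invented for demonstration only; they are not from any real analytics tool:

```python
# Toy illustration of the four analytics categories on a small sales series.
monthly_sales = [100, 110, 95, 120, 130, 125]

# Descriptive: "What is happening?" - summarize the past
average = sum(monthly_sales) / len(monthly_sales)

# Diagnostic: "Why is it happening?" - drill into the weakest month
worst_month = min(range(len(monthly_sales)), key=lambda i: monthly_sales[i])

# Predictive: "What will happen?" - a naive linear trend forecast
trend = (monthly_sales[-1] - monthly_sales[0]) / (len(monthly_sales) - 1)
forecast = monthly_sales[-1] + trend

# Prescriptive: "What should I do?" - act on the prediction
action = "increase stock" if forecast > average else "hold stock"
```

In practice, each step would be backed by a data warehouse, interactive reports, and ML models rather than a list of numbers, but the progression of questions is the same.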

The following diagram shows the value-extracting process, going from data to analytics, decisions, and actions:

Figure 1.1 – Extracting value from data

Just as with humans, ML models need to learn from their mistakes, which can be done with the help of a feedback loop. A feedback loop allows a teacher to correct the outcomes of the ML model and add them as training labels for the next learning cycle. Learning cycles allow the ML model to improve over time and combat data drift. Data drift occurs when the data the model was trained on is no longer representative of the data the model now predicts on, which leads to inaccurate predictions.

As ML models improve over time, it is best practice to have human confirmation of predictions before automating the decision-making process. Even when an ML model has matured, we can’t rely on the model being right 100 percent of the time. This is why ML models often work with confidence scores, stating how confident they are in the prediction. If the confidence score is below a certain threshold, human intervention is required.
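
A minimal sketch of such a confidence gate follows; the 0.85 threshold is an assumption and would be tuned per use case:

```python
# Human-in-the-loop gate: auto-approve only confident predictions.
CONFIDENCE_THRESHOLD = 0.85  # assumed value, tune per use case

def route_prediction(label: str, confidence: float) -> str:
    """Auto-approve confident predictions; flag the rest for human review."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return f"auto-approved: {label}"
    return f"needs human review: {label} (confidence {confidence:.2f})"
```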

To get continuous value out of data, it is necessary to build a data roadmap and strategy. A complexity-value matrix is a mapping tool to help prioritize which data projects need to be addressed first. This matrix will be described more in detail in the following section.

A complexity-value matrix

A complexity-value matrix has four quadrants to plot future data projects on. These go from high- to low-value and low- to high-complexity. Projects that are considered high-value and have a low complexity are called “quick wins” or “low-hanging fruit” and should be prioritized first. These are often Software-as-a-Service (SaaS) applications or third-party APIs that can quickly be integrated into your data platform to get immediate value. Data projects with high complexity and low value should not be pursued as they have a low Return on Investment (ROI). In general, the more difficult our analytical questions become, the more complex the projects may be, but also, the more value we may get out of them.
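
A rough way to see the quadrants in action is to score candidate projects and map each to a quadrant. The project names, scores, and the midpoint of 5 below are all invented for illustration:

```python
# Plot invented projects into the four quadrants of a complexity-value matrix.
# Scores run 1-10; the midpoint of 5 is an arbitrary choice.
def quadrant(value: int, complexity: int, midpoint: int = 5) -> str:
    high_value = value >= midpoint
    high_complexity = complexity >= midpoint
    if high_value and not high_complexity:
        return "quick win - prioritize first"
    if high_value and high_complexity:
        return "strategic bet"
    if not high_value and not high_complexity:
        return "fill-in"
    return "low ROI - avoid"

projects = {
    "SaaS connector": (8, 2),      # high value, low complexity
    "Custom ML platform": (8, 9),  # high value, high complexity
    "Legacy log archive": (2, 8),  # low value, high complexity
}
ranked = {name: quadrant(v, c) for name, (v, c) in projects.items()}
```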

A visualization of the four quadrants of the matrix can be seen as follows:

Figure 1.2 – The four quadrants of a complexity-value matrix

We often think of the direct value data projects bring, but the indirect value should be considered too. Data engineering projects often have no direct value, as they merely move data from one system to another, yet this may indirectly open up a world of new opportunities.

To extract value from data, a solid data architecture needs to be in place. In the following section, we’ll define an abstract data architecture diagram that will be referenced throughout this book to explain data architecture principles.

A data architecture reference diagram

The reference architecture diagram that is abstractly defined for now in Figure 1.3 shows the typical structure of an end-to-end data platform in a (hybrid) cloud:

Figure 1.3 – A typical structure of an end-to-end data platform in a (hybrid) cloud

This reference diagram shows the key components of most modern cloud data platforms. There are limitless possible adaptations, such as accommodating streaming data, but the diagram in Figure 1.3 serves as the basis for more advanced data architectures. It’s like the Pizza Margherita of data architectures! The diagram shows four distinct layers in the end-to-end architecture, as follows:

  • The ingestion layer
  • The storage layer
  • The serving layer
  • The consumption layer

Next to these layers, there are a couple of other key aspects of the data platform that span across multiple layers, as follows:

  • Data orchestration and processing
  • Advanced analytics
  • Data governance and compliance
  • Security
  • Monitoring

Let’s cover the first layer next.

The ingestion layer

The ingestion layer serves as the data entrance to the cloud environment. Here, data from various sources is pulled into the cloud. These sources include on-premises databases, SaaS applications, other cloud environments, Internet of Things (IoT) devices, and many more. Let’s look at this layer in more detail:

  • First, the number of data sources can vary greatly between businesses and can bring a variety of challenges to overcome. In enterprise-scale organizations, where the number of data sources can reach extraordinary levels, it is exceptionally important to maintain a clear overview and management of these sources.
  • Secondly, the sheer variety of sources is another common issue to deal with. Different data sources can have distinct methods of ingesting data into the cloud and, in some cases, require architectural changes to accommodate.
  • Thirdly, managing authentication for data sources can be cumbersome. Authentication, which happens in a multitude of ways, is often unique to the data source. Every source requires its own tokens, keys, or other types of credentials that must be managed and seamlessly refreshed to optimize security.
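
The credential concern in the last point can be sketched as a small per-source token cache that renews credentials shortly before they expire. The source names, token lifetime, and refresh margin below are assumptions, not tied to any particular service:

```python
import time

class TokenCache:
    """Keep one access token per data source, refreshing near expiry."""
    def __init__(self, fetch_token, lifetime_seconds=3600, margin_seconds=300):
        self._fetch = fetch_token      # callable: source name -> fresh token
        self._lifetime = lifetime_seconds
        self._margin = margin_seconds
        self._tokens = {}              # source -> (token, expiry timestamp)

    def get(self, source: str) -> str:
        token, expiry = self._tokens.get(source, (None, 0.0))
        # Refresh when missing or within the safety margin of expiry
        if token is None or time.time() > expiry - self._margin:
            token = self._fetch(source)
            self._tokens[source] = (token, time.time() + self._lifetime)
        return token
```

A real platform would delegate this to a managed secret store rather than an in-process cache, but the seamless-refresh idea is the same.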

From a design perspective, there are a few other aspects to keep in mind. The architect should consider the following:

  • Data speed: Will incoming data from the source be ingested periodically (that is, batch ingestion) or continuously (that is, data streaming)?
  • Level of the structure of the data: Will the incoming data be unstructured, semi-structured, or structured?

Regarding data speed, data will be ingested in batches in the vast majority of cases. This translates to periodic requests made to an application programming interface (API) to pull data from the data source. For the less common cases of streaming data, architectural changes are required to provide an environment to store and process the continuous flow of data. In later chapters, you will discover how the platform architecture differs to accommodate streaming data.

Finally, the level of structure of the data will determine the amount of required data transformations, the methods of storing the data, or the destination of data movements. Unstructured data, such as images and audio files, will require different processing compared to semi-structured key-value pairs or structured tabular files.
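
The batch pattern described above, periodic API pulls, can be sketched as a watermark-based, paginated fetch. The page shape (`items`, `has_more`) is hypothetical:

```python
# Batch ingestion as a watermark-based, paginated pull.
def pull_batch(fetch_page, since: str) -> list:
    """Collect all records modified since the watermark, page by page.

    fetch_page(since, page) must return a dict with 'items' and 'has_more'.
    """
    records, page = [], 1
    while True:
        payload = fetch_page(since, page)
        records.extend(payload["items"])
        if not payload.get("has_more"):
            break
        page += 1
    return records
```

Passing the fetch function in keeps the pull logic independent of any one source’s authentication or transport details.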


The storage layer

The definitions of the following layers can vary. Over the course of this book, the storage layer refers to the central (often large-scale) storage of data. Data lakes are the most common method for massive storage of data, due to their capacity and relatively low cost. Alternatives are graph-based databases, relational databases, NoSQL databases, flat file-based databases, and so on. The data warehouse, which holds business-ready data and is optimized for querying and analytics, does not belong to the storage layer but will fall under the serving layer instead.

Decisions made by the architect in the storage layer can have a great effect on costs, performance, and the data platform in its entirety. Here, the architect will have to consider redundancy, access tiers, and security. In the case of a data lake, a tier system needs to be considered for raw, curated, and enriched data, as well as a robust and scalable folder structure.
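
As one possible convention (an assumption, not a prescribed standard), a tiered lake path might encode tier, source, entity, and load date, which keeps the folder structure both robust and scalable:

```python
from datetime import date

TIERS = ("raw", "curated", "enriched")

def lake_path(tier: str, source: str, entity: str, load_date: date) -> str:
    """Build a partitioned lake path: tier / source / entity / load date."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier}")
    return f"{tier}/{source}/{entity}/{load_date:%Y/%m/%d}"
```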


The serving layer

In the serving layer, preprocessed and cleansed data is stored in a data warehouse, often regarded as the flagship of the data platform. This is a type of structured storage that is optimized for large-scale queries and analytics. The data warehouse forms one of the core components of BI.

The major difference between a data warehouse and the aforementioned data lake is the level of structure. A data warehouse is defined by schemas and enforces data types and structures. Conversely, a data lake can be seen as a massive dump of all kinds of data, with little to no regard for the enforcement of specific rules. The strong level of enforcement makes a data warehouse significantly more homogeneous, which results in far better performance for analytics.

The cloud data architect has various decisions to make in the serving layer. There are quite a few options for data warehousing on the Azure cloud, as follows:

  1. First, the architect should think about whether they want an Infrastructure-as-a-Service (IaaS), a Platform-as-a-Service (PaaS), or a SaaS solution. In short, this results in a trade-off between management responsibilities, development efforts, and flexibility. This will be discussed more in later chapters.
  2. Next, different services on Azure come with their own advantages and disadvantages. The architect could, for example, opt for a very cost-effective serverless SQL solution or leverage massive processing power in highly performant dedicated SQL pools, among numerous other options.

After deciding on the most fitting service, there are still decisions to be made within the data warehouse. The architect will have to determine structures to organize the data in the data warehouse, also known as schemas. Common schemas are star and snowflake schemas, which also come with their own benefits and drawbacks.
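
As a toy illustration of a star schema (tables and keys invented), measures live in a central fact table that references descriptive dimension tables:

```python
# Two dimension tables mapping surrogate keys to descriptive attributes
dim_product = {1: "laptop", 2: "monitor"}    # product_key -> product name
dim_date = {101: "2023-Q1", 102: "2023-Q2"}  # date_key -> quarter

# Fact table: one row per sale, keyed into the dimensions
fact_sales = [  # (product_key, date_key, amount)
    (1, 101, 1200), (2, 101, 300), (1, 102, 1100),
]

# Answer "revenue per product per quarter" by resolving through the dimensions
revenue = {}
for product_key, date_key, amount in fact_sales:
    key = (dim_product[product_key], dim_date[date_key])
    revenue[key] = revenue.get(key, 0) + amount
```

In a snowflake schema, the dimensions themselves would be further normalized into sub-dimensions; the fact table stays the same.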

Chapter 6, Data Warehousing, will teach you all the necessary skills to confidently decide on the right solution. Chapter 7, The Semantic Layer, will introduce you to the concept of data marts, subsets of a data warehouse ready for business consumption.

The consumption layer

The consumption layer is the final layer of an end-to-end data architecture and typically follows the serving layer by extracting data from the data warehouse. There are numerous ways of consuming the data, which has been prepared and centralized in earlier stages.

The most common manner of consumption is through data visualization. This can happen through dashboarding and building reports. The combination of a data warehouse and a visualization service is often referred to as BI. Many modern dashboarding tools allow for interactivity and drill-down functionality within the dashboard itself. Although technically it is not a part of the Azure stack, Power BI is the preferred service for data visualization for Azure data platforms. However, Microsoft allows other visualization services to connect conveniently as well.

Another way to consume data is by making the data available to other applications or platforms using APIs.

Chapter 8, Visualizing Data Using Power BI, will teach you how to extract data from the data warehouse in various ways and visualize it using interactive dashboarding. In this chapter, you will also discover methods to perform self-service BI, allowing end users to create their own ad hoc dashboards and reports to quickly perform data analysis.

Data orchestration and processing

Contrary to the four layers mentioned previously, there are a couple of other core components of the data platform that span across the entire end-to-end process.

Data orchestration refers to moving data from one place to another, often using data pipelines. This process is often done by data engineers. When data is moved from one stage to the next, data undergoes transformations in the form of joining data, deriving new columns, computing aggregations, and so on. For example, when data is moved from a data lake to a data warehouse, it must be transformed to match the data model, which is enforced by the data warehouse. Another example is when moving data between tiers (raw, curated, and enriched tiers) in the data lake, where the data becomes more and more ready for business use whenever it moves up a tier.
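
The transformations named above, joins, derived columns, and aggregations, can be sketched on a toy dataset; the field names are invented:

```python
# Source data: orders plus a customer-to-region lookup
orders = [
    {"order_id": 1, "customer_id": "A", "amount": 50.0},
    {"order_id": 2, "customer_id": "A", "amount": 30.0},
    {"order_id": 3, "customer_id": "B", "amount": 20.0},
]
customers = {"A": "EU", "B": "US"}  # customer_id -> region

# Join + derive: attach the region to each order
enriched = [dict(o, region=customers[o["customer_id"]]) for o in orders]

# Aggregate: total revenue per region, shaped for the warehouse model
totals = {}
for row in enriched:
    totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
```

At platform scale, the same steps would run inside a pipeline service over distributed compute rather than in-memory Python, but the transformation logic is conceptually identical.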

Data pipelines allow data engineers to automate and scale the orchestration and processing of data. These components are critical to the performance and health of the data platform and must be monitored accordingly.

Here are two common methods of performing orchestration and processing:

  • Extract-Transform-Load (ETL)
  • Extract-Load-Transform (ELT)

In both cases, data is extracted from a source and loaded to a destination. The main difference between both methods is the location where the transformations take place. These will be further discussed in Chapter 4, Transforming Data on Azure. This chapter will also teach you how to create and monitor data pipelines according to best practices.

Advanced analytics

For analyses that may be too complex to perform in the serving layer, an analytics suite or data science environment can be added to the architecture to perform advanced analytics and unlock ML capabilities. This component can often be added in a later stage of platform development, as it will mostly not influence the core working of the other layers. A data platform in an early phase of development can perfectly exist without this component.

One option for the advanced analytics suite is an ML workspace where data scientists can preprocess data, perform feature engineering, and train and deploy ML models. The latter may require additional components such as a container registry for storing and managing model deployments. The Azure Machine Learning workspace allows users to create and run ML pipelines to scale their data science processes. It also enables citizen data scientists to train models using no-code and low-code features.

Apart from an environment for data scientists and ML engineers to build and deploy custom models, the Azure cloud also provides users with a wide array of pre-trained ML models. Azure Cognitive Services encompass many models for computer vision (CV), speech recognition, text analytics, search capabilities, and so on. These models are available through ready-to-use API endpoints. They often involve niche cases but, when used correctly, bring a lot of value to the solution and are exceptionally fast to implement.

Chapter 9, Advanced Analytics Using AI, will go deeper into end-to-end ML workflows, such as the connection to data storage, performing preprocessing, model training, and model deployments. This chapter will also introduce the concept of ML operations, often referred to as MLOps. This encompasses continuous integration and continuous deployment (CI/CD) for ML workflows.

Data governance and compliance

The more a data platform scales, the harder it becomes to maintain a clear overview of existing data sources, data assets, transformations, data access control, and compliance. To avoid a build-up of technical debt, it is strongly recommended to set up governance and compliance processes from an early stage of development and have them scale with the platform.

To govern Azure data platforms, Microsoft developed Microsoft Purview, formerly known as Azure Purview. This tool, which is covered in Chapter 10, Enterprise-Level Data Governance and Compliance, allows users to gain clear insights into the governance and compliance of the platform. Therefore, it is essential to the skill set of any aspiring Azure data architect. In this chapter, you will learn how to do the following:

  • Create a data map by performing scans on data assets
  • Construct a data catalog to provide an overview of the metadata of data assets
  • Build a business glossary to establish clear definitions of possibly ambiguous business terms
  • Gain executive insights on the entire data estate

Security

With the rise of harmful cyber-attacks, security is another indispensable component of a data platform. Improper security or configuration may lead to tremendous costs for the business. Investing in robust security to prevent attacks from happening will typically be vastly cheaper than dealing with the damage afterward.

Cybersecurity can be very complex and therefore should be configured and managed with the help of a cybersecurity architect. However, certain aspects of security fall within the responsibilities of the data architect as well. The data architect should have the appropriate skill set to establish data security. Examples are working with row- or column-level security, data encryption at rest and in transit, masking sensitive data, and so on.
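
Masking sensitive data, one of the techniques mentioned above, can be as simple as the following sketch; the keep-last-four rule is an assumption, and real platforms apply masking policies in the database or serving layer:

```python
def mask(value: str, keep: int = 4) -> str:
    """Replace all but the trailing characters with asterisks."""
    if len(value) <= keep:
        return "*" * len(value)
    return "*" * (len(value) - keep) + value[-keep:]
```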

Chapter 11, Introduction to Data Security, will teach you all that is necessary to ensure data is always well protected and access is always limited to a minimum.

Monitoring

Disruptions such as failing data pipelines, breaking transformations, and unhealthy deployments can shut down the workings of an entire data platform. To limit the downtime to an absolute minimum, these processes and deployments should be monitored continuously.

Azure provides monitoring and health reports on pipeline runs, Spark and SQL jobs, ML model deployments, data asset scans, and more. The monitoring of these resources will be further discussed in their own respective chapters.

Challenges of on-premises architectures

Cloud computing has seen a steep rise in adoption over the last decade. Nevertheless, a significant share of businesses hold on to keeping their servers and data on-premises. There are certain reasons why a business may prefer on-premises over the cloud. Some businesses perceive increased security in keeping data on their own servers. Others, generally smaller businesses, may not feel the need to optimize their IT landscape or are simply not keen on change. Organizations in strictly regulated industries can be bound to on-premises infrastructure for compliance. Whatever the reason, on-premises architectures nowadays come with certain challenges.

These challenges include, among other things, the following:

  • Scalability
  • Cost optimization
  • Agility
  • Flexibility

Let’s go through these challenges in detail.

Scalability

Organizations with a rapidly growing technological landscape will struggle the most with scalability. As the total volume of business data grows continually, an organization faces the constant need to find new ways to expand its on-premises server farm. It is not always as simple as adding extra servers. After a while, extra building infrastructure is needed, new personnel must be hired, energy consumption soars, and so on.

Here, the benefit of cloud computing is the enormous pool of available servers and computing resources. For the business, this means it can provision any additional capacity without having to worry about the intricate organization and planning of its own servers.

Cost optimization

Businesses that completely rely on on-premises servers are never fully cost-effective. Why is this so?

Let’s take a look at two scenarios:

  • When usage increases: When the usage increases, the need for extra capacity arises. A business is not going to wait until its servers are used to their limits, risking heavy throttling and bottleneck issues, before starting to expand its capacity. Although the risk of full saturation of its servers is hereby avoided, the computing and storage capacity is never fully made use of. While usage can grow linearly or exponentially, costs will rise in discrete increments, referring to distinct expansions of server capacity.
  • When usage decreases: When the usage decreases, the additional capacity simply stands there, unused. Even if the decrease in usage lasts for longer periods of time, it is not that simple to just sell the hardware, free up the physical space, and let go of the extra maintenance personnel. In most situations, this results in costs remaining unchanged despite the reduced usage.

Cloud computing usually follows a pay-as-you-go (PAYG) business model. This solves the two challenges of cost optimization during variable usage. PAYG allows businesses to match their costs to their usage, avoiding disparities, as can be seen in the following diagram:
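
The same contrast can be put into numbers with a toy cost model (all figures invented): on-premises cost rises in discrete capacity steps, while PAYG tracks usage linearly:

```python
import math

STEP_CAPACITY = 100   # units of capacity bought per on-prem expansion
STEP_COST = 1000.0    # cost of one expansion
PAYG_RATE = 9.0       # cloud cost per unit of usage

def on_prem_cost(usage: int) -> float:
    """Pay for whole capacity steps, whether they are used or not."""
    steps = max(1, math.ceil(usage / STEP_CAPACITY))
    return steps * STEP_COST

def payg_cost(usage: int) -> float:
    """Pay exactly for what is used."""
    return usage * PAYG_RATE
```

At low usage the on-premises buyer still pays for a full step, while the PAYG cost stays proportionally small; this is the disparity the diagram illustrates.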

Figure 1.4 – Cost patterns depending on usage for on-premises and cloud infrastructure

Let’s cover the next challenge now.

Agility

Agility refers not to whether it is possible to make a certain change, but to the speed at which businesses can implement new changes. Expanding or reducing capacity, changing the types of processing power, and so on takes time in an on-premises environment. In most cases, this involves acquiring new hardware, installing the new compute, and configuring security, all of which can be extremely time-consuming in a business context.

Here, cloud architectures benefit from far superior agility over on-premises architectures. Scaling capacity up or down, changing memory-optimized processors for compute-optimized processors: all of this is performed in a matter of seconds or minutes.

Flexibility

The challenge of flexibility can be interpreted very broadly and has some intersections with the other challenges. Difficulties with scalability and agility can be defined as types of flexibility issues.

Apart from difficulties regarding scalability and agility, on-premises servers face the issue of constant hardware modernization. In this case, we could compare on-premises and cloud infrastructure to a purchased car or a rental car respectively. There is not always the need to make use of cutting-edge technology, but if the need is present, think about which option will result in having a more modern car in most situations.

In other cases, specialized hardware such as field-programmable gate arrays (FPGAs) might be required for a short period of time—for example, during the training of an extraordinarily complex ML model. To revisit the car example, would you rather purchase a van when you occasionally have to move furniture or rent a van for a day while moving?

Let’s summarize the chapter next.

Summary

In this chapter, we first discussed how to extract value from your data by asking the right analytical questions. Questions increase in complexity from descriptive, diagnostic, and predictive to prescriptive, but may also hold more value. A complexity-value matrix helps prioritize data projects and build a data roadmap. A crucial thing to remember is to capture data as soon as possible, even if you don’t have a data strategy or roadmap yet: any data you do not capture now cannot be used to extract value in the future. Next, we introduced a reference architecture diagram. Over time, you will become familiar with every component of the diagram and how they interact with each other.

Four layers of cloud architectures were explained. The ingestion layer is used to pull data into the central cloud data platform. The storage layer is capable of holding massive amounts of data, often in a tiered system, where data gets more business-ready as it moves through the tiers. In the serving layer, the data warehouse is located, which holds data with a strictly enforced schema and is optimized for analytical workloads. Lastly, the consumption layer allows end users and external systems to consume the data in reports and dashboards or to be used in other applications.

Some components of the data platform span multiple layers. Data orchestration and processing refers to data pipelines that ingest data into the cloud, move data from one place to another, and orchestrate data transformations. Advanced analytics leverages Azure’s many pre-trained ML models and a data science environment to perform complex calculations and provide meaningful predictions. Data governance tools bring data asset compliance, flexible access control, data lineage, and overall insights into the entire data estate. Impeccable security of individual components, as well as the integrations between them, takes away many of the worries regarding harmful actions by malicious third parties. Finally, the extensive monitoring capabilities in Azure allow us to get insights into the health and performance of the processes and data storage in the platform.

Finally, we discussed the drawbacks that on-premises architectures face, such as scalability, cost optimization, agility, and flexibility. These challenges are often conveniently dealt with by leveraging the benefits of cloud-based approaches.

In the next chapter, we will look at two Microsoft frameworks that ease the move to the cloud.


Key benefits

  • Translate and implement conceptual architectures with the right Azure services
  • Inject artificial intelligence into data solutions for advanced analytics
  • Leverage cloud computing and frameworks to drive data science workloads

Description

With data’s growing importance in businesses, the need for cloud data and AI architects has never been higher. The Azure Data and AI Architect Handbook is designed to assist any data professional or academic looking to advance their cloud data platform designing skills. This book will help you understand all the individual components of an end-to-end data architecture and how to piece them together into a scalable and robust solution. You’ll begin by getting to grips with core data architecture design concepts and Azure Data & AI services, before exploring cloud landing zones and best practices for building up an enterprise-scale data platform from scratch. Next, you’ll take a deep dive into various data domains such as data engineering, business intelligence, data science, and data governance. As you advance, you’ll cover topics ranging from learning different methods of ingesting data into the cloud to designing the right data warehousing solution, managing large-scale data transformations, extracting valuable insights, and learning how to leverage cloud computing to drive advanced analytical workloads. Finally, you’ll discover how to add data governance, compliance, and security to solutions. By the end of this book, you’ll have gained the expertise needed to become a well-rounded Azure Data & AI architect.

Who is this book for?

This book is for anyone looking to elevate their skill set to the level of an architect. Data engineers, data scientists, business intelligence developers, and database administrators who want to learn how to design end-to-end data solutions and get a bird’s-eye view of the entire data platform will find this book useful. Although not required, basic knowledge of databases and data engineering workloads is recommended.

What you will learn

  • Design scalable and cost-effective cloud data platforms on Microsoft Azure
  • Explore architectural design patterns with various use cases
  • Determine the right data stores and data warehouse solutions
  • Discover best practices for data orchestration and transformation
  • Help end users to visualize data using interactive dashboarding
  • Leverage OpenAI and custom ML models for advanced analytics
  • Manage security, compliance, and governance for the data estate

Product Details

Publication date : Jul 31, 2023
Length: 284 pages
Edition : 1st
Language : English
ISBN-13 : 9781803230733




Table of Contents

17 Chapters
Part 1: Introduction to Azure Data Architect
Chapter 1: Introduction to Data Architectures
Chapter 2: Preparing for Cloud Adoption
Part 2: Data Engineering on Azure
Chapter 3: Ingesting Data into the Cloud
Chapter 4: Transforming Data on Azure
Chapter 5: Storing Data for Consumption
Part 3: Data Warehousing and Analytics
Chapter 6: Data Warehousing
Chapter 7: The Semantic Layer
Chapter 8: Visualizing Data Using Power BI
Chapter 9: Advanced Analytics Using AI
Part 4: Data Security, Governance, and Compliance
Chapter 10: Enterprise-Level Data Governance and Compliance
Chapter 11: Introduction to Data Security
Index
Other Books You May Enjoy

Customer reviews

Rating distribution: 4.5 out of 5 (13 Ratings)
5 star: 61.5%
4 star: 30.8%
3 star: 7.7%
2 star: 0%
1 star: 0%

Morph360Tech Jan 06, 2024
Rating: 5/5

Data architecture, cloud adoption, and how you can utilise the tools that Microsoft provides for data ingestion, transformation, and consumption are clearly defined in the book, which I will explore below.

Ingestion, and batch versus streaming, is explained in a simple manner that most non-technical readers would understand, and the ingestion architectures are explored and described with Event Hubs and even the IoT Hub.

Transforming data covers data flows, data lakes, and pipelines, along with the bronze-to-silver and silver-to-gold transformations, all within Azure, giving people a heads-up on which direction they wish to take for data CI/CD on Azure.

Storing data for consumption is explained along with the significance of the data types (structured, semi-structured, and unstructured data).

Data warehousing and analytics: I loved the reference to good old normalisation, how these feed into data marts and what they are, the design methods and SCDs, and then building a data warehouse in the cloud using Azure SQL, Synapse serverless SQL pools, or dedicated pools. It all sounds complicated, but they explain it in simple terms that most people would get and understand.

Visualisation and Power BI: from starting out getting your data and the star schema, to enriching it with DAX, then on to the low-code or code-first options, with some reference to Cognitive Services and OpenAI, nicely explained with thought-provoking activities.

Data governance and compliance/data security: from RBAC to threat protection, which is great to see alongside everything here rather than it being a side panel or something to look at later.

Overall, I am very impressed, and you always find something new in the world of IT, especially in the world of data and analytics.
Amazon Verified review
Advitya Gemawat Nov 06, 2023
Rating: 5/5

The book covers several essential concepts and skills needed to design, implement, and manage data and AI solutions on Azure. It also provides practical guidance and best practices for various scenarios and challenges that data and AI architects may encounter. Some of the topics that I found particularly useful were:

🛠 Using Azure Synapse Analytics to build a modern data warehouse that can handle both structured and unstructured data at scale. The book explains how to use Synapse SQL, Spark, and Synapse Pipelines to ingest, transform, and analyze data from various sources. It also shows how to use Synapse Studio, a unified web-based interface that simplifies the development and management of data and AI projects. Synapse's capabilities may be further augmented by utilizing integrations with other data products as part of Microsoft Fabric's platform.

📈 Leveraging Azure Machine Learning (AML) to create, train, and deploy machine learning models on the cloud. The book also demonstrates how to quickly get started with AML designer, a drag-and-drop tool that allows you to build machine learning pipelines without writing code.

🧰 I also love that this book includes descriptions of several basic concepts around LLMs (such as fine-tuning, grounding, etc.), the use cases of several foundational models, and building custom applications with the Azure OpenAI (AOAI) Service.

In my view, what especially makes this book resourceful for non-technical and technical audiences is its emphasis on explaining basic concepts around cloud architectures and ML modeling, along with code examples to get started.
Amazon Verified review
nikesh Oct 30, 2023
Rating: 4/5

The Azure Data and AI Architect Handbook is an excellent resource that offers insights into Azure data architecture and AI capabilities, catering to a wide audience, from beginners entering the world of data to seasoned professionals looking to stay updated with the latest Azure offerings. This book introduces core data architecture design principles and Azure data and AI services, providing a comprehensive guide to building enterprise-scale data platforms and driving advanced analytical workloads in the cloud. It covers data engineering, business intelligence, data science, data governance, data ingestion, data warehousing, and more, making it a valuable resource for those seeking to optimize data solutions on Azure.

The book excels in bridging theoretical concepts with practical applications, providing readers with a holistic understanding of Azure's data architecture and AI possibilities. However, there are some areas for improvement: many chapters lack external resource cross-references, and the authors could consider including a companion GitHub repository.

In summary, the Azure Data and AI Architect Handbook is a valuable resource for those venturing into Azure data architecture and AI applications. Its depth and real-world examples empower readers to put their knowledge into action. Addressing the noted areas for improvement, namely enhancing the reference list and adding a companion GitHub repository, would make it an even more comprehensive and practical learning tool.
Amazon Verified review
Amazon Customer Oct 15, 2023
Rating: 4/5

Packt's "Azure Data and AI Architect Handbook" provides a comprehensive overview of core Azure data architecture concepts. It covers building enterprise-scale platforms and emphasizes best practices, with clear explanations and helpful diagrams.

Pros:
  • Comprehensive coverage of Azure data architecture
  • Clear explanations and valuable insights
  • Well-structured content with informative diagrams

Cons:
  • Lacks guidance on aligning business use cases with data strategies
  • Limited practical implementation guidance

Overall: A valuable resource for understanding Azure's data landscape, offering thorough coverage of architecture concepts. However, more practical guidance on business alignment and implementation would enhance its value.
Amazon Verified review
Kelvin D. Meeks Sep 11, 2023
Rating: 3/5

As I read through the chapters of this book, the thought that kept coming to my mind was: "It's like reading diluted and neutered sets of Microsoft Azure documentation" (i.e., no rich cross-linking to additional relevant content, and almost no hands-on examples). Read on for why I had that feeling...

At 284 pages (but only 245, if we exclude the Index), this book impressively attempts to cover a wide range of information that will be of interest to anyone who wishes to establish an architect-level awareness of Azure data architecture and AI capabilities.

Note: For my review, I read a PDF version of the book that I downloaded from Packt's website, AFTER the publication date of the book.

Three key criticisms I have with almost the entire book:
  • A significant lack of additional suggested reading links (beyond just the paltry few citations of Microsoft Azure documentation). There is a severe dearth of references to other related material, articles, books, and research papers that would deeply enrich the reader's experience and magnify the educational value of this book.
  • With the noticeable exception of Chapter 8, there is a severe paucity of actual detailed examples in the majority of the book's pages.
  • The lack of a companion GitHub repository providing hands-on examples.

This book suffers from a lack, in almost all chapters, of any in-depth, detailed discussion of real-world examples and case studies. In Chapter 3 (page 39), fraud detection is briefly mentioned, and would have made an EXCELLENT example / case study on which to elaborate in that chapter. In almost every instance, the reader would be better served by simply reading the Microsoft Azure documentation rather than the diluted treatment given to many topics in the various chapters, most of which lack the basic courtesy of pointing the reader to the appropriate online documentation landing page for the services discussed.

What I liked:
  • Chapter 3's discussion of Kappa and Delta Lake architectures.
  • Chapter 6's coverage of data warehousing (this is the best-written chapter in the entire book, and provides detailed examples to clearly explain concepts).

What could be improved in the next edition:
  • Better use of color, and consistent use of color, in diagrams.
  • Page xvi: the hyperlink to the errata page is not enabled.
  • MAJOR MISS: Inclusion of a companion GitHub project for the book, to provide some hands-on exercises.
  • Chapter 1 (page 4): The first sentence of this book, published in July/August 2023, refers to some growth predictions in the past: "Data generation is growing at an exponential rate. 90 percent of data in the world was generated in the last 2 years, and global data creation is expected to reach 181 zettabytes in 2022". A better quote would show the expected growth by 2030, at the very least.
  • Chapter 1 (page 7): The data architecture reference diagram does not reflect a "Data orchestration and processing" layer, but this is called out in the bullet-list enumeration of diagram elements.
  • Chapter 1 (page 8): Appears to still have an internal editor reminder note embedded in the text, re: "(Add what data ingestion services will be discussed later in the book)."
  • Chapter 1 (page 9): Appears to still have an internal editor reminder note embedded in the text, re: "(Add what data storage services will be discussed later in the book)."
  • Chapter 1 lacks any suggested links or additional reading to enrich the reader's experience. NOTE: This criticism holds TRUE for the MAJORITY of the book's chapters.
  • Chapter 1 is missing a section to introduce the fundamental concepts of data architecture principles.
  • Chapter 1 would benefit from a table comparing the capabilities across the major cloud service providers (CSPs), i.e., Azure, AWS, and GCP.
  • Microsoft's choice of the acronym WAF (for Well-Architected Framework) is unfortunate, as it could easily be confused with the more common usage (Web Application Firewall). For example, on page 18, there is an [incorrect] link to the "Azure Well-Architected Framework review - Azure Application Gateway v2" documentation that clearly refers to "WAF" in the context of a Web Application Firewall ("Be aware of Application Gateway capacity changes when enabling WAF").
  • Chapter 2 (page 18): The hyperlink to the Microsoft Azure WAF documentation page is incorrect and not enabled.
  • Chapter 2 (page 18): There is supposed to be a link referring the reader to the Well-Architected Framework (WAF) main page (re: "For the complete framework..."), but the link that is provided is to a sub-page addressing Application Gateway concerns: "Azure Well-Architected Framework review - Azure Application Gateway v2".
  • Chapter 2 (page 23): The section on cost optimization would be better placed near the end of the book, in a dedicated chapter on that topic.
  • Chapter 2 (page 23): The advice to "Whenever possible, look for cloud-native offerings to offload your workloads" seems incongruent with the section's focus on cost optimization. If you don't have significant variability in your scalability requirements, and you have sufficient compute power in an existing data center, you may be able to more efficiently manage some CPU/memory-intensive workloads on your existing data center hardware.
  • Chapter 2 would greatly benefit from some illustrative worked examples of the costs for different cost variances, based on different deployment choices for some simple data architecture examples, instead of saying that costs can vary across regions, that network ingress/egress can increase costs, or that hosting in different regions can increase latencies. In particular, it should cite some actual examples from the barely mentioned Azure pricing calculator and Total Cost of Ownership (TCO) calculator.
  • Chapter 2 (page 27): The very brief discussion of "Using data partitioning" would be much better if it included a discussion of the why for each strategy mentioned.
  • Chapter 2 (page 29): The enumeration of the concepts of subscriptions, resource groups, and management groups is not in the same order as the hierarchy depicted in the corresponding diagram, which introduces confusion and a needless burden on the reader to mentally CORRECT what they may have thought was safe to infer from the ordering of the list. Rule #1: Make learning EASY for the reader.
  • Chapter 2 (page 29): The book still refers to the old name ("Azure Active Directory (AAD)"). It should be updated to reflect the new name ("Microsoft Entra ID") that was announced July 11th, BEFORE the book was published.
  • Chapter 2 (page 30): "The architecture of the data management landing zone is quite extensive and may be hard to clearly visualize in this book" supports my belief that this book should actually be closer to 450-650 pages in length.
  • Chapter 2 (page 30): The link to the data management landing zone is not hyperlink enabled, and when the text is copied, it mangles the link, putting parts of the URL out of their correct order.
  • Chapter 2 (page 31): "Services shown in color are mandatory for the landing zone, whereas services that appear in gray are optional" (re: Fig 2.2) is *very* confusing, as there doesn't appear to be any service colored gray. The only things gray are the layers. There appear to be only services in either black or reddish-orange.
  • Chapter 3 discusses different strategies for ingestion, but the decision criteria are often embedded in paragraphs; a decision tree or decision criteria would perhaps be beneficial to help communicate the information more visually. This would be especially helpful when there are more than two possible choices discussed.
  • Chapter 3 (page 51): The term SHIRs is introduced and defined as self-hosted IRs. However, nowhere in the previous pages was IR defined as an acronym. For the benefit of the reader, the full term should be defined here as Self-Hosted Integration Runtime.
  • Chapter 3 (page 57): The discussion of Event Hubs should include a link to "Azure Event Hubs quotas and limits" in the Azure documentation.
  • Chapter 6 (page 135): The reference to "The data vault method" should provide the proper attribution to its creator. The author of the third approach to the subject of the data warehouse, known as the Data Vault, is Dan Linstedt. The Data Vault is the result of 10 years of his research efforts to ensure the consistency, flexibility, and scalability of the warehouse. The first results of his research in this field are five articles on the subject, published in 2000. Contrary to Inmon's view, Linstedt assumes that all available data from the entire time period should be loaded into the warehouse. This is known as the "single version of the facts" approach. As with Kimball's star schema, with the Data Vault Linstedt introduces some additional objects to organize the data warehouse structure. These objects are referred to as the hub, satellite, and link.
  • Chapter 7 (page 144): In "Figure 7.6 – Power BI Premium as a superset of AAS", the light-colored font is *much* more difficult to read.
  • Chapter 7 should introduce the concepts of taxonomy and ontology, and provide references to some public-domain examples.
  • Chapter 8 (page 154): The link to the pricing for Power BI is __very__ incongruent with the *complete* lack of reference to any links for other service pricing details, as well as the lack of any citation of the __very important__ documentation links for service-specific quotas and limits.
  • Chapter 8 itself feels VERY out of place, and does not feel like it belongs in an ARCHITECT book. It is written at a level of detail for a DEVELOPER that I WISH the *PREVIOUS* seven chapters had demonstrated. Chapter 8 begs the question: why does it delve into development details when none of the previous chapters have touched on such matters?
  • Chapter 9 (pages 185-187): Discusses Azure Cognitive Services (re: Speech, Vision) but doesn't connect the dots to how this applies to data architecture. Further, the level of discussion barely goes beyond "brochure-ware" and smells of a marketing ploy, not a chapter intent on teaching how to use the Azure AI services.
  • Chapter 9 (page 189 onward): Begins discussing the "Azure OpenAI Service", and though it makes a vague reference to "some" hallucination, it DOES NOT cite the relevant OpenAI papers: the GPT-4 Technical Report (27 March 2023) or the GPT-4 System Card (27 March 2023), the latter of which specifically includes this explicit warning: "In particular, our usage policies prohibit the use of our models and products in the contexts of high risk government decision making (e.g, law enforcement, criminal justice, migration and asylum), or for offering legal or health advice."
  • Chapter 10: Does not provide any links to the relevant standards that are cited (i.e., DCAM, DAMA DMBOK).
  • Chapter 11 (page 228): States, "The only significant choice to make here is which version of the TLS protocol to choose: TLS 1.0, TLS 1.1, or TLS 1.2". This ignores the fact that TLS 1.0 and TLS 1.1 have been deemed vulnerable, and that TLS 1.2 should be minimally enforced. Further, this sentence should include TLS 1.3. The appropriate NIST paper for TLS should be cited for the exclusion of TLS 1.0 and TLS 1.1, and for the NIST recommendation/guidance on adoption of TLS 1.2 and TLS 1.3.
Amazon Verified review

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing
When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it, we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website?

If you want to purchase a video course, eBook or Bundle (Print+eBook) please follow below steps:

  1. Register on our website using your email address and the password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the print book, you get a free eBook copy of the same title.
  5. Proceed with the checkout process (payment can be made using credit card, debit card, or PayPal).
Where can I access support around an eBook?
  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats, such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not in the Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower priced than print
  • They save resources and space
What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.