Azure Data and AI Architect Handbook

Introduction to Data Architectures

With data quickly becoming an essential asset of any business, the need for cloud data architects has never been higher. The key role these professionals fulfill is to provide the technical blueprints of any cloud data project and expertise on data architectures as a whole. A skilled data architect is proficient in many steps of the end-to-end data process, such as data ingestion, data warehousing, data transformation, and visualization.

It is of utmost importance that data architects are familiar with the benefits and drawbacks of individual resources as well as platform-wide design patterns. Typically, aspiring data architects have a background as business intelligence (BI) developers, data engineers, or data scientists. They are often specialized in one or more tools but lack experience in architecting solutions according to best practices.

Compared to a developer, an architect is more focused on the long term and the bigger picture. The architect must keep the overarching business strategy in mind and prioritize certain aspects of the architecture accordingly. To equip you with the necessary skills to do so, this chapter introduces methods for getting business value from your data, which form the foundation of any long-term data strategy.

This chapter will also introduce you to a general-purpose reference data architecture. This architecture will be used as a guideline throughout this book and will be defined in more detail as the chapters go on.

Finally, on-premises data architectures nowadays face a variety of challenges. You will explore these challenges and look at how a business can benefit from either a cloud or a hybrid cloud solution.

In this chapter, we’re going to cover the following main topics:

Understanding the value of data
A data architecture reference diagram
Challenges of on-premises architectures

Understanding the value of data

Data generation is growing at an exponential rate. By one often-quoted estimate, 90 percent of the data in the world was generated in the last two years, and global data creation is projected to reach 181 zettabytes by 2025.

To put this number in perspective, 1 zettabyte is equal to 1 million petabytes. This scale requires data architects to deal with the complexity of big data, but it also introduces an opportunity. Industry analyst Doug Laney defined big data with the popular three Vs framework: Volume, Variety, and Velocity. In this section, we explore a fourth V: Value.

Types of analytics

Data empowers businesses to look back into the past, gain insights into established and emerging patterns, and make informed decisions for the future. Gartner splits analytical solutions that support decision-making into four categories: descriptive, diagnostic, predictive, and prescriptive analytics. Each successive category is potentially more complex to implement but can also add more value to your business.

Let’s go through each of these categories next:

Descriptive analytics is concerned with answering the question, “What is happening in my business?” It describes the past and current state of the business by creating static reports on top of data. The data used to answer this question is often stored in a data warehouse, which organizes historical data into dimension and fact tables for reporting purposes.
Diagnostic analytics tries to answer the question, “Why is it happening?” It drills down into the historical data with interactive reports and diagnoses the root cause. Interactive reports are still built on top of a data warehouse, but additional data may be added to support this type of analysis. A broader view of your data estate allows for more root causes to be found.
Predictive analytics learns from historical trends and patterns to make predictions for the future. It deals with answering the question, “What will happen in the future?” This is where machine learning (ML) and artificial intelligence (AI) come into play, drawing data from the data warehouse or raw data sources to learn from.
Prescriptive analytics answers the question, “What should I do?” and prescribes the next best action. When we know what will happen in the future, we can act on it. This can be done by using different ML methods such as recommendation systems or explainable AI. Recommendation systems recommend the next best product to customers based on similar products or what similar customers bought. Think, for instance, about Netflix recommending new series or movies you might like. Explainable AI identifies which factors contributed most to a given prediction, which allows you to act on those factors to change the predicted outcome.
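The “what similar customers bought” idea can be illustrated with a deliberately simple co-purchase recommender. This is a minimal sketch on hypothetical purchase data that ranks unowned products by how often they appear alongside products the customer already owns; production recommendation systems, including Netflix’s, use far more sophisticated methods.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: customer -> set of products bought.
PURCHASES = {
    "alice": {"tent", "sleeping_bag", "stove"},
    "bob": {"tent", "sleeping_bag"},
    "carol": {"tent", "stove", "lantern"},
    "dave": {"sleeping_bag", "lantern"},
}


def co_purchase_counts(history):
    """Count how often each pair of products appears in the same customer's purchases."""
    counts = Counter()
    for basket in history.values():
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # store both directions for easy lookup
    return counts


def recommend(customer, history, top_n=2):
    """Rank products the customer has not bought by how often they co-occur with owned ones."""
    counts = co_purchase_counts(history)
    owned = history[customer]
    scores = Counter()
    for (a, b), n in counts.items():
        if a in owned and b not in owned:
            scores[b] += n
    return [product for product, _ in scores.most_common(top_n)]


print(recommend("bob", PURCHASES))  # → ['stove', 'lantern']
```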

The following diagram shows the value-extracting process, going from data to analytics, decisions, and actions:

Figure 1.1 – Extracting value from data

Just as humans do, ML models need to learn from their mistakes, which can be achieved with a feedback loop. A feedback loop allows a teacher, typically a human reviewer, to correct the outcomes of the ML model and add the corrections as training labels for the next learning cycle. Learning cycles allow the ML model to improve over time and combat data drift. Data drift occurs when the data on which the model was trained is no longer representative of the data the model makes predictions on, which leads to inaccurate predictions.
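Data drift can be quantified in many ways. As one illustration, the sketch below computes the population stability index (PSI), a common drift metric that compares how a feature’s values are distributed in training data versus production data. The bin edges, sample values, and the 0.2 alert threshold (a frequently quoted rule of thumb) are assumptions for illustration; Azure’s own drift-monitoring features may use different metrics.

```python
import math


def _bin_proportions(values, edges, floor=1e-6):
    """Share of values per bin; the last bin includes its upper edge. Out-of-range values are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            is_last = i == len(edges) - 2
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    # A small floor avoids division by zero and log(0) for empty bins.
    return [max(c / total, floor) for c in counts]


def population_stability_index(training, production, edges):
    """PSI = sum((q - p) * ln(q / p)) over bins; 0 means identical binned distributions."""
    p = _bin_proportions(training, edges)
    q = _bin_proportions(production, edges)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


edges = [0, 2.5, 5, 7.5, 10]
training = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
production = [6, 7, 7, 8, 8, 9, 9, 9, 10, 10]  # hypothetical shifted feature values
print(population_stability_index(training, training, edges))          # → 0.0
print(population_stability_index(training, production, edges) > 0.2)  # → True
```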

Because ML models only improve gradually, it is best practice to have humans confirm predictions before fully automating the decision-making process. Even when an ML model has matured, we can’t rely on it being right 100 percent of the time. This is why ML models often output confidence scores, which express how confident the model is in each prediction. If the confidence score is below a certain threshold, human intervention is required.
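The threshold-based hand-off and the feedback loop described above can be sketched together. This is a minimal illustration, assuming a model that reports a confidence score between 0 and 1; the 0.8 threshold, record IDs, and labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    record_id: str
    label: str
    confidence: float  # model-reported score between 0 and 1


def route(predictions, threshold=0.8):
    """Split predictions into those acted on automatically and those needing human review."""
    automated = [p for p in predictions if p.confidence >= threshold]
    needs_review = [p for p in predictions if p.confidence < threshold]
    return automated, needs_review


def apply_review(prediction, corrected_label, training_labels):
    """Feedback loop: a reviewer's decision becomes a training label for the next cycle."""
    training_labels.append((prediction.record_id, corrected_label))


preds = [Prediction("a1", "fraud", 0.95), Prediction("a2", "fraud", 0.55), Prediction("a3", "ok", 0.91)]
auto, review = route(preds)
labels = []
for p in review:
    apply_review(p, "ok", labels)  # here the reviewer overrules the model
print([p.record_id for p in auto], labels)  # → ['a1', 'a3'] [('a2', 'ok')]
```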

To get continuous value out of data, it is necessary to build a data roadmap and strategy. A complexity-value matrix is a mapping tool to help prioritize which data projects need to be addressed first. This matrix is described in more detail in the following section.

A complexity-value matrix

A complexity-value matrix has four quadrants on which to plot future data projects, with one axis running from low to high value and the other from low to high complexity. Projects that are high-value and low-complexity are called “quick wins” or “low-hanging fruit” and should be prioritized first. These are often Software-as-a-Service (SaaS) applications or third-party APIs that can quickly be integrated into your data platform to get immediate value. Data projects with high complexity and low value should not be pursued, as they have a low Return on Investment (ROI). In general, the more difficult our analytical questions become, the more complex the projects may be, but also the more value we may get out of them.
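The prioritization logic of the matrix can be expressed in a few lines. This is a sketch under stated assumptions: the 1 to 10 scores, the midpoint of 5, the labels for the two quadrants the text does not name, and the choice to rank high-value/high-complexity projects ahead of low-value/low-complexity ones are all illustrative, and in practice depend on your strategy.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    value: int       # hypothetical 1-10 business-value score
    complexity: int  # hypothetical 1-10 complexity score


def quadrant(project, midpoint=5):
    high_value = project.value > midpoint
    high_complexity = project.complexity > midpoint
    if high_value and not high_complexity:
        return "quick win"
    if high_value:
        return "high value, high complexity"
    if not high_complexity:
        return "low value, low complexity"
    return "low ROI: do not pursue"


# Lower number = address sooner; the low-ROI quadrant is dropped entirely.
PRIORITY = {"quick win": 0, "high value, high complexity": 1, "low value, low complexity": 2}


def roadmap(projects):
    candidates = [p for p in projects if quadrant(p) in PRIORITY]
    ordered = sorted(candidates, key=lambda p: (PRIORITY[quadrant(p)], -p.value, p.complexity))
    return [p.name for p in ordered]


backlog = [
    Project("SaaS CRM connector", value=8, complexity=2),
    Project("Churn prediction model", value=9, complexity=8),
    Project("Legacy report rewrite", value=2, complexity=9),
    Project("Metadata tagging", value=3, complexity=3),
]
print(roadmap(backlog))
# → ['SaaS CRM connector', 'Churn prediction model', 'Metadata tagging']
```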

A visualization of the four quadrants of the matrix can be seen as follows:

Figure 1.2 – The four quadrants of a complexity-value matrix

We often think only of the direct value that data projects bring, but indirect value should be considered as well. Data engineering projects often have no direct value, as they move data from one system to another, but they may indirectly open up a world of new opportunities.

To extract value from data, a solid data architecture needs to be in place. In the following section, we’ll define an abstract data architecture diagram that will be referenced throughout this book to explain data architecture principles.

A data architecture reference diagram

The reference architecture diagram in Figure 1.3, defined only abstractly for now, shows the typical structure of an end-to-end data platform in a (hybrid) cloud:

Figure 1.3 – A typical structure of an end-to-end data platform in a (hybrid) cloud

This reference diagram shows the key components of most modern cloud data platforms. There are limitless possible adaptations, such as accommodating streaming data, but the diagram in Figure 1.3 serves as the basis for more advanced data architectures. It’s like the Pizza Margherita of data architectures! The architecture diagram in Figure 1.3 already shows four distinct layers in the end-to-end architecture, as follows:

The ingestion layer
The storage layer
The serving layer
The consumption layer

In addition to these layers, there are a couple of other key aspects of the data platform that span multiple layers, as follows:

Data orchestration and processing
Advanced analytics
Data governance and compliance
Security
Monitoring

Let’s cover the first layer next.

The ingestion layer

The ingestion layer serves as the data entrance to the cloud environment. Here, data from various sources is pulled into the cloud. These sources include on-premises databases, SaaS applications, other cloud environments, Internet of Things (IoT) devices, and many more. Let’s look at this layer in more detail:

First, the number of data sources can vary greatly between businesses and can already bring a variety of challenges to overcome. In enterprise-scale organizations, where the number of data sources can reach extraordinary levels, it is especially important to maintain a clear overview of these sources and manage them well.
Secondly, the sheer variety of sources is another common issue to deal with. Different data sources can have distinct methods of ingesting data into the cloud and, in some cases, require architectural changes to accommodate.
Thirdly, managing authentication for data sources can be cumbersome. Authentication, which happens in a multitude of ways, is often unique to the data source. Every source requires its own tokens, keys, or other types of credentials that must be managed and seamlessly refreshed to optimize security.
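Keeping an inventory of sources together with how each one authenticates makes the overview and credential-refresh concerns above more manageable. The sketch below is a minimal, hypothetical source inventory that flags credentials due to expire soon; all source names, auth methods, and dates are invented, and in practice the secrets themselves would live in a managed secret store rather than in code.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DataSource:
    name: str
    kind: str                # e.g. "on-premises database", "SaaS API", "IoT device"
    auth_method: str         # e.g. "API key", "OAuth token", "certificate"
    credential_expires: date


def expiring_credentials(sources, today, warning_days=14):
    """Names of sources whose credentials expire within the warning window."""
    cutoff = today + timedelta(days=warning_days)
    return sorted(s.name for s in sources if s.credential_expires <= cutoff)


inventory = [
    DataSource("erp_db", "on-premises database", "certificate", date(2024, 1, 10)),
    DataSource("crm_api", "SaaS API", "OAuth token", date(2024, 3, 1)),
    DataSource("sensor_feed", "IoT device", "API key", date(2024, 1, 20)),
]
print(expiring_credentials(inventory, today=date(2024, 1, 8)))  # → ['erp_db', 'sensor_feed']
```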

From a design perspective, there are a few other aspects to keep in mind. The architect should consider the following:

Data speed: Will incoming data from the source be ingested periodically (that is, batch ingestion) or continuously (that is, data streaming)?
Level of structure: Will the incoming data be unstructured, semi-structured, or structured?

Regarding data speed, data will be ingested in batches in the vast majority of cases. This typically translates to periodic requests to an application programming interface (API), or periodic queries against a database, to pull data from the source. For the less common cases of streaming data, architectural changes are required to provide an environment that can store and process the continuous flow of data. In later chapters, you will discover how the platform architecture differs to accommodate streaming data.
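Incremental, watermark-based loading is one common way to implement periodic batch ingestion: each scheduled run requests only the records changed since the previous run. The sketch simulates the source in memory; the `modified` column, the row data, and the function names are hypothetical.

```python
from datetime import datetime

# Hypothetical stand-in for a source system; a real connector would call an API or query a database.
SOURCE_ROWS = [
    {"id": 1, "modified": datetime(2024, 1, 1, 8, 0)},
    {"id": 2, "modified": datetime(2024, 1, 1, 12, 0)},
    {"id": 3, "modified": datetime(2024, 1, 2, 9, 0)},
]


def fetch_changes(since):
    """Simulate a source request: return rows modified after the given timestamp."""
    return [row for row in SOURCE_ROWS if row["modified"] > since]


def run_batch(watermark):
    """One scheduled batch run: pull only new or changed rows, then advance the watermark."""
    rows = fetch_changes(watermark)
    new_watermark = max((row["modified"] for row in rows), default=watermark)
    return rows, new_watermark


rows, wm = run_batch(datetime(2024, 1, 1, 10, 0))
print([r["id"] for r in rows], wm)  # → [2, 3] 2024-01-02 09:00:00
rows, wm = run_batch(wm)
print(rows)  # → [] (nothing new since the last run)
```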

Finally, the level of structure of the data will determine the amount of required data transformations, the methods of storing the data, or the destination of data movements. Unstructured data, such as images and audio files, will require different processing compared to semi-structured key-value pairs or structured tabular files.
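Routing incoming data by its level of structure can be as simple as grouping files before choosing a processing path. This sketch classifies files by extension only; the extension-to-category mapping is an assumption, and real platforms often inspect content or metadata rather than names.

```python
from pathlib import PurePosixPath

# Assumed extension-to-structure mapping; extend or replace it to fit your sources.
STRUCTURE_BY_EXTENSION = {
    ".csv": "structured",
    ".parquet": "structured",
    ".json": "semi-structured",
    ".xml": "semi-structured",
    ".jpg": "unstructured",
    ".png": "unstructured",
    ".wav": "unstructured",
    ".mp3": "unstructured",
}


def classify(path):
    """Return the structure level for a file path, or "unknown" for unmapped extensions."""
    return STRUCTURE_BY_EXTENSION.get(PurePosixPath(path).suffix.lower(), "unknown")


def group_by_structure(paths):
    """Group incoming file paths so each group can follow its own processing path."""
    groups = {}
    for p in paths:
        groups.setdefault(classify(p), []).append(p)
    return groups


incoming = ["sales/2024-01.csv", "crm/contacts.json", "calls/0001.WAV", "notes/readme"]
print(group_by_structure(incoming))
```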


The storage layer

The definitions of the remaining layers can vary. In this book, the storage layer refers to the central (often large-scale) storage of data. Data lakes are the most common method for massive storage of data, due to their capacity and relatively low cost. Alternatives are graph-based databases, relational databases, NoSQL databases, flat file-based databases, and so on. The data warehouse, which holds business-ready data and is optimized for querying and analytics, does not belong to the storage layer but falls under the serving layer instead.

Decisions made by the architect in the storage layer can have a great effect on costs, performance, and the data platform in its entirety. Here, the architect will have to consider redundancy, storage access tiers, and security. In the case of a data lake, the architect also needs to design a tier system for raw, curated, and enriched data, as well as a robust and scalable folder structure.
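A predictable folder structure is easiest to keep consistent when paths are generated rather than typed by hand. The sketch below builds paths following one possible convention (tier, then source, entity, and load date); the convention and all names are assumptions, not a prescribed layout.

```python
from datetime import date

TIERS = ("raw", "curated", "enriched")  # ordered from least to most business-ready


def lake_path(tier, source, entity, load_date, file_name):
    """Build a tier/source/entity/yyyy/mm/dd/file path (one possible convention)."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier {tier!r}; expected one of {TIERS}")
    return f"{tier}/{source}/{entity}/{load_date:%Y/%m/%d}/{file_name}"


print(lake_path("raw", "crm", "customers", date(2024, 3, 5), "customers.parquet"))
# → raw/crm/customers/2024/03/05/customers.parquet
```

Partitioning by load date keeps each day’s files together, which makes reprocessing a single day or applying retention rules per folder straightforward.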


The serving layer

In the serving layer, preprocessed and cleansed data is stored in a data warehouse, often regarded as the flagship of the data platform. This is a type of structured storage that is optimized for large-scale queries and analytics. The data warehouse forms one of the core components of BI.

The major difference between a data warehouse and the aforementioned data lake is the level of structure. A data warehouse is defined by schemas and enforces data types and structures. Conversely, a data lake can be seen as a massive dump of all kinds of data, with little to no regard for the enforcement of specific rules. This strict enforcement makes a data warehouse significantly more homogeneous, which typically results in far better performance for analytics.
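This difference in enforcement is often described as schema-on-read (data lake) versus schema-on-write (data warehouse). The sketch below imitates both behaviors with plain Python lists standing in for storage; the table schema and records are hypothetical, and a real warehouse enforces these rules inside the database engine.

```python
# Hypothetical warehouse table definition: column name -> required Python type.
SALES_SCHEMA = {"order_id": int, "amount": float, "country": str}


def load_into_lake(lake, record):
    """Data lake: store the record as-is (structure is interpreted later, on read)."""
    lake.append(record)


def load_into_warehouse(table, record, schema=SALES_SCHEMA):
    """Data warehouse: reject records whose columns or types do not match the schema."""
    if set(record) != set(schema):
        raise ValueError(f"column mismatch: {sorted(set(record) ^ set(schema))}")
    for column, expected in schema.items():
        if not isinstance(record[column], expected):
            raise TypeError(f"{column} must be {expected.__name__}")
    table.append(record)


lake, warehouse = [], []
good = {"order_id": 1, "amount": 19.99, "country": "BE"}
bad = {"order_id": "1", "amount": "19.99", "note": "free text"}
for record in (good, bad):
    load_into_lake(lake, record)  # both records land in the lake
    try:
        load_into_warehouse(warehouse, record)
    except (ValueError, TypeError) as err:
        print("rejected:", err)  # → rejected: column mismatch: ['country', 'note']
print(len(lake), len(warehouse))  # → 2 1
```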

The cloud data architect has various decisions to make in the serving layer, as there are quite a few options for data warehousing on the Azure cloud:

First, the architect should decide whether they want an Infrastructure-as-a-Service (IaaS), a Platform-as-a-Service (PaaS), or a SaaS solution. In short, this comes down to a trade-off between management responsibility, development effort, and flexibility. This will be discussed in more detail in later chapters.
Next, different services on Azure come with their own advantages and disadvantages. The architect could, for example, opt for a very cost-effective serverless SQL solution or leverage massive processing power in highly performant dedicated SQL pools, among numerous other options.

After deciding on the most fitting service, there are still decisions to be made within the data warehouse. The architect will have to determine the structures used to organize the data in the data warehouse, also known as schemas. Common schema designs are the star and snowflake schemas, each with its own benefits and drawbacks.
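A star schema places a central fact table (measurable events such as sales) among dimension tables that describe them (product, date, and so on). The sketch below uses an in-memory SQLite database purely as a stand-in for a cloud data warehouse; the table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Tent', 'Outdoor'), (2, 'Stove', 'Outdoor'), (3, 'Mug', 'Kitchen');
INSERT INTO dim_date VALUES (20240105, 2024, 1), (20240210, 2024, 2);
INSERT INTO fact_sales VALUES (20240105, 1, 2, 300.0), (20240105, 3, 5, 50.0), (20240210, 2, 1, 80.0);
""")

# Typical star-schema query: aggregate the fact table, sliced by dimension attributes.
rows = conn.execute("""
    SELECT p.category, d.month, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
print(rows)  # → [('Kitchen', 1, 50.0), ('Outdoor', 1, 300.0), ('Outdoor', 2, 80.0)]
```

In a snowflake schema, the `category` attribute would typically be normalized into its own table referenced by `dim_product`, trading an extra join for less redundancy.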

Chapter 6, Data Warehousing, will teach you all the necessary skills to confidently decide on the right solution. Chapter 7, The Semantic Layer, will introduce you to the concept of data marts, subsets of a data warehouse ready for business consumption.

The consumption layer

The consumption layer is the final layer of an end-to-end data architecture and typically follows the serving layer by extracting data from the data warehouse. There are numerous ways of consuming the data, which has been prepared and centralized in earlier stages.

The most common way to consume data is through data visualization, such as dashboards and reports. The combination of a data warehouse and a visualization service is often referred to as BI. Many modern dashboarding tools allow for interactivity and drill-down functionality within the dashboard itself. Although technically it is not a part of the Azure stack, Power BI is the preferred service for data visualization for Azure data platforms. However, other visualization services can also connect conveniently to Azure.

Another way to consume data is by making the data available to other applications or platforms using APIs.
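Exposing prepared data through an API can be sketched with Python’s standard WSGI interface. This is a minimal illustration, assuming an in-memory dictionary in place of a real serving-layer query; the `/sales/<category>` route and the figures are hypothetical, and a production API would add authentication and run behind a proper web server.

```python
import json

# Hypothetical serving-layer result; in practice this would come from a warehouse query.
SALES_BY_CATEGORY = {"Outdoor": 380.0, "Kitchen": 50.0}


def app(environ, start_response):
    """Minimal WSGI app exposing /sales/<category> as JSON."""
    path = environ.get("PATH_INFO", "")
    prefix = "/sales/"
    category = path[len(prefix):] if path.startswith(prefix) else None
    if category in SALES_BY_CATEGORY:
        payload, status = {"category": category, "revenue": SALES_BY_CATEGORY[category]}, "200 OK"
    else:
        payload, status = {"error": "not found"}, "404 Not Found"
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"), ("Content-Length", str(len(body)))])
    return [body]


def call(path):
    """Invoke the app directly, as a WSGI server would, and return (status, parsed JSON)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response))
    return captured["status"], json.loads(body)


print(call("/sales/Outdoor"))  # → ('200 OK', {'category': 'Outdoor', 'revenue': 380.0})
```

To serve it over HTTP locally, `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` from the standard library would be enough for experimentation.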

Chapter 8, Visualizing Data Using Power BI, will teach you how to extract data from the data warehouse in various ways and visualize it using interactive dashboarding. In this chapter, you will also discover methods to perform self-service BI, allowing end users to create their own ad hoc dashboards and reports to quickly perform data analysis.

Data orchestration and processing

Unlike the four layers mentioned previously, a couple of other core components of the data platform span the entire end-to-end process.

Data orchestration refers to moving data from one place to another, often using data pipelines. This work is typically done by data engineers. When data is moved from one stage to the next, it undergoes transformations such as joining data, deriving new columns, and computing aggregations. For example, when data is moved from a data lake to a data warehouse, it must be transformed to match the data model enforced by the data warehouse. Another example is moving data between tiers (raw, curated, and enriched) in the data lake, where the data becomes more ready for business use each time it moves up a tier.

Data pipelines allow data engineers to automate and scale the orchestration and processing of data. These components are critical to the performance and health of the data platform and must be monitored accordingly.

Here are two common methods of performing orchestration and processing:

Extract-Transform-Load (ETL)
Extract-Load-Transform (ELT)

In both cases, data is extracted from a source and loaded into a destination. The main difference between the two methods is where the transformations take place: in ETL, data is transformed before it is loaded, whereas in ELT, raw data is loaded first and transformed inside the destination. These methods will be further discussed in Chapter 4, Transforming Data on Azure. This chapter will also teach you how to create and monitor data pipelines according to best practices.
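The contrast between the two methods can be shown with the same transformation applied at two different points. In this sketch, ETL transforms rows in Python before loading them, while ELT loads raw rows first and transforms them with SQL inside the destination. SQLite stands in for the destination, and the table names and rows are hypothetical.

```python
import sqlite3

# Hypothetical raw extract: (day, product, amount as text).
SOURCE_ROWS = [("2024-01-05", "tent", "300"), ("2024-01-05", "mug", "50"), ("2024-02-10", "stove", "80")]


def transform(row):
    """Derive the month, standardize the product name, and cast the amount."""
    day, product, amount = row
    return (day[:7], product.upper(), float(amount))


def etl(conn):
    # Transform in the pipeline, then load only the cleaned rows.
    conn.execute("CREATE TABLE sales_etl (month TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", [transform(r) for r in SOURCE_ROWS])


def elt(conn):
    # Load raw rows first, then transform inside the destination with SQL.
    conn.execute("CREATE TABLE sales_raw (day TEXT, product TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?, ?)", SOURCE_ROWS)
    conn.execute("""
        CREATE TABLE sales_elt AS
        SELECT substr(day, 1, 7) AS month, upper(product) AS product, CAST(amount AS REAL) AS amount
        FROM sales_raw
    """)


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
query = "SELECT month, product, amount FROM {} ORDER BY month, product"
print(conn.execute(query.format("sales_etl")).fetchall() == conn.execute(query.format("sales_elt")).fetchall())  # → True
```

With ELT, the raw `sales_raw` table remains available in the destination, so transformations can be rerun or changed later without extracting again.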

Advanced analytics

For analyses that may be too complex to perform in the serving layer, an analytics suite or data science environment can be added to the architecture to perform advanced analytics and unlock ML capabilities. This component can often be added at a later stage of platform development, as it mostly does not influence the core workings of the other layers. A data platform in an early phase of development can function perfectly well without this component.

One option for the advanced analytics suite is an ML workspace where data scientists can preprocess data, perform feature engineering, and train and deploy ML models. The latter may require additional components, such as a container registry for storing the container images used in model deployments. The Azure Machine Learning workspace allows users to create and run ML pipelines to scale their data science processes. It also enables citizen data scientists to train models using no-code and low-code features.

Apart from an environment for data scientists and ML engineers to build and deploy custom models, the Azure cloud also provides users with a wide array of pre-trained ML models. Azure Cognitive Services encompasses many models for computer vision (CV), speech recognition, text analytics, search capabilities, and so on. These models are available through ready-to-use API endpoints. They often target specific use cases but, when used correctly, bring a lot of value to the solution and are exceptionally fast to implement.
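Calling a pre-trained model usually amounts to a single authenticated HTTP request. The sketch below only builds (does not send) a sentiment-analysis request, assuming the Azure AI Language `analyze-text` REST endpoint with `api-version=2022-05-01` and key-based authentication via the `Ocp-Apim-Subscription-Key` header; verify the path, version, and payload against the current service documentation. The endpoint and key are placeholders.

```python
import json
import urllib.request

# Placeholders: replace with your own resource endpoint and key (never hard-code real keys).
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
API_KEY = "<your-key>"


def build_sentiment_request(texts, language="en"):
    """Build (but do not send) a sentiment-analysis request for the Language REST API."""
    payload = {
        "kind": "SentimentAnalysis",
        "analysisInput": {
            "documents": [
                {"id": str(i), "language": language, "text": t} for i, t in enumerate(texts, start=1)
            ]
        },
    }
    return urllib.request.Request(
        url=f"{ENDPOINT}/language/:analyze-text?api-version=2022-05-01",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )


request = build_sentiment_request(["The dashboard is great.", "The report loads slowly."])
# With a real endpoint and key, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```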

Chapter 9, Advanced Analytics Using AI, will go deeper into end-to-end ML workflows, such as connecting to data stores, preprocessing, model training, and model deployment. This chapter will also introduce the concepts of ML operations, often referred to as MLOps, which encompasses continuous integration and continuous delivery (CI/CD) for ML workflows.

Data governance and compliance

The more a data platform scales, the harder it becomes to maintain a clear overview of existing data sources, data assets, transformations, data access control, and compliance. To avoid a build-up of technical debt, it is strongly recommended to set up governance and compliance processes from an early stage of development and to scale them with the platform.

To govern Azure data platforms, Microsoft developed Microsoft Purview, formerly known as Azure Purview. This tool, which is covered in Chapter 10, Enterprise-Level Data Governance and Compliance, allows users to gain clear insights into the governance and compliance of the platform. Therefore, it is essential to the skill set of any aspiring Azure data architect. In this chapter, you will learn how to do the following:

Create a data map by performing scans on data assets
Construct a data catalog to provide an overview of the metadata of data assets
Build a business glossary to establish clear definitions of possibly ambiguous business terms
Gain executive insights on the entire data estate
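The relationship between a data catalog and a business glossary can be illustrated conceptually: scanned assets carry metadata, and glossary terms link business vocabulary to those assets. This sketch does not use Microsoft Purview’s API; every asset name, term, and definition is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    qualified_name: str                 # where a scan found the asset
    asset_type: str
    columns: list
    glossary_terms: set = field(default_factory=set)


# Hypothetical business glossary: one agreed definition per potentially ambiguous term.
GLOSSARY = {
    "Revenue": "Invoiced amount excluding VAT.",
    "Customer": "A party with at least one invoice.",
}

CATALOG = [
    CatalogEntry("lake/curated/sales/orders", "parquet folder", ["order_id", "amount"], {"Revenue"}),
    CatalogEntry("warehouse/dbo.dim_customer", "table", ["customer_key", "name"], {"Customer"}),
]


def find_assets(term):
    """Return the glossary definition and every cataloged asset tagged with the term."""
    if term not in GLOSSARY:
        raise KeyError(f"{term!r} is not defined in the glossary")
    return GLOSSARY[term], [e.qualified_name for e in CATALOG if term in e.glossary_terms]


print(find_assets("Revenue"))
# → ('Invoiced amount excluding VAT.', ['lake/curated/sales/orders'])
```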

Security

With the rise of harmful cyberattacks, security is another indispensable component of a data platform. Inadequate security or misconfigurations may lead to tremendous costs for the business. Investing in robust security to prevent attacks will typically be vastly cheaper than dealing with the damage afterward.

Cybersecurity can be very complex and should therefore be configured and managed with the help of a cybersecurity architect. However, certain aspects of security fall within the responsibilities of the data architect as well, and the data architect should have the appropriate skill set to establish data security. Examples are working with row- or column-level security, encrypting data at rest and in transit, masking sensitive data, and so on.
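The ideas behind row-level security, column-level security, and data masking can be shown in plain Python. This is a conceptual sketch only: in a real platform these rules are enforced by the database engine so that users cannot bypass them, and the customer records, regions, and masking format here are hypothetical.

```python
def mask_email(email):
    """Keep the first character and the domain; hide the rest."""
    local, at, domain = email.partition("@")
    if not at or not local:
        return "***"
    return f"{local[0]}***@{domain}"


def secure_view(rows, allowed_regions, visible_columns):
    """Row-level security filters rows; column-level security drops non-permitted columns."""
    return [
        {col: row[col] for col in visible_columns}
        for row in rows
        if row["region"] in allowed_regions
    ]


customers = [
    {"name": "Jane", "email": "jane.doe@contoso.com", "region": "EU", "credit_limit": 5000},
    {"name": "Raj", "email": "raj@fabrikam.com", "region": "US", "credit_limit": 8000},
]
view = secure_view(customers, allowed_regions={"EU"}, visible_columns=["name", "email"])
view = [{**row, "email": mask_email(row["email"])} for row in view]
print(view)  # → [{'name': 'Jane', 'email': 'j***@contoso.com'}]
```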

Chapter 11, Introduction to Data Security, will teach you all that is necessary to ensure data is always well protected and access is limited to the minimum necessary.

Monitoring

Disruptions such as failing data pipelines, breaking transformations, and unhealthy deployments can bring an entire data platform to a halt. To keep downtime to an absolute minimum, these processes and deployments should be monitored continuously.

Azure provides monitoring and health reports on pipeline runs, Spark and SQL jobs, ML model deployments, data asset scans, and more. The monitoring of these resources will be further discussed in their own respective chapters.
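Monitoring data usually feeds into simple alerting rules. The sketch below flags failed runs and unusually slow successful runs from a list of pipeline-run records; the status values, the 60-minute threshold, and the pipeline names are assumptions for illustration, and Azure’s built-in monitoring and alerting would normally evaluate such rules for you.

```python
from dataclasses import dataclass


@dataclass
class PipelineRun:
    pipeline: str
    status: str          # assumed values such as "Succeeded", "Failed", "Cancelled"
    duration_min: float


def alerts(runs, max_duration_min=60):
    """Flag runs that did not succeed, and successful runs that took unusually long."""
    messages = []
    for run in runs:
        if run.status != "Succeeded":
            messages.append(f"{run.pipeline}: {run.status.lower()}")
        elif run.duration_min > max_duration_min:
            messages.append(f"{run.pipeline}: slow ({run.duration_min:.0f} min)")
    return messages


runs = [
    PipelineRun("ingest_crm", "Succeeded", 12),
    PipelineRun("load_warehouse", "Failed", 3),
    PipelineRun("refresh_marts", "Succeeded", 95),
]
print(alerts(runs))
# → ['load_warehouse: failed', 'refresh_marts: slow (95 min)']
```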

Challenges of on-premises architectures

Cloud computing has seen a steep rise in adoption during the last decade. Nevertheless, a significant share of businesses still keep their servers and data on-premises, and there are several reasons why a business may prefer this over the cloud. Some businesses perceive their data to be more secure on their own servers. Others, generally smaller businesses, may not feel the need to optimize their IT landscape or are simply not keen on change. Organizations in strictly regulated industries can be bound to on-premises infrastructure for compliance reasons. Whatever the reason, on-premises architectures nowadays come with certain challenges.

These challenges include, among other things, the following:

Scalability
Cost optimization
Agility
Flexibility

Let’s go through these challenges in detail.

Scalability

Organizations with a rapidly expanding technological landscape will struggle the most with scalability. As the total business data volume keeps growing, an organization constantly needs to find new ways to expand its on-premises server farm. It is not always as simple as adding extra servers. After a while, extra building infrastructure is needed, new personnel must be hired, energy consumption soars, and so on.

Here, the benefit of cloud computing is the enormous pool of available servers and computing resources. For the business, this means it can provision any additional capacity without having to worry about the intricate organization and planning of its own servers.

Cost optimization

Businesses that rely completely on on-premises servers are rarely fully cost-effective. Why is this so?

Let’s take a look at two scenarios:

When usage increases: When usage increases, the need for extra capacity arises. A business is not going to wait until its servers are used to their limits, risking heavy throttling and bottleneck issues, before expanding its capacity. Although this avoids full saturation of its servers, it also means the computing and storage capacity is never fully used. While usage can grow linearly or exponentially, costs rise in discrete increments, each corresponding to an expansion of server capacity.
When usage decreases: When usage decreases, the additional capacity simply sits unused. Even if the decrease in usage lasts for a long time, it is not simple to sell the hardware, free up the physical space, and let go of the extra maintenance personnel. In most situations, this results in costs remaining unchanged despite the lower usage.

Cloud computing usually follows a pay-as-you-go (PAYG) business model, which addresses both of these cost challenges under variable usage. PAYG allows businesses to match their costs to their usage, avoiding disparities, as can be seen in the following diagram:

Figure 1.4 – Cost patterns depending on usage for on-premises and cloud infrastructure
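The cost patterns described in the two scenarios can be reproduced with a toy model. This sketch simplifies heavily: on-premises capacity is bought in whole blocks with 20 percent headroom, is never sold back, and each owned block is treated as a fixed monthly cost, while PAYG cost is usage times a unit price. All figures are hypothetical.

```python
import math


def on_prem_costs(monthly_usage, block_size, block_cost, headroom=0.2):
    """Capacity is bought in whole blocks ahead of demand and is never sold back."""
    blocks, costs = 0, []
    for usage in monthly_usage:
        needed = math.ceil(usage * (1 + headroom) / block_size)
        blocks = max(blocks, needed)  # expand in discrete steps; never shrink
        costs.append(blocks * block_cost)
    return costs


def payg_costs(monthly_usage, unit_price):
    """Pay-as-you-go: cost follows usage directly."""
    return [usage * unit_price for usage in monthly_usage]


usage = [40, 70, 90, 50]  # hypothetical monthly usage units
print(on_prem_costs(usage, block_size=50, block_cost=100))  # → [100, 200, 300, 300]
print(payg_costs(usage, unit_price=2))                      # → [80, 140, 180, 100]
```

The on-premises series rises in steps and stays flat when usage drops in the last month, while the PAYG series tracks usage in both directions.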

Let’s cover the next challenge now.

Agility

Agility is not about whether a certain change is possible but about how quickly a business can implement it. Expanding or reducing capacity, changing the types of processing power, and so on takes time in an on-premises environment. In most cases, this involves acquiring new hardware, installing it, and configuring security, all of which can be extremely time-consuming in a business context.

Here, cloud architectures have far superior agility compared to on-premises architectures. Scaling capacity up or down or switching from memory-optimized to compute-optimized processors can all be done in a matter of seconds or minutes.

Flexibility

The challenge of flexibility can be interpreted very broadly and has some intersections with the other challenges. Difficulties with scalability and agility can be defined as types of flexibility issues.

Apart from difficulties regarding scalability and agility, on-premises servers face the issue of constant hardware modernization. Here, we could compare on-premises and cloud infrastructure to a purchased car and a rental car, respectively. Cutting-edge technology is not always needed, but when it is, consider which option is more likely to get you a modern car.

In other cases, specialized hardware such as field-programmable gate arrays (FPGAs) might be required for a short period of time, for example, during the training of an extraordinarily complex ML model. To revisit the car example, would you rather purchase a van because you occasionally have to move furniture, or rent a van for the day you move?

Let’s summarize the chapter next.

Summary

In this chapter, we first discussed how to extract value from your data by asking the right analytical questions. Questions may increase in complexity from descriptive, diagnostic, and predictive to prescriptive but may also hold more value. A complexity-value matrix helps prioritize data projects and build a data roadmap. A crucial thing to remember is to capture data as soon as possible, even if you don’t have a data strategy or roadmap yet: data that you do not capture now cannot be used to extract value in the future. Next, we introduced a reference architecture diagram. Over time, you will become familiar with every component of the diagram and how the components interact with each other.

Four layers of cloud architectures were explained. The ingestion layer is used to pull data into the central cloud data platform. The storage layer is capable of holding massive amounts of data, often in a tiered system, where data gets more business-ready as it moves through the tiers. The serving layer contains the data warehouse, which holds data with a strictly enforced schema and is optimized for analytical workloads. Lastly, the consumption layer allows end users and external systems to consume the data in reports and dashboards or to use it in other applications.

Some components of the data platform span multiple layers. Data orchestration and processing refers to data pipelines that ingest data into the cloud, move data from one place to another, and orchestrate data transformations. Advanced analytics leverages Azure’s many pre-trained ML models and a data science environment to perform complex calculations and provide meaningful predictions. Data governance tools bring data asset compliance, flexible access control, data lineage, and overall insights into the entire data estate. Impeccable security of individual components, as well as of the integrations between them, greatly reduces the risk of harmful actions by third parties. Finally, the extensive monitoring capabilities in Azure give us insights into the health and performance of the processes and data storage in the platform.

To conclude, we discussed the challenges that on-premises architectures face, such as scalability, cost optimization, agility, and flexibility. These challenges can often be addressed conveniently by leveraging cloud-based approaches.

In the next chapter, we will look at two Microsoft frameworks that ease the move to the cloud.

Reader reviews

Tanya Silva Aug 08, 2023

I have received this book for review purposes from the publisher.

Review (8/8/23): I have spent the last few weeks reading Azure Data and AI Architect Handbook, wishing something like this existed many years ago when I was starting in the field of AI/ML, Data Engineering, Data Services, Data Security, Data Science, Data Governance, and other topics which, with the rise of interest in GenAI, are coming into the light as well. I highly recommend this book to a new college graduate just starting their career in this fascinating space as well as to a seasoned professional who might have worked with Microsoft offerings for many years and would like to understand the latest technology developments.

The book covers Introduction to Data Architecture, Data Engineering, Data Warehousing and Analytics, and Data Security, Governance, and Compliance.

In the Data Architecture section, topics such as reference diagrams with the various layers are covered, as well as data orchestration, processing, security, monitoring, etc. Once the reader understands the data architecture, the authors provide insight into the challenges with on-prem solutions and what is needed to prepare for Cloud Adoption.

The Data Engineering section covers data ingestion, transformations, and storage. Each topic in itself is a vast area requiring deep subject matter expertise. However, for an AI architect, I suggest having at least a high-level understanding of how this area is done, what the challenges are, what the bottlenecks are, and what architectural decisions might be made at this step that might impact AI products down the line.

The Data Warehousing and Analytics section covers DW (the speed of your visualizations/downstream apps will be extremely dependent on this step). The semantic layer (okay, I admit, I am biased when I have to do data analysis. I like it in a multidimensional format, but over the years, I have found it is difficult for humans to grasp high-level dimensionality). Power BI - even if you do not use Power BI, I highly recommend installing a free desktop version to play with it - you might be pleasantly surprised. The last topic in this section covers Advanced Analytics using AI - Azure Cognitive Services, Azure OpenAI, LLMs, and such. Depending on when you are reading this review, the digital version of the book may be modified to include new developments in this space.

The last part covers Data Security, Governance, and Compliance - this topic sometimes gets put on the back burner when solutions are being architected, but I suggest including this in the first rounds of your product development roadmap. I forecast a lot of development in the legal, security, and privacy space coming up in the AI field.

Overall, I highly recommend AI Architects have this book in their library as a reference so that they can envision end-to-end solutions on Azure and be aware of all pieces of the puzzle.


Morph360Tech Jan 06, 2024

Data Architecture, Cloud Adoption, and how you can utilise the tools that Microsoft provides for Data Ingestion, Transformation, and Consumption are clearly defined in the book, which I will explore below.

Ingestion: batch and streaming ingestion are explained in a simple manner that most non-technical people would understand, and the ingestion architectures are explored and described with Event Hubs and even IoT Hub.

Transforming Data: data flows, data lakes, and pipelines, along with the bronze-to-silver and silver-to-gold transformations, all within Azure, giving people a heads-up on which direction they wish to go forward with for data CI/CD on Azure.

Storing Data for Consumption is explained along with the significance of the data types (structured, semi-structured, and unstructured data).

Data Warehousing and Analytics: I loved the reference to good old normalisation, how it feeds into Data Marts, and what they are. I loved the design methods and SCDs, then building a data warehouse in the cloud using Azure SQL, Synapse serverless SQL pools, or dedicated pools. It all sounds complicated, but they explain it in simple terms that most people would get and understand.

Visualisation and Power BI: starting out getting your data and the star schema, to enriching it with DAX, then on to the low-code or code-first options, explored with some reference to Cognitive Services and OpenAI. Nicely explained, with thought-provoking activities.

Data Governance and Compliance/Data Security: from RBAC to Threat Protection, which is great to see alongside everything here rather than as a side panel or something to look at later.

Overall, I am very impressed, and you always find something new in the world of IT, especially in the world of data and analytics.

Steven Fernandes Sep 10, 2023

This book is a goldmine for anyone looking to build or optimize data solutions on Azure. It walks you through designing scalable, cost-effective cloud architectures, and offers best practices for data storage, ETL processes, and visualization. The inclusion of real-world use cases and advanced topics like OpenAI and custom ML models makes it an indispensable resource. Highly recommended for data professionals at all levels!

S.Kundu Aug 23, 2023

The book starts by explaining different data architectures and goes through the different layers, along with the challenges of on-premises architectures. It then moves into the details of different batch and streaming ingestion architectures. It explains how you can transform your data using options such as mapping data flows, Spark notebooks, SQL scripts, SSIS, Azure Stream Analytics, and Azure Databricks. It teaches you how to schedule and monitor your data pipelines on Azure and helps you understand how to deploy using CI/CD. It then dives deep into different data warehousing concepts. You will also learn about Azure Cognitive Services, Azure OpenAI Service, Azure Machine Learning, and MLOps. The book also covers different options for implementing security through access controls and authentication mechanisms, along with how to use Microsoft Purview for data governance.

Rohan Desai Aug 16, 2023

The book "Azure Data and AI Architect Handbook" by Olivier and Breght is an insightful guide that skillfully navigates the intricate intersection of data architecture and artificial intelligence. It offers a comprehensive exploration of the vital role data architecture plays in the realm of AI, making it a must-read for both beginners and seasoned professionals in the field. The book delves into the foundational concepts of data architecture, gradually intertwining them with cutting-edge AI principles. The authors' ability to explain complex concepts in an accessible manner is commendable, making it a suitable resource for readers with varying levels of expertise. From discussing data modeling techniques to elucidating the intricacies of neural networks and machine learning algorithms, the book covers a wide spectrum of topics. It starts with a glimpse of data architecture and preparation for cloud adoption, then focuses on data massaging concepts from Azure's perspective. It provides an in-depth walkthrough of transforming data using multiple methods, such as SQL scripts, SSIS, and Spark notebooks, with a detailed outline of batch and streaming ingestion. It also explains data warehousing concepts from scratch, including SCD concepts and their implementation. Finally, it covers data security, governance, and compliance topics that will enlighten you on enterprise-level data governance, its importance, data protection, access control, and threat protection. At the academic or professional level, this book serves as a good source of learning for someone starting their Azure journey as a data modeler or data engineer.

Azure Data and AI Architect Handbook: Adopt a structured approach to designing data and AI solutions at scale on Microsoft Azure


Azure Data and AI Architect Handbook

Introduction to Data Architectures

Understanding the value of data

Types of analytics

A complexity-value matrix

A data architecture reference diagram

The ingestion layer

The storage layer

The serving layer

The consumption layer

Data orchestration and processing

Advanced analytics

Data governance and compliance

Security

Monitoring

Challenges of on-premises architectures

Scalability

Cost optimization

Agility

Flexibility

Summary
