Cloud Scale Analytics with Azure Data Services
Build modern data warehouses on Microsoft Azure

Product type: Paperback
Published in: Jul 2021
Publisher: Packt
ISBN-13: 9781800562936
Length: 520 pages
Edition: 1st Edition
Author: Patrik Borosch
Table of Contents (20 chapters)

Preface
Section 1: Data Warehousing and Considerations Regarding Cloud Computing
Chapter 1: Balancing the Benefits of Data Lakes Over Data Warehouses
Chapter 2: Connecting Requirements and Technology
Section 2: The Storage Layer
Chapter 3: Understanding the Data Lake Storage Layer
Chapter 4: Understanding Synapse SQL Pools and SQL Options
Section 3: Cloud-Scale Data Integration and Data Transformation
Chapter 5: Integrating Data into Your Modern Data Warehouse
Chapter 6: Using Synapse Spark Pools
Chapter 7: Using Databricks Spark Clusters
Chapter 8: Streaming Data into Your MDWH
Chapter 9: Integrating Azure Cognitive Services and Machine Learning
Chapter 10: Loading the Presentation Layer
Section 4: Data Presentation, Dashboarding, and Distribution
Chapter 11: Developing and Maintaining the Presentation Layer
Chapter 12: Distributing Data
Chapter 13: Introducing Industry Data Models
Chapter 14: Establishing Data Governance
Other Books You May Enjoy

Exploring the benefits of AI and ML

Companies are building AI projects and rely heavily on statistical functions and the underlying math to predict customer churn, recognize images, detect fraud, mine knowledge, and much more. ML projects often begin as part of a big data or Data Warehouse project, but the reverse also happens: the start of an AI or machine learning project often leads to the development of an analytical system.

As my data scientist colleagues tell me, to really predict an event based on incoming data, a machine learning model needs to be trained on quite a large amount of data. The more data you can bring to train a model, the more accurate the model will be in the end.
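This "more data, better model" intuition can be demonstrated with a toy experiment. The following is a minimal sketch, not tied to any particular Azure service: a simple nearest-centroid classifier is trained on growing amounts of synthetic data and evaluated on a fixed test set; the data layout and classifier choice are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_data(n_per_class):
    """Two noisy 5-dimensional Gaussian classes (a stand-in for real training data)."""
    x0 = rng.normal(-1.0, 2.0, size=(n_per_class, 5))
    x1 = rng.normal(+1.0, 2.0, size=(n_per_class, 5))
    return x0, x1

def nearest_centroid_accuracy(train0, train1, test0, test1):
    """Classify test points by the nearer class centroid and return accuracy."""
    c0, c1 = train0.mean(axis=0), train1.mean(axis=0)
    def is_class1(x):
        return np.linalg.norm(x - c0, axis=1) > np.linalg.norm(x - c1, axis=1)
    correct = (~is_class1(test0)).sum() + is_class1(test1).sum()
    return correct / (len(test0) + len(test1))

test0, test1 = make_data(1000)   # fixed evaluation set
accuracies = {}
for n in (5, 50, 500):           # growing training-set sizes
    train0, train1 = make_data(n)
    accuracies[n] = nearest_centroid_accuracy(train0, train1, test0, test1)
print(accuracies)
```

With only a handful of samples per class, the estimated centroids are noisy; with hundreds, they converge toward the true class means and accuracy approaches the best this simple model can do.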

A wonderful image recognition experiment, for example, can be done at www.customvision.ai. You can start by examining one of the example projects there; I like the "Chocolate or Dalmatian" example.

This is a nice experiment that did not need much input to enable the image recognizer to distinguish between stracciatella chocolate ice cream and Dalmatian dogs. When you try to teach the system with different images and circumstances, however, you might find that you need far more than six training images per group.

Understanding ML challenges

I have experimented with the service and uploaded images of people in an emergency versus images of people relaxing, doing yoga, and the like. I used around 50–60 images for each group and still didn't reach a really satisfying accuracy (74%).

With this experiment, I even created a model with a bias that at first I didn't understand myself. Too many "emergency" cases were being interpreted incorrectly as "All good" cases. By discussing this with my data scientist colleagues and examining the training set, we found out why.

There were too many pictures in the "All good" training set that showed people on grass or in nature, with lots of green around them. This, in turn, led the system to interpret green as a signifier of "All good," no matter how big the emergency obviously was. Someone with their leg at a strange angle and a broken arm, in a meadow? The model would interpret it as "All good."
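The mechanism behind this bias can be reproduced in a deliberately tiny sketch. Assume each image is reduced to two invented features: a dominant background statistic ("greenness", large scale) and a weak but genuinely informative injury cue (small scale). The feature names and numbers are hypothetical, chosen only to illustrate how an unbalanced training set lets the confound win:

```python
import math

# Each "image" is reduced to two hypothetical features:
#   greenness     - dominant background statistic, scale 0..10
#   injury_signal - weak cue for an actual emergency, scale 0..1
# The "All good" training set was shot mostly in nature (high greenness),
# which plants the confound.
train = {
    "All good":  [(9.0, 0.0), (8.5, 0.1), (9.5, 0.0)],
    "Emergency": [(2.0, 1.0), (1.5, 0.9), (2.5, 1.0)],
}

def centroid(points):
    return tuple(sum(v) / len(points) for v in zip(*points))

centroids = {label: centroid(pts) for label, pts in train.items()}

def predict(x):
    """Nearest-centroid classification on the raw (unnormalized) features."""
    return min(centroids, key=lambda lbl: math.dist(x, centroids[lbl]))

# An emergency photographed on a green meadow: high greenness, clear injury cue.
print(predict((9.0, 1.0)))   # the greenness confound wins -> "All good"
print(predict((2.0, 1.0)))   # emergency on grey asphalt   -> "Emergency"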

In this sandbox environment, I did no harm at all. But imagine a system that is used to help detect emergency situations and trigger an alarm those vital seconds earlier. This is only a very basic example of how the right tool in the wrong hands can cause a lot of damage.

There are many cases where machine learning can help increase the accuracy of processes, increase their speed, or save money through the right predictions – for example, predicting when and why a machine will fail before it actually does, or helping to mine information from massive datasets.

Sorting ML into the Modern Data Warehouse

How does this relate to the Modern Data Warehouse? As we stated previously, the Modern Data Warehouse does not only offer scalable, fast, and secure storage components. It also offers at least one compute component (and, in the context of this book, six) that can interact with the storage services and can be used to create and run machine learning models at scale. The "run" can be implemented in batch mode, in near-real time, or even in real time, depending on the streaming engine used.

The Modern Data Warehouse can then store the results of ML calculations in a suitable presentation layer to provide this data to downstream consumers, who will process it further, visualize it, draw their insights from it, and take action. The system can close the loop by using the enterprise service bus and the available integration services to feed insights back as parameters for the surrounding systems.
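In batch mode, the "run" described above reduces to a simple loop: read a batch of records, score each one with a trained model, and write the timestamped results to the presentation layer. The sketch below assumes invented names throughout – the toy churn model, the features `support_tickets` and `tenure_months`, and the CSV output standing in for a real presentation-layer table are all hypothetical:

```python
import csv
import io
import math
from datetime import datetime, timezone

def churn_probability(record):
    """Stand-in for a trained model: a toy logistic score over two features."""
    z = 0.8 * record["support_tickets"] - 0.05 * record["tenure_months"]
    return 1.0 / (1.0 + math.exp(-z))

def batch_score(records):
    """One batch run: score every record and stamp it for the presentation layer."""
    run_ts = datetime.now(timezone.utc).isoformat()
    return [
        {"customer_id": r["customer_id"],
         "churn_probability": round(churn_probability(r), 4),
         "scored_at": run_ts}
        for r in records
    ]

def write_presentation_table(rows):
    """Serialize the scored batch as CSV (standing in for a presentation table)."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["customer_id", "churn_probability", "scored_at"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

batch = [
    {"customer_id": "C-001", "support_tickets": 7, "tenure_months": 3},
    {"customer_id": "C-002", "support_tickets": 0, "tenure_months": 48},
]
scored = batch_score(batch)
print(write_presentation_table(scored))
```

In a near-real-time variant, the same scoring function would be applied per event or per micro-batch by the streaming engine instead of on a schedule.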

Understanding responsible ML/AI

A responsible data scientist will need tools that support them in conducting this work properly. The buzzword of the moment in this area is machine learning operations (MLOps). One of the most important steps in creating a responsible AI is having complete traceability and auditability of the source data, and of the versions of the datasets used to train and retrain a given model at a given timestamp. With this information, the results of the model can also be audited and interpreted, which is vital when it comes to audits and traceability regarding legal questions, for instance. The collaborative aspects of an MLOps-driven environment are another important factor.
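The core of that traceability idea can be sketched with nothing but the standard library: fingerprint the exact training dataset with a content hash and record it next to the model version and timestamp. Real setups would use an experiment-tracking system rather than an in-memory list – this minimal sketch, with invented names such as `model_registry`, only illustrates the principle:

```python
import hashlib
import json
from datetime import datetime, timezone

def dataset_fingerprint(rows):
    """Content hash of a training dataset; identical rows always yield the same hash."""
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

model_registry = []   # stand-in for a real experiment-tracking store

def register_training_run(model_name, model_version, training_rows):
    """Record which exact dataset produced which model version, and when."""
    entry = {
        "model": model_name,
        "version": model_version,
        "dataset_sha256": dataset_fingerprint(training_rows),
        "trained_at": datetime.now(timezone.utc).isoformat(),
    }
    model_registry.append(entry)
    return entry

rows_v1 = [{"feature_a": 1.0, "label": 0}, {"feature_a": 2.5, "label": 1}]
run = register_training_run("churn-model", "1.0", rows_v1)

# Later, an auditor can verify which exact dataset produced a given model version:
assert dataset_fingerprint(rows_v1) == run["dataset_sha256"]
```

If even one training row changes, the fingerprint changes, so a model version can always be tied back to the precise dataset state it was trained on.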

Note

We can find a definition of Responsible AI, following the principles of Fairness, Reliability and Safety, Privacy and Security, Inclusiveness, and Transparency and Accountability, at https://www.microsoft.com/en-us/ai/responsible-ai.

An MLOps environment embedded in the Modern Data Warehouse is another piece of the bigger picture and helps integrate these principles into the analytical estate of any company. With the tight interconnection between data services, storage, compute components, streaming services, IoT technology, ML and AI, and visualization tools, the world of analytics today offers a wide range of possibilities at a far lower cost than ever before. The ease of use of these services, and the productivity they enable, is constantly growing.
