Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
The Artificial Intelligence Infrastructure Workshop
The Artificial Intelligence Infrastructure Workshop

The Artificial Intelligence Infrastructure Workshop: Build your own highly scalable and robust data storage systems that can support a variety of cutting-edge AI applications

Arrow left icon
Profile Icon Chinmay Arankalle Profile Icon Anand N.S. Profile Icon Kevin Liao Profile Icon Kunal Gera Profile Icon Bas Geerdink Profile Icon Gareth Dwyer +2 more Show less
Arrow right icon
$9.99 $35.99
Full star icon Full star icon Full star icon Full star icon Half star icon 4.3 (3 Ratings)
eBook Aug 2020 732 pages 1st Edition
eBook
$9.99 $35.99
Paperback
$48.99
Subscription
Free Trial
Renews at $19.99p/m
Arrow left icon
Profile Icon Chinmay Arankalle Profile Icon Anand N.S. Profile Icon Kevin Liao Profile Icon Kunal Gera Profile Icon Bas Geerdink Profile Icon Gareth Dwyer +2 more Show less
Arrow right icon
$9.99 $35.99
Full star icon Full star icon Full star icon Full star icon Half star icon 4.3 (3 Ratings)
eBook Aug 2020 732 pages 1st Edition
eBook
$9.99 $35.99
Paperback
$48.99
Subscription
Free Trial
Renews at $19.99p/m
eBook
$9.99 $35.99
Paperback
$48.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
OR
Modal Close icon
Payment Processing...
tick Completed

Billing Address

Table of content icon View table of contents Preview book icon Preview Book

The Artificial Intelligence Infrastructure Workshop

2. Artificial Intelligence Storage Requirements

Overview

In this chapter, you will learn how to differentiate between traditional data warehousing and modern AI-focused systems. You'll be able to describe the typical layers in an architecture that is suited for building AI systems, such as a data lake, and list the requirements for creating the storage layers for an AI system. Later, you will learn how to define the specific requirements per storage layer for a use case and identify the infrastructure as well as the software systems based on the requirements. By the end of this chapter, you'll be able to identify the requirements for data storage solutions for AI systems based on the data layers.

Introduction

In the previous chapter, we covered the fundamentals of data storage. In this chapter, we'll dive a little deeper into the architecture of Artificial Intelligence (AI) solutions, starting with the requirements that define them. This chapter will be a mixture of theoretical content and hands-on exercises, with real-life examples where AI is actively used.

Let's say you are a solution architect involved in the design of a new data lake. There are a lot of technology choices to be made that would have an impact on the people involved and on the long-term operations of the organization. It is great to have a set of requirements at the start of the project that each decision could be based on. Storing data essentially means writing data to disk or memory so that it is safe, secure, findable, and retrievable. There are many ways to store data: on-premise, in the cloud, on disk, in a database, in memory, and so on. Each way fulfills a set of requirements to a greater...

Storage Requirements

It's crucial to keep track of the requirements of your solution in all phases of the project. Since most projects follow the agile methodology, it's not an option to just define the requirements at the start of the project and then "get to work."

The agile methodology requires team members to continuously reflect on the initial plan and requirements provided in the Deming cycle, as shown in the following figure:

Figure 2.1: The Deming cycle

A list of requirements can be divided into functional and non-functional requirements. The functional requirements contain the user stories that explain how to interact with the system; these are not in the scope of this book since they are less technical and more concerned with UX design and customer journeys. The non-functional (or technical) requirements contain descriptions of the required workings of the system. The non-functional architecture requirements for an AI storage...

Data Layers

An AI system consists of multiple data storage layers that are connected with Extract, Transform, and Load (ETL) or Extract, Load, and Transform (ELT) pipelines. Each separate storage solution has its own requirements, depending on the type of data that is stored and the usage pattern. The following figure shows this concept:

Figure 2.3: Conceptual overview of the data layers in a typical AI solution

From a high-level viewpoint, the backend (and thus, the storage systems) of an AI solution is split up into three parts or layers:

  • Raw data layer: Contains copies of files from source systems. Also known as the staging area.
  • Historical data layer: The core of a data-driven system, containing an overview of data from multiple source systems that have been gathered over time. By stacking the data rather than replacing or updating old values, history is preserved and time travel (being able to make queries over a data state in the past) is...

Raw Data

The raw data layer contains the one-to-one copies of files from the source systems. The copies are stored to make sure that any data that arrives is preserved in its original form. After storing the raw data, some checks can be done to make sure that the data can be processed by the rest of the ETL pipeline, such as a checksum.

Security

We'll look at data security first. All modern software and data systems must be secure. By security requirements, we mean all aspects related to ensuring that the data in a system cannot be viewed or deleted by unauthorized people or systems. It entails identity and access management, role-based access, and data encryption.

Basic Protection

In any data project, security is a key requirement. The basic level of data protection is to require a username-password combination for anyone who can access the data: customers, developers, analysts, and so on. In all cases, the passwords should be evaluated against a strong password policy...

Historical Data

The historical data layer contains data stores that hold all data from a certain point in the past (for example, the start of the company) up until now. In most cases, this data is considered to be important to run a business, and in some cases, even vital for its existence. For example, the historical data layer of a newspaper agency contains sources, reference material, interviews, media footage, and so on, all of which were used to publish news articles. Data is stored in blobs, file shares, and relational tables, often with primary and foreign keys (enforced by the infrastructure or in software). The data can be modeled to a standard such as a data vault to preserve historical information. This data layer is responsible for keeping the truth, which means it is highly regulated and governed. Any data that is inserted into one of the tables in this layer has gone through several checks, and metadata is stored next to the actual data to keep track of the history and...

Streaming Data

The requirements for a streaming data layer are different from a batch-oriented data lake. Firstly, the time dimension plays a crucial role. Any event data that enters the message bus as a stream must be timestamped. Secondly, performance and latency are more important since it must be certain that data can be processed in due time. Thirdly, the way that analytics and machine learning are applied differs; while the data is being streamed in, the system must analyze it in near-real-time. In general, streaming data software relies more on computing power than storage space; processing speed, low latency, and high throughput are key. Nevertheless, the storage requirements that are in place for a streaming data system are worth considering and are a bit different from "static" batch-driven applications.

Security

A typical streaming datastore is separated into topics. A topic is named as such in the popular streaming data store Kafka. These can be considered...

Analytics Data

The responsibility of the analytics layer of an AI system is to make data fast and available for machine learning models, queries, and so on. This can be achieved by caching data efficiently or by virtualizing views and queries where needed to materialize these views.

Performance

The data needs to be quickly available for ad hoc queries, reports, machine learning models, and so on. Therefore, the data schema that is chosen should reflect a "schema-on-read" pattern rather than a "schema-on-write" one. When caching data, it can be very efficient to store the data in a columnar NoSQL database for fast access. This would mean the duplication of data in many cases, but that's all right since the analytics layer is not responsible for maintaining "one version of the truth." We call these caches data marts. They are usually specific for one goal, for example, retrieving the sales data of the last month.

In modern data lakes, the entire...

Model Development and Training

Data that is used for developing and training machine learning models is temporarily stored in a model development environment. The data store itself can be physical (a file share or database) or in memory. The data is a copy of one or more sources in the other data layers. Once the data has been used, it should be removed to free up space and to prevent security breaches. When developing reinforcement learning systems, it's necessary to merge this environment with the production environment; for example, by training the models directly on the data in the historical data layer.

In our example of PacktBank, the model development environment of the new data lake is used by data scientists to build and train new risk models. Whereas the old way of forecasting whether clients could afford a loan was purely based on rules, the new management wants to become more data-driven and rely on algorithms that have been trained on historical data. The historical...

Summary

In this chapter, we discussed the non-functional requirements for data storage solutions. It has become clear that a data lake, which is an evolution of a data warehouse, consists of multiple layers that have their own requirements and thus technology. We have discussed the key requirements for a raw data store where primarily flat files need to be stored in a robust way, for a historical database where temporal information is saved, and for analytics data stores where fast querying is necessary. Furthermore, we have explained the requirements for a streaming data engine and for a model development environment. In all cases, requirements management is an ongoing process in an AI project. Rather than setting all the requirements in stone at the start of the project, architects and developers should be agile, revisiting and revising the requirements after every iteration.

In the next chapter, we will connect the layers of the architecture we have explored in this chapter by...

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Understand how artificial intelligence, machine learning, and deep learning are different from one another
  • Discover the data storage requirements of different AI apps using case studies
  • Explore popular data solutions such as Hadoop Distributed File System (HDFS) and Amazon Simple Storage Service (S3)

Description

Social networking sites see an average of 350 million uploads daily - a quantity impossible for humans to scan and analyze. Only AI can do this job at the required speed, and to leverage an AI application at its full potential, you need an efficient and scalable data storage pipeline. The Artificial Intelligence Infrastructure Workshop will teach you how to build and manage one. The Artificial Intelligence Infrastructure Workshop begins taking you through some real-world applications of AI. You’ll explore the layers of a data lake and get to grips with security, scalability, and maintainability. With the help of hands-on exercises, you’ll learn how to define the requirements for AI applications in your organization. This AI book will show you how to select a database for your system and run common queries on databases such as MySQL, MongoDB, and Cassandra. You’ll also design your own AI trading system to get a feel of the pipeline-based architecture. As you learn to implement a deep Q-learning algorithm to play the CartPole game, you’ll gain hands-on experience with PyTorch. Finally, you’ll explore ways to run machine learning models in production as part of an AI application. By the end of the book, you’ll have learned how to build and deploy your own AI software at scale, using various tools, API frameworks, and serialization methods.

Who is this book for?

If you are looking to develop the data storage skills needed for machine learning and AI and want to learn AI best practices in data engineering, this workshop is for you. Experienced programmers can use this book to advance their career in AI. Familiarity with programming, along with knowledge of exploratory data analysis and reading and writing files using Python will help you to understand the key concepts covered.

What you will learn

  • Get to grips with the fundamentals of artificial intelligence
  • Understand the importance of data storage and architecture in AI applications
  • Build data storage and workflow management systems with open source tools
  • Containerize your AI applications with tools such as Docker
  • Discover commonly used data storage solutions and best practices for AI on Amazon Web Services (AWS)
  • Use the AWS CLI and AWS SDK to perform common data tasks

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Aug 17, 2020
Length: 732 pages
Edition : 1st
Language : English
ISBN-13 : 9781800206991
Category :
Languages :
Tools :

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
OR
Modal Close icon
Payment Processing...
tick Completed

Billing Address

Product Details

Publication date : Aug 17, 2020
Length: 732 pages
Edition : 1st
Language : English
ISBN-13 : 9781800206991
Category :
Languages :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total $ 141.97
The Natural Language Processing Workshop
$43.99
The Reinforcement Learning Workshop
$48.99
The Artificial Intelligence Infrastructure Workshop
$48.99
Total $ 141.97 Stars icon
Banner background image

Table of Contents

12 Chapters
1. Data Storage Fundamentals Chevron down icon Chevron up icon
2. Artificial Intelligence Storage Requirements Chevron down icon Chevron up icon
3. Data Preparation Chevron down icon Chevron up icon
4. The Ethics of AI Data Storage Chevron down icon Chevron up icon
5. Data Stores: SQL and NoSQL Databases Chevron down icon Chevron up icon
6. Big Data File Formats Chevron down icon Chevron up icon
7. Introduction to Analytics Engine (Spark) for Big Data Chevron down icon Chevron up icon
8. Data System Design Examples Chevron down icon Chevron up icon
9. Workflow Management for AI Chevron down icon Chevron up icon
10. Introduction to Data Storage on Cloud Services (AWS) Chevron down icon Chevron up icon
11. Building an Artificial Intelligence Algorithm Chevron down icon Chevron up icon
12. Productionizing Your AI Applications Chevron down icon Chevron up icon

Customer reviews

Rating distribution
Full star icon Full star icon Full star icon Full star icon Half star icon 4.3
(3 Ratings)
5 star 33.3%
4 star 66.7%
3 star 0%
2 star 0%
1 star 0%
Alexandre Henrique Nov 13, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
There are lots of books about many subjects covered in this title. However, after reading it I believe that this is a complete workshop that provides you with everything you need to learn about IA Infrastructure, if you are a beginner, or even if you already have some knowledge in the field.Despite the breadth, this book does not lack in-depth on the many subjects that are covered. Each session has several examples and tutorial-like explanations where you can get your hands dirty and practice. I guess this is the key point you must consider to have it at your belt. In my opinion, this is its main feature because once you get some practice you start being more confident and take more risks to learn the topic you've been reading so far.More specifically, there are several takeaways you can get from this book on data systems design, cloud storage and management, data preparation, and how to productionizing your AI applications. Because of that, I've learned a lot about these topics since I had no prior knowledge of most of them before reading this title.I strongly recommend you to read it if you wanna become a Machine Learning expert/engineer.
Amazon Verified review Amazon
Jethro Muller Sep 15, 2020
Full star icon Full star icon Full star icon Full star icon Empty star icon 4
The book covers the key topics and concepts required to build your own production-grade AI systems. It equips you with the tools needed to deploy your very own AI model from beginning to end.It includes specific installation instructions for Windows, Linux, and Mac for the various programs and packages required by the book. It's a helpful addition that directs the readers to the appropriate installation instructions and helps alleviate confusion with additional setup steps included in the book. It also includes a clear guide on setting up your own AWS account with step-by-step visual instructions. The only downside to all of this is that it's not immediately obvious why you should install any of these packages as there is no real context given outside of what you'd use them for.Each chapter has a clear outline of what is covered and what readers should take away from it. This is useful if only some topics appeal to you and you'd rather skip others or you are already familiar with what a chapter is covering.The book contains clear code examples with further code/resources online making it easy to interact with. It's definitely aimed at readers with a good coding background as is mentioned in the preface (the code isn't complicated, just not overly explained as in some books aimed at beginners).There are challenge questions (answers are provided) to push the reader to participate actively instead of just reading through the book without engaging with it directly.The book covers the main ethical concerns when developing production-ready AI models. Concerns around various baked-in biases in models often introduced unknowingly are very important to learn about. It covers ethical issues that are important considerations when developing AI software including case studies of things gone wrong in real-life AI applications.Overall, it's a good practical introduction to AI systems at scale and in production with useful examples to build modern AI applications. It covers considerations from data integrity, storage, processing, retrieval, and model training as well as system design.This book is worth looking at if you have experience coding and want to make use of AI in industry using contemporary tools at scale.
Amazon Verified review Amazon
Aishwarya Srinivasan Feb 09, 2021
Full star icon Full star icon Full star icon Full star icon Empty star icon 4
A really detailed and comprehensive book.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

How do I buy and download an eBook? Chevron down icon Chevron up icon

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website? Chevron down icon Chevron up icon

If you want to purchase a video course, eBook or Bundle (Print+eBook) please follow below steps:

  1. Register on our website using your email address and the password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Cart, or PayPal)
Where can I access support around an eBook? Chevron down icon Chevron up icon
  • If you experience a problem with using or installing Adobe Reader, the contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats do Packt support? Chevron down icon Chevron up icon

Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks? Chevron down icon Chevron up icon
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower price than print
  • They save resources and space
What is an eBook? Chevron down icon Chevron up icon

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply login to your account and click on the link in Your Download Area. We recommend you saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.