You're reading from Artificial Intelligence with Python: Your complete guide to building intelligent apps using Python 3.x

Product type: Paperback

Published: January 2020

Publisher: Packt

ISBN-13: 9781839219535

Length: 618 pages

Edition: 2nd Edition

Languages: Python

Tools: TensorFlow

Concepts: Artificial Intelligence

Authors (2): Prateek Joshi, Alberto Artasanchez

Data cleansing and transformation

Just as gas powers a car, data powers AI. The age-old adage "garbage in, garbage out" remains painfully true, so clean and accurate data is paramount to producing consistent, reproducible, and accurate AI models. By some estimates, data scientists spend about 80% of their time cleaning, preparing, and transforming their input data and only 20% running and optimizing their models. Much of this data preparation has historically required painstaking human involvement. The ImageNet and MS-COCO image datasets are good examples. ImageNet contains millions of labeled images and MS-COCO hundreds of thousands, covering a wide range of objects and categories. These datasets are used to train models that can distinguish between those categories and object types, and they were originally labeled by humans. As AI systems become more prevalent, we can increasingly use AI itself to perform the labeling. In addition, many AI-enabled tools now help with the cleansing and deduplication process.
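To make the cleansing and deduplication steps concrete, here is a minimal sketch that uses only the standard library. The records, the field names (`name`, `email`, `age`), and the assumption that a normalized `email` identifies a unique person are all hypothetical and chosen for illustration. Real pipelines typically rely on a library such as pandas and much richer validation rules.

```python
from typing import Dict, List, Optional

# Hypothetical raw records: inconsistent casing, stray whitespace,
# a missing value, and a duplicate that differs only in formatting.
RAW_RECORDS: List[Dict[str, Optional[str]]] = [
    {"name": "  Alice Smith ", "email": "ALICE@example.com", "age": "34"},
    {"name": "alice smith", "email": "alice@example.com ", "age": "34"},
    {"name": "Bob Jones", "email": "bob@example.com", "age": ""},
    {"name": "Carol White", "email": None, "age": "29"},
]


def clean_record(record: Dict[str, Optional[str]]) -> Optional[Dict[str, object]]:
    """Normalize one record; return None if the (assumed) key field is missing."""
    email = (record.get("email") or "").strip().lower()
    if not email:
        return None  # drop rows that lack the key we deduplicate on
    # Collapse repeated whitespace and normalize capitalization.
    name = " ".join((record.get("name") or "").split()).title()
    age_text = (record.get("age") or "").strip()
    age = int(age_text) if age_text.isdigit() else None  # empty/invalid -> None
    return {"name": name, "email": email, "age": age}


def clean_and_deduplicate(
    records: List[Dict[str, Optional[str]]]
) -> List[Dict[str, object]]:
    """Clean every record, keeping only the first occurrence of each email."""
    seen = set()
    result = []
    for record in records:
        cleaned = clean_record(record)
        if cleaned is None or cleaned["email"] in seen:
            continue
        seen.add(cleaned["email"])
        result.append(cleaned)
    return result


if __name__ == "__main__":
    for row in clean_and_deduplicate(RAW_RECORDS):
        print(row)
```

Deduplicating on a normalized key is the simplest strategy. It catches records that differ only in casing or whitespace, but not records that are spelled differently.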

One good example is AWS Lake Formation, which Amazon made generally available in August 2019. Lake Formation automates several of the steps typically involved in creating a data lake, including the collection, cleansing, deduplication, cataloging, and publication of data. The data can then be used for analytics and to build machine learning models. To use Lake Formation, a user brings data into the lake from a range of sources using predefined templates (which Lake Formation calls blueprints). They can then define policies that govern data access according to the level of access that each group across the organization requires.
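The idea of group-based access policies can be illustrated with a toy in-memory model. This is not the Lake Formation API, which is configured through the AWS console, CLI, or SDKs. The `AccessPolicy` class, the group and table names, and the permission strings below are hypothetical. They only mirror the concept of granting each group the access it requires.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class AccessPolicy:
    """Toy model (hypothetical): group -> table -> set of permissions."""

    grants: Dict[str, Dict[str, Set[str]]] = field(default_factory=dict)

    def grant(self, group: str, table: str, permission: str) -> None:
        # Create nested entries on first use, then record the permission.
        self.grants.setdefault(group, {}).setdefault(table, set()).add(permission)

    def is_allowed(self, groups: Set[str], table: str, permission: str) -> bool:
        # A user is allowed if any one of their groups holds the permission.
        return any(
            permission in self.grants.get(group, {}).get(table, set())
            for group in groups
        )


if __name__ == "__main__":
    policy = AccessPolicy()
    policy.grant("analysts", "sales", "SELECT")        # read-only access
    policy.grant("data_engineers", "sales", "SELECT")
    policy.grant("data_engineers", "sales", "INSERT")  # can also load data
    print(policy.is_allowed({"analysts"}, "sales", "INSERT"))
```

Keying permissions by group rather than by individual user means that onboarding someone only requires adding them to the right groups, which is the kind of organization-wide policy management described above.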

Some of the preparation, cleansing, and classification that the data undergoes is performed automatically using machine learning.
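The machine learning that Lake Formation uses for matching is not reproduced here. As a much simpler, non-ML stand-in, the sketch below flags near-duplicate records with the standard library's `difflib.SequenceMatcher`. The sample records, the `find_near_duplicates` helper, and the 0.85 similarity threshold are hypothetical choices for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise is ignored."""
    return " ".join(text.lower().split())


def find_near_duplicates(
    records: List[str], threshold: float = 0.85
) -> List[Tuple[int, int, float]]:
    """Return (i, j, score) for every record pair with similarity >= threshold."""
    normalized = [normalize(r) for r in records]
    pairs = []
    # Naive O(n^2) pairwise comparison; fine for a small illustrative list.
    for i in range(len(normalized)):
        for j in range(i + 1, len(normalized)):
            score = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
            if score >= threshold:
                pairs.append((i, j, round(score, 3)))
    return pairs


if __name__ == "__main__":
    customers = [
        "Acme Corporation, 123 Main St, Springfield",
        "ACME Corporation,  123 Main Street, Springfield",
        "Globex Inc, 42 Elm Ave, Shelbyville",
    ]
    print(find_near_duplicates(customers))
```

A fixed string-similarity threshold is easy to reason about, but it has to be tuned by hand, and the pairwise loop grows quadratically. A learned matcher can instead be trained on labeled examples of matching and non-matching records, and it typically avoids comparing every pair.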

Lake Formation also provides a centralized dashboard where administrators can manage and monitor data access policies, governance, and auditing across multiple analytics engines. Users can also search for datasets in the resulting catalog. As the tool evolves in the coming months and years, it will make it easier for users to analyze data with their favorite analytics and machine learning services, including: