Artificial Intelligence with Python

Your complete guide to building intelligent apps using Python 3.x

Product type Paperback
Published in Jan 2020
Publisher Packt
ISBN-13 9781839219535
Length 618 pages
Edition 2nd Edition
Authors (2): Prateek Joshi, Alberto Artasanchez

Data cleansing and transformation

Just as gas powers a car, data fuels AI, and the age-old adage of "garbage in, garbage out" remains painfully true. For this reason, clean and accurate data is paramount to producing consistent, reproducible, and accurate AI models. By some estimates, a data scientist spends about 80% of their time cleaning, preparing, and transforming input data and only 20% running and optimizing models. Much of this cleansing has traditionally required painstaking human involvement. The ImageNet and MS-COCO image datasets are good examples. ImageNet contains over a million labeled images, and MS-COCO contains over a million labeled object instances, spanning a wide range of objects and categories. These datasets are used to train models that can distinguish between different categories and object types, and they were initially labeled by hand. As such models become more prevalent, we can use AI to perform the labeling itself. Furthermore, there is a plethora of AI-enabled tools that help with the cleansing and deduplication process.

One good example is AWS Lake Formation, which Amazon made generally available in August 2019. Lake Formation automates some of the steps typically involved in creating a data lake, including the collection, cleansing, deduplication, cataloging, and publication of data. The data can then be made available for analytics and for building machine learning models. To use Lake Formation, a user brings data into the lake from a range of sources using predefined templates. They can then define policies that govern data access according to the level of access that each group in the organization requires.
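
As a rough illustration of what defining such an access policy can look like in code, the sketch below builds a table-level grant and, optionally, sends it through the `lakeformation` client in boto3 (the AWS SDK for Python). The helper functions, the role ARN, and the database and table names are hypothetical and exist only for this example. Actually applying the grant assumes that boto3 is installed and that valid AWS credentials are configured.

```python
from typing import Any, Dict, List


def build_table_grant(principal_arn: str, database: str, table: str,
                      permissions: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for a Lake Formation table-level grant."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": permissions,
    }


def grant_table_access(grant: Dict[str, Any]) -> None:
    """Send the grant to AWS. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("lakeformation")
    client.grant_permissions(**grant)


# Hypothetical principal, database, and table names, for illustration only.
analyst_grant = build_table_grant(
    principal_arn="arn:aws:iam::123456789012:role/AnalystRole",
    database="sales_lake",
    table="orders",
    permissions=["SELECT"],
)
print(analyst_grant)
# grant_table_access(analyst_grant)  # uncomment to apply against a real account
```

Keeping the request in a plain dictionary makes it possible to review or version-control the policy before anything is sent to AWS.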

Some of the preparation, cleansing, and classification that the data undergoes is performed automatically using machine learning.
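
To make the idea of cleansing and deduplication concrete, here is a minimal rule-based sketch in plain Python. It is not how Lake Formation works internally and it uses no machine learning. The record layout (`name` and `email` fields), the cleaning rules, and the function names are all assumptions chosen for this example.

```python
from typing import Dict, Iterable, List, Optional


def clean_record(record: Dict[str, Optional[str]]) -> Optional[Dict[str, str]]:
    """Normalize one raw record, or return None if it is unusable."""
    name = (record.get("name") or "").strip()
    email = (record.get("email") or "").strip().lower()
    if not name or "@" not in email:
        return None  # drop rows that lack a name or a plausible email
    # Collapse runs of whitespace and normalize capitalization.
    name = " ".join(name.split()).title()
    return {"name": name, "email": email}


def cleanse_and_deduplicate(
    records: Iterable[Dict[str, Optional[str]]]
) -> List[Dict[str, str]]:
    """Clean every record and keep the first occurrence of each email."""
    seen = set()
    result = []
    for raw in records:
        cleaned = clean_record(raw)
        if cleaned is None or cleaned["email"] in seen:
            continue
        seen.add(cleaned["email"])
        result.append(cleaned)
    return result


raw_records = [
    {"name": "  ada   lovelace ", "email": "Ada@Example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com "},  # duplicate once cleaned
    {"name": "", "email": "nobody@example.com"},            # missing name
    {"name": "Alan Turing", "email": None},                  # missing email
    {"name": "grace hopper", "email": "grace@example.com"},
]

cleaned_records = cleanse_and_deduplicate(raw_records)
for row in cleaned_records:
    print(row)
```

Of the five raw records, only two survive: the two Ada Lovelace rows collapse into one after normalization, and the rows with a missing name or email are dropped.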

Lake Formation also provides a centralized dashboard where administrators can manage and monitor data access policies, governance, and auditing across multiple analytics engines. Users can also search for datasets in the resulting catalog. As the tool evolves over the coming months and years, it should make it easier for users to analyze data with their favorite analytics and machine learning services, including:

  • Databricks
  • Tableau
  • Amazon Redshift
  • Amazon Athena
  • AWS Glue
  • Amazon EMR
  • Amazon QuickSight
  • Amazon SageMaker