Data Augmentation with Python
Data Augmentation with Python: Enhance deep learning accuracy with data augmentation methods for image, text, audio, and tabular data

Author: Duc Haba
eBook Apr 2023 394 pages 1st Edition

What do you get with eBook?

  • Instant access to your Digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want
  • AI Assistant (beta) to help accelerate your learning

Data Augmentation Made Easy

Data augmentation is essential for developing a successful deep learning (DL) project. However, data scientists and developers often overlook this crucial step. It is no secret that you will spend the majority of your project time gathering, cleaning, and augmenting the dataset in a real-world DL project. Thus, learning how to expand the dataset without purchasing new data is essential. This book covers standard and advanced techniques for extending image, text, audio, and tabular datasets. Furthermore, you will learn about data biases and learn how to code on Jupyter Python Notebooks.

Chapter 1 will introduce various data augmentation concepts, set up the coding environment, and create the foundation class. Later chapters will explain various techniques in detail, including Python coding. The effective use of data augmentation has proven to be the deciding factor between success and failure in machine learning (ML). Many real-world ML projects stay in the conceptual phase because of insufficient data for training the ML model. Data augmentation is a cost-effective technique that can increase the size of the dataset, lower the training error rate, and produce a more accurate prediction and forecast.

Fun fact

The car gasoline analogy is helpful for students who are first learning about data augmentation and artificial intelligence (AI). You can think of data as the gasoline for the AI engine and data augmentation as the additive, such as the Chevron Techron fuel cleaner, that makes your car engine run faster, smoother, and farther without extra gasoline.

In this chapter, we’ll define the data augmentation role and the limitations of extending data without changing its integrity. We’ll briefly discuss the different types of input data, such as image, text, audio, and tabular data, and the challenges in supplementing it. Finally, we’ll set up the system requirements and the programming style in the accompanying Python notebook.

I designed this book to be a hands-on journey. It will be most effective to read a chapter, run the code, re-read the part of the chapter that confused you, and jump back to hacking the code until you firmly understand the concept or technique that was presented.

You are encouraged to change or add new code to the Python notebook. The primary purpose of this book is interactive learning. So, if something goes wrong, download a fresh copy from the book’s GitHub repository. The surest method to learn is to make mistakes and create something new.

Data augmentation is an iterative process. There is no fixed recipe. In other words, depending on the dataset, you select augmentation functions and adjust their parameters. A subject domain expert may provide insight into how much distortion is acceptable. By the end of this chapter, you will know the general rules for data augmentation, what types of input data can be augmented, the programming style, and how to set up a Python Notebook online or offline.

In particular, this chapter covers the following primary topics:

  • Data augmentation role
  • Data input types
  • Python Notebook
  • Programming styles

Let’s start with the data augmentation role.

Data augmentation role

Data is paramount in any AI project. This is especially true when using artificial neural network (ANN) algorithms, also known as DL. The success or failure of a DL project depends primarily on the quality of the input data.

One primary reason for the significance of data augmentation is that it has become relatively easy to develop an AI model for prediction and forecasting, and those models require robust data input. With the remarkable advancement in developing, training, and deploying DL projects, such as with the FastAI framework, you can create a world-class DL model in a handful of lines of Python code. Thus, expanding the dataset is an effective way to improve the DL model’s accuracy relative to your competitors.

The traditional method of acquiring additional data is difficult, expensive, and impractical. Sometimes, the only available option is to use data augmentation techniques to extend the dataset.

Fun fact

Data augmentation methods can increase the data’s size tenfold. For example, it is relatively challenging to acquire additional skin cancer images. Thus, using a random combination of image transformations, such as vertical flip, horizontal flip, rotating, and skewing, is a practical technique that can expand the skin cancer photo data.
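As a rough illustration of the idea above, the following sketch applies a random combination of vertical flip, horizontal flip, and 90-degree rotation to a tiny image. It is not the book's code: the image is a plain list of pixel rows so that the example needs only the standard library, the function names (`vertical_flip`, `horizontal_flip`, `rotate_90`, `random_augment`) are hypothetical, and skewing and arbitrary-angle rotation are left out because they require interpolation from an image library.

```python
import random


def vertical_flip(image):
    """Flip top to bottom by reversing the order of the rows."""
    return [row[:] for row in reversed(image)]


def horizontal_flip(image):
    """Flip left to right by reversing the pixels within each row."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(col) for col in zip(*reversed(image))]


def random_augment(image, rng):
    """Return a new image built from a random mix of the transforms above."""
    out = [row[:] for row in image]
    if rng.random() < 0.5:
        out = vertical_flip(out)
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    # Apply zero to three quarter turns.
    for _ in range(rng.randrange(4)):
        out = rotate_90(out)
    return out


# A 2 x 3 grayscale "image"; real photos would be loaded from files.
image = [[1, 2, 3],
         [4, 5, 6]]

rng = random.Random(42)  # fixed seed so the run is repeatable
augmented = [random_augment(image, rng) for _ in range(10)]
```

Every variant contains exactly the original pixels in a new arrangement, which is why flips and quarter turns are usually considered low-risk transformations. With only these three transforms there are at most eight distinct orientations, so some of the ten variants repeat; how much distortion is acceptable for a given dataset remains a judgment call for a domain expert.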

Without data augmentation, sourcing new skin cancer photos and labeling them is expensive and time-consuming. The International Skin Imaging Collaboration (ISIC) is the authoritative data source for skin diseases, where a team of dermatologists has verified and classified the images. ISIC has made the datasets available to the public to download for free. If you can’t find a particular dataset from ISIC, it is difficult to find it by other means, as accessing hospital or university labs to acquire skin disease images is fraught with legal and logistical obstacles. Even after obtaining the photos, hiring a team of dermatologists to classify the pictures by the correct disease would be costly.

Another example of the impracticality of acquiring additional images instead of using augmentation is downloading photos from social media or online search engines. Social media is a rich source of image, text, audio, and video data. Search engines, such as Google or Bing, make it relatively easy to download additional data for a project, but copyrights and legal usage are a quagmire. Most images, texts, audio, and videos on social media, such as YouTube, Facebook, TikTok, and Twitter, are not clearly labeled as copyrighted or public domain material.

Furthermore, social media promotes popular content, not unpopular or obscure material. For example, let’s say you want to add more images of parrots to your parrot classification AI system. Online searches will return a lot of blue-and-yellow macaws, red-and-green macaws, or sulfur-crested cockatoos, but not as many galahs, keas, or the mythical Norwegian Blue parrot, a fake parrot from the Monty Python comedy skit.

The shortage of data for AI training is even worse for text, audio, and tabular data types. Generally, obtaining additional text, audio, and tabular data is expensive and time-consuming. Strong copyright laws protect text data. Audio files are less common online, and tabular data comes primarily from private company databases.

The following section will define the four commonly used data types.

Key benefits

  • Explore beautiful, customized charts and infographics in full color
  • Work with fully functional OO code using open source libraries in the Python Notebook for each chapter
  • Unleash the potential of real-world datasets with practical data augmentation techniques

Description

Data is paramount in AI projects, especially for deep learning and generative AI, as forecasting accuracy relies on input datasets being robust. Acquiring additional data through traditional methods can be challenging, expensive, and impractical, and data augmentation offers an economical option to extend the dataset. The book teaches you over 20 geometric, photometric, and random erasing augmentation methods using seven real-world datasets for image classification and segmentation. You’ll also review eight open source image augmentation libraries, write object-oriented programming (OOP) wrapper functions in Python Notebooks, view color image augmentation effects, analyze safe levels and biases, explore fun facts, and take on fun challenges. As you advance, you’ll discover over 20 character and word techniques for text augmentation using two real-world datasets and excerpts from four classic books. The chapter on advanced text augmentation uses machine learning models, such as Transformer, Word2vec, BERT, GPT-2, and others, to extend the text dataset. The chapters on audio and tabular data feature real-world data, open source libraries, custom plots, and Python Notebooks, along with fun facts and challenges. By the end of this book, you will be proficient in image, text, audio, and tabular data augmentation techniques.

Who is this book for?

This book is for data scientists and students interested in the AI discipline. Advanced AI or deep learning skills are not required; however, knowledge of Python programming and familiarity with Jupyter Notebooks are essential to understanding the topics covered in this book.

What you will learn

  • Write OOP Python code for image, text, audio, and tabular data
  • Access over 150,000 real-world datasets from the Kaggle website
  • Analyze biases and safe parameters for each augmentation method
  • Visualize data using standard and exotic plots in color
  • Discover 32 advanced open source augmentation libraries
  • Explore machine learning models, such as BERT and Transformer
  • Meet Pluto, an imaginary digital coding companion
  • Extend your learning with fun facts and fun challenges

Product Details

Publication date: Apr 28, 2023
Length: 394 pages
Edition: 1st
Language: English
ISBN-13: 9781803235912

Table of Contents

16 Chapters
Part 1: Data Augmentation
Chapter 1: Data Augmentation Made Easy
Chapter 2: Biases in Data Augmentation
Part 2: Image Augmentation
Chapter 3: Image Augmentation for Classification
Chapter 4: Image Augmentation for Segmentation
Part 3: Text Augmentation
Chapter 5: Text Augmentation
Chapter 6: Text Augmentation with Machine Learning
Part 4: Audio Data Augmentation
Chapter 7: Audio Data Augmentation
Chapter 8: Audio Data Augmentation with Spectrogram
Part 5: Tabular Data Augmentation
Chapter 9: Tabular Data Augmentation
Index
Other Books You May Enjoy

Customer reviews

Rating distribution
5 out of 5 (10 Ratings)
5 star 100%
4 star 0%
3 star 0%
2 star 0%
1 star 0%

Top Reviews

Dr. Matthias Nagel Aug 11, 2023
Rating: 5 out of 5
Ok
Amazon Verified review
Ady Ngom Jun 14, 2023
Rating: 5 out of 5
The author's passion for teaching really shines through this piece. A definitive must read if you want to start navigating the world of possibilities with AI and need a solid compass to guide you through.
Amazon Verified review
Y. XIA Jun 07, 2023
Rating: 5 out of 5
"Data Augmentation with Python" is a comprehensive overview of various data augmentation techniques and a demonstration of how to implement them using Python. This book strikes a balance between theoretical discussions and hands-on examples. The chapters are well-sequenced and accompanied by well-structured code snippets and illustrations, which are easy to understand and follow. Throughout the exercises, the author provides numerous fun facts, challenges, and real-world examples, making learning even more engaging. Overall, I enjoyed reading it and I believe this is a great resource for anyone interested in enhancing their understanding of the topic.
Amazon Verified review
crawdaddie Jul 05, 2023
Rating: 5 out of 5
"Boosting AI Accuracy with Real-World Datasets" is a helpful and practical guide for people who want to use Python and improve their AI models using data augmentation techniques. The book has lots of useful information, easy-to-understand explanations, and practical examples that allow readers to learn and become skilled in enhancing images, text, audio, and tabular data. This book is a valuable resource because of its practical approach. Data augmentation is a required technique for improving AI accuracy, and Duc Haba does an excellent job of explaining various augmentation methods using real-world datasets. Whether it's image, text, audio, or tabular data, readers will find detailed explanations, code samples, and insights into the biases and safe parameters for each augmentation method. The book covers a wide range of topics, providing over 150 functional object-oriented methods and open-source libraries to help readers achieve optimal results. This book is accessible to aspiring data scientists and those interested in the AI discipline, even without prior AI or deep learning skills.
Amazon Verified review
Poker Nanny Jun 07, 2023
Rating: 5 out of 5
This book effectively introduces readers to the topic of Data Augmentation. While the subject itself is vast and cannot be thoroughly explored in a single book, the author takes a practical approach by presenting various recipes that demonstrate different techniques for augmenting data across a variety of categories. Considering its scope, this book serves its purpose well in acquainting readers with Data Augmentation. Personally, I found the chapter on Biases in Data Augmentation to be the highlight of the book.
Amazon Verified review

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, eBook, or Bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title.
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Card, or PayPal).

Where can I access support around an eBook?

  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book, go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats, such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not in Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?

  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are priced lower than print
  • They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.