Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Data Augmentation with Python
Data Augmentation with Python

Data Augmentation with Python: Enhance deep learning accuracy with data augmentation methods for image, text, audio, and tabular data

eBook
€23.99 €26.99
Paperback
€33.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with Print?

Product feature icon Instant access to your digital copy whilst your Print order is Shipped
Product feature icon Colour book shipped to your preferred address
Product feature icon Redeem a companion digital copy on all Print orders
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
OR
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Table of content icon View table of contents Preview book icon Preview Book

Data Augmentation with Python

Data Augmentation Made Easy

Data augmentation is essential for developing a successful deep learning (DL) project. However, data scientists and developers often overlook this crucial step. It is no secret that you will spend the majority of your project time gathering, cleaning, and augmenting the dataset in a real-world DL project. Thus, learning how to expand the dataset without purchasing new data is essential. This book covers standard and advanced techniques for extending image, text, audio, and tabular datasets. Furthermore, you will learn about data biases and learn how to code on Jupyter Python Notebooks.

Chapter 1 will introduce various data augmentation concepts, set up the coding environment, and create the foundation class. Later chapters will explain various techniques in detail, including Python coding. The effective use of data augmentation has proven to be the deciding factor between success and failure in machine learning (ML). Many real-world ML projects stay in the conceptual phase because of insufficient data for training the ML model. Data augmentation is a cost-effective technique that can increase the size of the dataset, lower the training error rate, and produce a more accurate prediction and forecast.

Fun fact

The car gasoline analogy is helpful for students who first learn about data augmentation and artificial intelligence (AI). You can think of data for the AI engine as the gasoline and data augmentation as the additive, such as the Chevron Techron fuel cleaner, that makes your car engine run faster, smoother, and further without extra petrol.

In this chapter, we’ll define the data augmentation role and the limitations of extending data without changing its integrity. We’ll briefly discuss the different types of input data, such as image, text, audio, and tabular data, and the challenges in supplementing it. Finally, we’ll set up the system requirements and the programming style in the accompanying Python notebook.

I designed this book to be a hands-on journey. It will be most effective to read a chapter, run the code, re-read the part of the chapter that confused you, and jump back to hacking the code until you firmly understand the concept or technique that was presented.

You are encouraged to change or add new code to the Python notebook. The primary purpose of this book is interactive learning. So, if something goes wrong, download a fresh copy from the book's GitHub. The surest method to learn is to make mistakes and create something new.

Data augmentation is an iterative process. There is no fixed recipe. In other words, depending on the dataset, you select augmented functions and jiggle the parameters. A subject domain expert may provide insight into how much distortion is acceptable. By the end of this chapter, you will know the general rules for data augmentation, what type of input data can be augmented, the programming style, and how to set up a Python Notebook online or offline.

In particular, this chapter covers the following primary topics:

  • Data augmentation role
  • Data input types
  • Python Notebook
  • Programming styles

Let’s start with the data augmentation role.

Data augmentation role

Data is paramount in any AI project. This is especially true when using the artificial neural network (ANN) algorithm, also known as DL. The success or failure of a DL project is primarily due to the input data quality.

One primary reason for the significance of data augmentation is that it is relatively too easy to develop an AI for prediction and forecasting, and those models require robust data input. With the remarkable advancement in developing, training, and deploying a DL project, such as using the FastAI framework, you can create a world-class DL model in a handful of Python code lines. Thus, expanding the dataset is an effective option to improve the DL model’s accuracy over your competitor.

The traditional method of acquiring additional data is difficult, expensive, and impractical. Sometimes, the only available option is to use data augmentation techniques to extend the dataset.

Fun fact

Data augmentation methods can increase the data’s size tenfold. For example, it is relatively challenging to acquire additional skin cancer images. Thus, using a random combination of image transformations, such as vertical flip, horizontal flip, rotating, and skewing, is a practical technique that can expand the skin cancer photo data.

Without data augmentation, sourcing new skin cancer photos and labeling them is expensive and time-consuming. The International Skin Imaging Collaboration (ISIC) is the authoritative data source for skin diseases, where a team of dermatologists verified and classified the images. ISIC made the datasets available to the public to download for free. If you can’t find a particular dataset from ISIC, it is difficult to find other means, as accessing hospital or university labs to acquire skin disease images is laced with legal and logistic blockers. After obtaining the photos, hiring a team of dermatologists to classify the pictures to correct diseases would be costly.

Another example of the impracticality of attaining additional images instead of augmentation is when you download photos from social media or online search engines. Social media is a rich source of image, text, audio, and video data. Search engines, such as Google or Bing, make it relatively easy to download additional data for a project, but copyrights and legal usage are a quagmire. Most images, texts, audio, and videos on social media, such as YouTube, Facebook, TikTok, and Twitter, are not clearly labeled as copyrights or public domain material.

Furthermore, social media promotes popular content, not unfavorable or obscure material. For example, let’s say you want to add more images of parrots to your parrot classification AI system. Online searches will return a lot of blue-and-yellow macaws, red-and-green macaws, or sulfur-crested cockatoos, but not as many Galah, Kea, or the mythical Norwegian-blue parrot – a fake parrot from the Monty Python comedy skit.

Insufficient data for AI training is exacerbated for text, audio, and tabular data types. Generally, obtaining additional text, audio, and tabular data is expensive and time-consuming. There are strong copyright laws protecting text data. Audio files are less common online, and tabular data is primarily from private company databases.

The following section will define the four commonly used data types.

Left arrow icon Right arrow icon

Key benefits

  • Explore beautiful, customized charts and infographics in full color
  • Work with fully functional OO code using open source libraries in the Python Notebook for each chapter
  • Unleash the potential of real-world datasets with practical data augmentation techniques

Description

Data is paramount in AI projects, especially for deep learning and generative AI, as forecasting accuracy relies on input datasets being robust. Acquiring additional data through traditional methods can be challenging, expensive, and impractical, and data augmentation offers an economical option to extend the dataset. The book teaches you over 20 geometric, photometric, and random erasing augmentation methods using seven real-world datasets for image classification and segmentation. You’ll also review eight image augmentation open source libraries, write object-oriented programming (OOP) wrapper functions in Python Notebooks, view color image augmentation effects, analyze safe levels and biases, as well as explore fun facts and take on fun challenges. As you advance, you’ll discover over 20 character and word techniques for text augmentation using two real-world datasets and excerpts from four classic books. The chapter on advanced text augmentation uses machine learning to extend the text dataset, such as Transformer, Word2vec, BERT, GPT-2, and others. While chapters on audio and tabular data have real-world data, open source libraries, amazing custom plots, and Python Notebook, along with fun facts and challenges. By the end of this book, you will be proficient in image, text, audio, and tabular data augmentation techniques.

Who is this book for?

This book is for data scientists and students interested in the AI discipline. Advanced AI or deep learning skills are not required; however, knowledge of Python programming and familiarity with Jupyter Notebooks are essential to understanding the topics covered in this book.

What you will learn

  • Write OOP Python code for image, text, audio, and tabular data
  • Access over 150,000 real-world datasets from the Kaggle website
  • Analyze biases and safe parameters for each augmentation method
  • Visualize data using standard and exotic plots in color
  • Discover 32 advanced open source augmentation libraries
  • Explore machine learning models, such as BERT and Transformer
  • Meet Pluto, an imaginary digital coding companion
  • Extend your learning with fun facts and fun challenges
Estimated delivery fee Deliver to Spain

Premium delivery 7 - 10 business days

€17.95
(Includes tracking information)

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Apr 28, 2023
Length: 394 pages
Edition : 1st
Language : English
ISBN-13 : 9781803246451
Category :
Languages :
Concepts :
Tools :

What do you get with Print?

Product feature icon Instant access to your digital copy whilst your Print order is Shipped
Product feature icon Colour book shipped to your preferred address
Product feature icon Redeem a companion digital copy on all Print orders
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
OR
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Estimated delivery fee Deliver to Spain

Premium delivery 7 - 10 business days

€17.95
(Includes tracking information)

Product Details

Publication date : Apr 28, 2023
Length: 394 pages
Edition : 1st
Language : English
ISBN-13 : 9781803246451
Category :
Languages :
Concepts :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
€18.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
€189.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts
€264.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total 109.97
Python Deep Learning
€37.99
Synthetic Data for Machine Learning
€37.99
Data Augmentation with Python
€33.99
Total 109.97 Stars icon

Table of Contents

16 Chapters
Part 1: Data Augmentation Chevron down icon Chevron up icon
Chapter 1: Data Augmentation Made Easy Chevron down icon Chevron up icon
Chapter 2: Biases in Data Augmentation Chevron down icon Chevron up icon
Part 2: Image Augmentation Chevron down icon Chevron up icon
Chapter 3: Image Augmentation for Classification Chevron down icon Chevron up icon
Chapter 4: Image Augmentation for Segmentation Chevron down icon Chevron up icon
Part 3: Text Augmentation Chevron down icon Chevron up icon
Chapter 5: Text Augmentation Chevron down icon Chevron up icon
Chapter 6: Text Augmentation with Machine Learning Chevron down icon Chevron up icon
Part 4: Audio Data Augmentation Chevron down icon Chevron up icon
Chapter 7: Audio Data Augmentation Chevron down icon Chevron up icon
Chapter 8: Audio Data Augmentation with Spectrogram Chevron down icon Chevron up icon
Part 5: Tabular Data Augmentation Chevron down icon Chevron up icon
Chapter 9: Tabular Data Augmentation Chevron down icon Chevron up icon
Index Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Full star icon Full star icon 5
(10 Ratings)
5 star 100%
4 star 0%
3 star 0%
2 star 0%
1 star 0%
Filter icon Filter
Top Reviews

Filter reviews by




Dr. Matthias Nagel Aug 11, 2023
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Ok
Amazon Verified review Amazon
Ady Ngom Jun 14, 2023
Full star icon Full star icon Full star icon Full star icon Full star icon 5
The author's passion for teaching really shines through this piece. A definitive must read if you want to start navigating the world of possibilities with AI and need a solid compass to guide you through.
Amazon Verified review Amazon
Y. XIA Jun 07, 2023
Full star icon Full star icon Full star icon Full star icon Full star icon 5
"Data Augmentation with Python" is a comprehensive overview of various data augmentation techniques and a demonstration of how to implement them using Python. This book strikes a balance between theoretical discussions and hands-on examples. The chapters are well-sequenced and accompanied by well-structured code snippets and illustrations, which are easy to understand and follow. Throughout the exercises, the author provides numerous fun facts, challenges, and real-world examples, making learning even more engaging.Overall, I enjoyed reading it and I believe this is a great resource for anyone interested in enhancing their understanding of the topic.
Amazon Verified review Amazon
crawdaddie Jul 05, 2023
Full star icon Full star icon Full star icon Full star icon Full star icon 5
"Boosting AI Accuracy with Real-World Datasets" is a helpful and practical guide for people who want to use Python and improve their AI models using data augmentation techniques. The book has lots of useful information, easy-to-understand explanations, and practical examples that allow readers to learn and become skilled in enhancing images, text, audio, and tabular data. This book is a valuable resource because of its practical approach.Data augmentation is a required technique for improving AI accuracy, and Duc Haba does an excellent job of explaining various augmentation methods using real-world datasets. Whether it's image, text, audio, or tabular data, readers will find detailed explanations, code samples, and insights into the biases and safe parameters for each augmentation method.The book covers a wide range of topics, providing over 150 functional object-oriented methods and open-source libraries to help readers achieve optimal results. This book is accessible to aspiring data scientists and those interested in the AI discipline, even without prior AI or deep learning skills.
Amazon Verified review Amazon
Poker Nanny Jun 07, 2023
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This book effectively introduces readers to the topic of Data Augmentation. While the subject itself is vast and cannot be thoroughly explored in a single book, the author takes a practical approach by presenting various recipes that demonstrate different techniques for augmenting data across a variety of categories. Considering its scope, this book serves its purpose well in acquainting readers with Data Augmentation. Personally, I found the chapter on Biases in Data Augmentation to be the highlight of the book.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is the digital copy I get with my Print order? Chevron down icon Chevron up icon

When you buy any Print edition of our Books, you can redeem (for free) the eBook edition of the Print Book you’ve purchased. This gives you instant access to your book when you make an order via PDF, EPUB or our online Reader experience.

What is the delivery time and cost of print book? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is custom duty/charge? Chevron down icon Chevron up icon

Customs duty are charges levied on goods when they cross international borders. It is a tax that is imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order? Chevron down icon Chevron up icon

The orders shipped to the countries that are listed under EU27 will not bear custom charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea:

A custom duty or localized taxes may be applicable on the shipment and would be charged by the recipient country outside of the EU27 which should be paid by the customer and these duties are not included in the shipping charges been charged on the order.

How do I know my custom duty charges? Chevron down icon Chevron up icon

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order? Chevron down icon Chevron up icon

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except for the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or material defect in book), Packt Publishing will not accept returns.

What is your returns and refunds policy? Chevron down icon Chevron up icon

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items.(damaged, defective or incorrect)
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged, with book material defect, contact our Customer Relation Team on customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner which is on a print-on-demand basis.

What tax is charged? Chevron down icon Chevron up icon

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use? Chevron down icon Chevron up icon

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal
What is the delivery time and cost of print books? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela