What this book covers
Chapter 1, Data Augmentation Made Easy, is an introduction to data augmentation. Readers will learn the definition of data augmentation, the data types it applies to, and its benefits. Furthermore, readers will learn how to select an appropriate online Jupyter Python Notebook or install one locally. Finally, Chapter 1 concludes with a discussion of coding conventions, GitHub access, and the foundational object-oriented class, named Pluto.
Chapter 2, Biases in Data Augmentation, defines computational, human, and systemic biases, with plenty of real-world examples to illustrate the differences between these types of biases. Readers will have the opportunity to practice identifying data biases by downloading three real-world image datasets and two text datasets from the Kaggle website to reinforce their learning. Once downloaded, readers will learn how to display image and text batches and discuss potential biases in the data.
Chapter 3, Image Augmentation for Classification, has two parts. First, readers will learn the concepts and techniques of image augmentation for classification, followed by hands-on Python coding and a detailed explanation of the image augmentation methods with a safe level of image distortion. By the end of this chapter, readers will have learned both the concepts and the hands-on Python techniques for classification image augmentation using six real-world image datasets. In addition, they will examine several open-source Python libraries for image augmentation and write Python wrapper functions using the chosen libraries.
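As a small taste of the wrapper-function style used in that chapter, here is a minimal sketch of a "safe-level" augmentation pipeline built with the Albumentations library; the specific transforms, parameter limits, and the synthetic test image are illustrative assumptions, not the book's exact code.

```python
# A minimal sketch of a "safe-level" image augmentation pipeline using
# Albumentations. The transforms, limits, and the synthetic test image are
# illustrative assumptions, not the book's exact wrapper code.
import numpy as np
import albumentations as A

def augment_image_safely(image: np.ndarray) -> np.ndarray:
    """Apply mild geometric and photometric distortions to one image."""
    pipeline = A.Compose([
        A.HorizontalFlip(p=0.5),                        # geometric: mirror
        A.Rotate(limit=15, p=0.5),                      # geometric: small rotation
        A.RandomBrightnessContrast(brightness_limit=0.2,
                                   contrast_limit=0.2,
                                   p=0.5),              # photometric: mild shift
    ])
    return pipeline(image=image)["image"]

# Usage with a random 224x224 RGB array standing in for a real photo
fake_photo = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
augmented = augment_image_safely(fake_photo)
print(augmented.shape)  # (224, 224, 3)
```

Keeping the distortion limits small, as in the sketch, is what "a safe level of image distortion" means in practice: the label of the photo must remain recognizable after augmentation.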
Chapter 4, Image Augmentation for Segmentation, highlights that both image segmentation and image classification are critical components of the computer vision domain. Image segmentation involves grouping the parts of an image that belong to the same object, which is also known as pixel-level classification. Unlike image classification, which identifies and predicts the subject or label of a whole photo, image segmentation determines which object or tag each pixel belongs to. The image augmentation methods for segmentation and classification are the same, except that segmentation comes with an additional mask, or ground-truth image. Chapter 4 continues the geometric and photometric transformations and applies them to image segmentation.
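The key practical difference is that every geometric transform must be applied identically to the image and its mask so that the pixel labels stay aligned. The following sketch shows that idea with Albumentations; the synthetic image and mask pair and the chosen transforms are assumptions for illustration.

```python
# Sketch: applying the same geometric transforms to an image and its
# segmentation mask with Albumentations. The synthetic image/mask pair and
# the chosen transforms are illustrative assumptions.
import numpy as np
import albumentations as A

pipeline = A.Compose([
    A.HorizontalFlip(p=1.0),   # geometric transforms move image and mask together
    A.Rotate(limit=30, p=1.0),
])

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mask = np.zeros((256, 256), dtype=np.uint8)
mask[64:192, 64:192] = 1       # a square "object" labeled at pixel level

result = pipeline(image=image, mask=mask)
aug_image, aug_mask = result["image"], result["mask"]
# The mask is warped exactly like the image, so each pixel keeps its label.
print(aug_image.shape, aug_mask.shape)
```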
Chapter 5, Text Augmentation, explores text augmentation, a technique used in natural language processing (NLP) to generate additional data by modifying or creating new text from existing text data. Text augmentation can involve techniques such as character swapping, noise injection, synonym replacement, word deletion, word insertion, and word swapping. Image and text augmentation share the same goal: both strive to increase the training dataset's size and improve AI prediction accuracy. In Chapter 5, you will learn about text augmentation and how to code the methods in the Python Notebooks.
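To make two of those techniques concrete, here is a small, library-free sketch of random word swapping and random word deletion; the example sentence and helper names are assumptions for illustration only.

```python
# Sketch of two simple text augmentation techniques: random word swapping
# and random word deletion. Pure Python; the sentence and function names
# are illustrative assumptions.
import random

def random_word_swap(text: str, n_swaps: int = 1) -> str:
    """Swap the positions of randomly chosen word pairs."""
    words = text.split()
    for _ in range(n_swaps):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

def random_word_delete(text: str, p: float = 0.2) -> str:
    """Delete each word with probability p, keeping at least one word."""
    words = [w for w in text.split() if random.random() > p]
    return " ".join(words) if words else text.split()[0]

random.seed(42)
sentence = "data augmentation increases the size of the training dataset"
print(random_word_swap(sentence))
print(random_word_delete(sentence))
```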
Chapter 6, Text Augmentation with Machine Learning, discusses an advanced technique that aims to improve ML model accuracy. Interestingly, this form of text augmentation uses a pre-trained ML model to create additional NLP training data, resulting in a circular process. Although ML coding is beyond the scope of this book, understanding the difference between using libraries and using ML for text augmentation can be beneficial. Chapter 6 will cover text augmentation with machine learning.
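For readers curious what ML-driven augmentation looks like compared with the rule-based techniques above, here is a hedged sketch using the nlpaug library's contextual word-embedding augmenter, which substitutes words with the help of a pre-trained BERT model; the model name, action, and sentence are assumptions for illustration, and the call downloads the model on first use.

```python
# Sketch: ML-driven text augmentation with nlpaug, which substitutes words
# using a pre-trained BERT model. Requires nlpaug, torch, and transformers;
# the model name, action, and sentence are illustrative assumptions.
import nlpaug.augmenter.word as naw

aug = naw.ContextualWordEmbsAug(
    model_path="bert-base-uncased",  # downloads the model on first use
    action="substitute",             # replace words with contextually similar ones
)
text = "data augmentation improves the accuracy of the model"
print(aug.augment(text))
```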
Chapter 7, Audio Data Augmentation, explains that, similar to image and text augmentation, the objective of audio augmentation is to extend the dataset to achieve higher forecast or prediction accuracy in a generative AI system. Audio augmentation is cost-effective and a viable option when acquiring additional audio files is expensive or time-consuming. Writing about audio augmentation methods poses unique challenges. The first is that audio is not visual like images or text. If the format were an audiobook, web page, or mobile app, we could play the sound, but this book's medium is paper. Thus, we will transform the audio signal into a visual representation. Chapter 7 will cover audio augmentation using waveform transformations. You can play the audio files in the Python Notebook.
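As a preview of waveform-level augmentation, here is a minimal, self-contained sketch that injects Gaussian noise and time-shifts a synthetic sine-wave signal using only NumPy; the signal, noise level, and shift amount are illustrative assumptions rather than the book's datasets.

```python
# Sketch of two waveform augmentations, noise injection and time shifting,
# applied to a synthetic sine wave. Only NumPy is used; the signal
# parameters, noise factor, and shift fraction are illustrative assumptions.
import numpy as np

sample_rate = 16_000
t = np.linspace(0, 1.0, sample_rate, endpoint=False)
waveform = 0.5 * np.sin(2 * np.pi * 440 * t)      # one second of a 440 Hz tone

def add_noise(signal: np.ndarray, noise_factor: float = 0.01) -> np.ndarray:
    """Inject Gaussian noise scaled by noise_factor."""
    return signal + noise_factor * np.random.randn(len(signal))

def time_shift(signal: np.ndarray, shift_fraction: float = 0.1) -> np.ndarray:
    """Shift the waveform forward in time, wrapping around the end."""
    return np.roll(signal, int(len(signal) * shift_fraction))

noisy = add_noise(waveform)
shifted = time_shift(waveform)
print(noisy.shape, shifted.shape)  # both (16000,)
```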
Chapter 8, Audio Data Augmentation with Spectrogram, builds on the previous chapter's topic of audio augmentation by exploring additional visualization methods beyond the waveform graph. An audio spectrogram is another method of visualizing the components of an audio signal. The inputs to the spectrogram are a one-dimensional array of amplitude values and the sampling rate, the same inputs as for the waveform graph. An audio spectrogram is sometimes called a sonograph, sonagram, voiceprint, or voicegram. Typical uses are music, human speech, and sonar. A short standard definition is a spectrum of frequencies mapped over time. In other words, the Y-axis is the frequency in Hz or kHz, and the X-axis is the time duration in seconds or milliseconds. Chapter 8 will cover the standard spectrogram format, variations of the spectrogram, the Mel-spectrogram, the Chroma Short-Time Fourier Transform (STFT), and augmentation techniques.
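To ground that definition, here is a hedged sketch that turns the same kind of one-dimensional waveform into a Mel-spectrogram with librosa and converts it to decibels; the synthetic tone and the parameter values are illustrative assumptions, since a real example would load an audio file instead.

```python
# Sketch: turning a one-dimensional waveform into a Mel-spectrogram with
# librosa. The synthetic tone and parameter values are illustrative
# assumptions; a real example would load an audio file instead.
import numpy as np
import librosa

sample_rate = 16_000
t = np.linspace(0, 1.0, sample_rate, endpoint=False)
waveform = 0.5 * np.sin(2 * np.pi * 440 * t)

# Frequency (Mel bands) on the Y-axis, time frames on the X-axis
mel = librosa.feature.melspectrogram(y=waveform, sr=sample_rate, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)  # log scale in decibels
print(mel_db.shape)  # (n_mels, number of time frames)
```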
Chapter 9, Tabular Data Augmentation, involves taking data from a database, spreadsheet, or table format and extending it for the AI training cycle. The goal is to increase the accuracy of a prediction or forecast, which is the same for image, text, and audio augmentation. Tabular augmentation is a relatively new field for data scientists. It runs contrary to using analytics for reporting, summarizing, or forecasting: in analytics, altering or adding data to skew the results toward a preconceived desired outcome is unethical, whereas in data augmentation, the purpose is to derive new data from an existing dataset. The two goals appear incongruent, but they are not. There will be a slight departure from the image, text, and audio augmentation format: we will spend more time in Python code studying real-world tabular datasets.
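As a small preview of that hands-on style, here is a sketch that augments the numeric columns of a toy pandas DataFrame by jittering them with Gaussian noise; the DataFrame, column names, and noise scale are assumptions for illustration, not the book's dataset or method.

```python
# Sketch of a simple tabular augmentation: jittering numeric columns with
# Gaussian noise to derive new rows from existing ones. The toy DataFrame,
# column names, and noise scale are illustrative assumptions.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [23, 45, 31, 52],
    "income": [42_000.0, 88_000.0, 61_500.0, 97_250.0],
    "label": ["A", "B", "A", "B"],
})

def jitter_numeric(frame: pd.DataFrame, scale: float = 0.02) -> pd.DataFrame:
    """Return a copy with numeric columns perturbed by relative Gaussian noise."""
    augmented = frame.copy()
    for col in augmented.select_dtypes(include="number").columns:
        noise = np.random.normal(0.0, scale * augmented[col].std(), len(augmented))
        augmented[col] = augmented[col] + noise
    return augmented

# Stack the original and the jittered copy to extend the training table
extended = pd.concat([df, jitter_numeric(df)], ignore_index=True)
print(extended)
```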