Part 3: Downstream Data Cleaning – Consuming Unstructured Data
This part focuses on the challenges and techniques involved in processing unstructured data, such as text, images, and audio, for modern machine learning, particularly large language models (LLMs). It explains how to prepare each of these data types so that it is properly preprocessed for analysis and model training. The chapters cover essential preprocessing methods for text first, then for image and audio data, giving readers the tools to work with more complex and varied datasets.
This part has the following chapters:
- Chapter 12, Text Preprocessing in the Era of LLMs
- Chapter 13, Image and Audio Preprocessing with LLMs