Preprocessing Unstructured Data for LLMs and RAG Systems

Preprocessing Unstructured Data for LLMs and RAG Systems: Unlock the Power of Unstructured Data for LLMs and Retrieval-Augmented Generation Systems.

Paulo Dichone
Video, Sep 2024, 3hrs 1min, 1st Edition
Video: €67.99
Subscription: Free Trial, renews at €18.99 p/m

What do you get with a video?

  • Download this video in MP4 format
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want

Key benefits

  • Development setup and API configuration for efficient data preprocessing workflows.
  • Advanced data preprocessing techniques for PDFs, HTML, and PPTX using the Unstructured Framework.
  • Build a RAG system for intelligent interaction with complex documents, images, and tables.

Description

This course offers an in-depth exploration of preprocessing unstructured data for large language models and retrieval-augmented generation systems. You’ll start by setting up your development environment and configuring essential APIs, ensuring a solid technical foundation. Next, you’ll dive into data preprocessing techniques, tackling challenges like content extraction, cleaning, and data normalization, making your data ready for advanced AI models.

As you progress, the course provides hands-on experience with various document types such as PDFs, HTML, and PPTX files. You’ll learn to transform these unstructured formats into structured data that AI systems can easily process. Advanced modules cover chunking, metadata extraction, and handling complex documents using cutting-edge techniques like visual transformers and document layout detectors.

The final section guides you in building a complete RAG system using the skills acquired throughout the course. You’ll preprocess diverse documents, implement semantic similarity searches, and save elements to a vector database. By the end, you’ll be equipped to create intelligent data pipelines and interact with your documents using AI, significantly enhancing your data-driven projects.
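
To make the cleaning, normalization, chunking, and metadata steps mentioned above more concrete, here is a minimal sketch in plain Python. It is not taken from the course and does not use the Unstructured Framework; the function names (`clean_text`, `chunk_text`), the word-window chunking strategy, and the metadata fields are all assumptions chosen for illustration.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Normalize Unicode, strip simple HTML tags, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw)  # e.g. ligatures, non-breaking spaces
    text = re.sub(r"<[^>]+>", " ", text)       # drop simple HTML tags
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    return text.strip()


def chunk_text(text: str, max_words: int = 50, overlap: int = 10,
               source: str = "unknown") -> list[dict]:
    """Split text into overlapping word windows, each tagged with metadata.

    The metadata keys (source, start_word, n_words) are hypothetical;
    a real pipeline would carry whatever its downstream store needs.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        chunks.append({
            "text": " ".join(window),
            "source": source,
            "start_word": start,
            "n_words": len(window),
        })
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Overlapping windows are one common way to keep a sentence that straddles a chunk boundary retrievable from either side; the sizes here are arbitrary defaults, not recommendations from the course.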

Who is this course for?

This course is ideal for data scientists, machine learning engineers, and AI developers who want to enhance their skills in data preprocessing for LLMs and RAG systems. Prerequisites include basic knowledge of Python programming, familiarity with APIs, and a general understanding of machine learning concepts.

What you will learn

  • Configure a complete data preprocessing environment.
  • Extract and clean data from various document types.
  • Normalize and chunk data for efficient processing.
  • Perform metadata extraction and semantic analysis.
  • Develop a full Retrieval-Augmented Generation system.
  • Interact with processed documents using advanced AI tools.

Product Details

Publication date: Sep 30, 2024
Length: 3hrs 1min
Edition: 1st
Language: English
ISBN-13: 9781836642930

Table of Contents

8 Chapters
  1. Introduction
  2. Development Environment Setup
  3. Data Preprocessing for LLMs - Deep Dive
  4. Hands-on: The Unstructured Framework - Preprocessing HTML, PDFs & PPTX Documents
  5. Chunking and Metadata Extraction
  6. Preprocessing Complex Documents - PDFs and Images
  7. Build a RAG System Using Learned Techniques - Full Use Case
  8. Wrap up

FAQs

How can I download a video package for offline viewing?
  1. Log in to your account at Packtpub.com.
  2. Click on "My Account" and then click on the "My Videos" tab to access your videos.
  3. Click on the "Download Now" link to start your video download.

How can I extract my video file?

All modern operating systems ship with ZIP file extraction built in. If you'd prefer to use a dedicated compression application, we've tested WinRAR / 7-Zip for Windows, Zipeg / iZip / UnRarX for Mac and 7-Zip / PeaZip for Linux. These applications support all common archive file extensions.

How can I get help and support around my video package?

If your video course doesn't give you what you were expecting, either because of functionality problems or because the content isn't up to scratch, please mail customercare@packt.com with details of the problem. In addition, so that we can best provide the support you need, please include the following information for our support team.

  1. Video
  2. Format watched (HTML, MP4, streaming)
  3. Chapter or section that issue relates to (if relevant)
  4. System being played on
  5. Browser used (if relevant)
  6. Details of support

Why can’t I download my video package?

If you are having issues downloading your video package, please follow these instructions:

  1. Disable all your browser plugins and extensions: Some security and download manager extensions can cause issues during the download.
  2. Download the video course using a different browser: We've tested that downloads work correctly in current versions of Chrome, Firefox, Internet Explorer, and Safari.