Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases now! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Essential Guide to LLMOps

You're reading from   Essential Guide to LLMOps Implementing effective strategies for Large Language Models in deployment and continuous improvement

Arrow left icon
Product type Paperback
Published in Jul 2024
Publisher Packt
ISBN-13 9781835887509
Length 190 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Ryan Doan Ryan Doan
Author Profile Icon Ryan Doan
Ryan Doan
Arrow right icon
View More author details
Toc

Table of Contents (14) Chapters Close

Preface 1. Part 1: Foundations of LLMOps FREE CHAPTER
2. Chapter 1: Introduction to LLMs and LLMOps 3. Chapter 2: Reviewing LLMOps Components 4. Part 2: Tools and Strategies in LLMOps
5. Chapter 3: Processing Data in LLMOps Tools 6. Chapter 4: Developing Models via LLMOps 7. Chapter 5: LLMOps Review and Compliance 8. Part 3: Advanced LLMOps Applications and Future Outlook
9. Chapter 6: LLMOps Strategies for Inference, Serving, and Scalability 10. Chapter 7: LLMOps Monitoring and Continuous Improvement 11. Chapter 8: The Future of LLMOps and Emerging Technologies 12. Index 13. Other Books You May Enjoy

Data collection and preparation

Data collection and preparation form the backbone of large language model (LLM) training and efficiency. This phase involves gathering, processing, and storing data in a manner that makes it most useful for training LLMs.

Data collection

Data collection for LLM training typically involves sourcing from a variety of public datasets that are rich in language diversity. These datasets include the following:

  • Web text: Data scraped from websites, encompassing a wide range of topics and styles
  • Books and publications: Texts from books, especially those in the public domain, provide a classic and varied literary perspective
  • Social media feeds: Platforms such as Twitter or Reddit offer insights into colloquial and current language usage
  • News articles: Datasets from news websites present formal and contemporary language

Here’s an example of what a web scraper may obtain from a news site in JSON form:

{
  &quot...
lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime