The Machine Learning Solutions Architect Handbook

You're reading from The Machine Learning Solutions Architect Handbook: Practical strategies and best practices on the ML lifecycle, system design, MLOps, and generative AI

Product type: Paperback
Published: Apr 2024
Publisher: Packt
ISBN-13: 9781805122500
Length: 602 pages
Edition: 2nd Edition
Author: David Ping

Table of Contents (19 chapters)

Preface
1. Navigating the ML Lifecycle with ML Solutions Architecture
2. Exploring ML Business Use Cases
3. Exploring ML Algorithms
4. Data Management for ML
5. Exploring Open-Source ML Libraries
6. Kubernetes Container Orchestration Infrastructure Management
7. Open-Source ML Platforms
8. Building a Data Science Environment Using AWS ML Services
9. Designing an Enterprise ML Architecture with AWS ML Services
10. Advanced ML Engineering
11. Building ML Solutions with AWS AI Services
12. AI Risk Management
13. Bias, Explainability, Privacy, and Adversarial Attacks
14. Charting the Course of Your ML Journey
15. Navigating the Generative AI Project Lifecycle
16. Designing Generative AI Platforms and Solutions
17. Other Books You May Enjoy
18. Index

Choosing an LLM adaptation method

We have covered various LLM adaptation methods, including prompt engineering, domain adaptation pre-training, fine-tuning, and RAG. All of these methods are intended to elicit better responses from a pre-trained LLM. With so many options available, one question naturally arises: how do we choose which method to use?

Let’s break down some of the considerations involved in choosing among these methods.

Response quality

Response quality measures how closely the LLM’s response aligns with the intent of the user’s query. Evaluating response quality can be intricate because different use cases weigh different considerations, such as knowledge domain affinity, task accuracy, data freshness, source data transparency, and hallucination.

For knowledge domain affinity, domain adaptation pre-training can be used to effectively teach an LLM domain-specific knowledge and terminology. RAG is efficient at retrieving relevant data, but the LLM used for response synthesis may not capture domain-specific patterns, terminology, and nuance as well as a fine-tuned or domain-adapted model. If you need strong domain-specific performance, you should consider domain adaptation pre-training.

If you need to maximize accuracy for specific tasks, then fine-tuning is the recommended approach. Prompt engineering can also improve task accuracy through single-shot or few-shot prompting techniques, but the improvement is prompt-specific and does not generalize beyond the prompts it was designed for.
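To make few-shot prompting concrete, here is a minimal sketch in Python that packs labeled examples into a prompt for a sentiment classification task. The task, the example reviews, and the llm_client.generate() call are illustrative placeholders rather than any specific product’s API.

```python
# A minimal sketch of few-shot prompting for a classification task.
# The task, the examples, and the final call to an LLM endpoint are
# hypothetical; substitute the client for whichever model you use.

FEW_SHOT_EXAMPLES = [
    ("The device stopped working after two days.", "negative"),
    ("Setup was quick and the battery lasts all week.", "positive"),
    ("It does what it says, nothing more, nothing less.", "neutral"),
]

def build_few_shot_prompt(query: str) -> str:
    """Prepend labeled examples so the model can infer the task format."""
    lines = ["Classify the sentiment of each review as positive, negative, or neutral.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # the model completes this line
    return "\n".join(lines)

prompt = build_few_shot_prompt("Shipping was slow, but support resolved it quickly.")
print(prompt)
# response = llm_client.generate(prompt)  # placeholder: call your model of choice
```

Because the examples are baked into the prompt itself, this only improves the model on this particular task format, which is why the gains do not carry over to unrelated prompts.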

If information freshness in the response is the primary goal, then RAG is the ideal solution since it has access to dynamic external data sources. Prompt engineering can also help with data freshness when up-to-date knowledge is provided as part of the prompt. Fine-tuning and domain adaptation pre-training have knowledge cutoffs based on the latest training dataset used.

For some applications, such as medical diagnosis or financial analysis, knowing how a decision was made and which data sources were used to make it is crucial. If this is a critical requirement for the use case, then RAG is the clear choice, as it can provide references to the knowledge it used to construct the response. Fine-tuned and domain-adapted models behave more like a “black box,” often obscuring which data sources informed a given answer.

As mentioned in the previous chapter, when LLMs encounter unfamiliar queries, they sometimes hallucinate: they generate plausible but false information that is not grounded in their training data or the user’s input. Fine-tuning can reduce fabrication by focusing the model on domain-specific knowledge, but the risk remains for unfamiliar inputs. RAG systems address hallucination risks better by anchoring responses to retrieved documents. The initial retrieval step acts as a fact check, finding relevant passages to ground the response in real data, and the subsequent generation is confined to the context of those retrieved passages rather than being unconstrained. This mechanism minimizes fabricated responses that are not supported by data.
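The following simplified sketch illustrates the retrieve-then-generate pattern and how source references fall out of it naturally. The in-memory document list and keyword-overlap retriever are toy stand-ins for an embedding model and a vector store, and the llm_client.generate() call is a placeholder for your model of choice.

```python
# A minimal sketch of the RAG pattern: retrieve passages first, then
# constrain generation to that context and return the sources used.
# The toy keyword retriever and document list are illustrative only;
# production systems typically use embeddings and a vector store.

DOCUMENTS = [
    {"id": "policy-2024-03", "text": "The standard deductible was raised to $750 in March 2024."},
    {"id": "policy-2023-01", "text": "Claims must be filed within 30 days of the incident."},
]

def retrieve(query: str, k: int = 2) -> list[dict]:
    """Rank documents by naive keyword overlap with the query."""
    words = set(query.lower().split())
    scored = sorted(
        DOCUMENTS,
        key=lambda d: len(words & set(d["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_grounded_prompt(query: str, passages: list[dict]) -> str:
    """Instruct the model to answer only from the retrieved passages."""
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    return (
        "Answer the question using only the passages below. "
        "Cite the passage IDs you used. If the passages do not contain "
        "the answer, say you don't know.\n\n"
        f"Passages:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

question = "What is the current deductible?"
passages = retrieve(question)
prompt = build_grounded_prompt(question, passages)
print(prompt)
# answer = llm_client.generate(prompt)        # placeholder call to your LLM
# sources = [p["id"] for p in passages]       # returned alongside the answer for transparency
```

Note how the passage IDs double as citations: the same retrieval results that ground the answer can be surfaced to the user as the sources behind it.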

Cost of the adaptation

When evaluating LLM adaptation approaches, it is important to consider both the initial implementation cost and the long-term maintenance cost. With this in mind, let’s compare the costs of the different approaches.

Prompt engineering has the lowest overhead, involving simply writing and testing prompts to yield good results from the pre-trained language model. Maintenance may require occasional prompt updates as the foundation model is updated over time.

RAG systems have moderately high startup costs because they require multiple components: embedding models, vector stores, retrievers, and language models. Once built, however, these systems are relatively static over time, so ongoing maintenance costs tend to be modest.

Full fine-tuning and domain adaptation pre-training can be expensive: they require massive computational resources and time to update potentially all the parameters of a large foundation model, in addition to the cost of dataset preparation. Parameter-Efficient Fine-Tuning (PEFT) can be cheaper than full fine-tuning and domain adaptation pre-training because it updates only a small subset of parameters. However, it is still generally more expensive than RAG due to the need for high-quality dataset preparation and training resources.
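As a rough illustration of where the PEFT savings come from, the sketch below attaches LoRA adapters to a base model with the Hugging Face peft library: only the small adapter matrices are trained while the original weights stay frozen. The base model name and the target module names are assumptions and should be adjusted for the model you actually use.

```python
# A minimal LoRA (PEFT) sketch using Hugging Face transformers + peft.
# Only the low-rank adapter weights are trainable; the base model stays
# frozen, which is where most of the cost savings come from.
# The model name and target_modules below are assumptions; adjust them
# for your own model architecture.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor for the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
# Reports the trainable parameter count, typically well under 1% of the
# full model, compared with 100% for full fine-tuning.
model.print_trainable_parameters()
```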

Implementation complexity

Implementation complexity varies significantly across these techniques, ranging from straightforward setups to highly advanced configurations.

Prompt engineering has relatively low complexity, requiring mainly language skills and familiarity with few-shot learning to craft prompts that elicit good performance from the foundation model. It requires minimal programming skill and data science knowledge.

RAG systems have moderate complexity, requiring software engineering effort to build pipeline components such as retrievers and integrators. The complexity rises with advanced RAG configurations and infrastructure, such as complex workflows involving agents and tools, and infrastructure components for monitoring, observability, evaluation, and orchestration.

PEFT and full model fine-tuning have the highest complexity. They require deep expertise in deep learning, NLP, and data science to select training data, write tuning scripts, choose hyperparameters such as the learning rate and loss function, and ultimately update the model’s internal representations.
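To give a feel for what such a tuning script involves, here is a deliberately tiny causal-LM fine-tuning sketch built on Hugging Face transformers and datasets. The two-sentence corpus and the small gpt2 base model are placeholders purely for illustration; a real run would use a curated domain dataset, a much larger model, and carefully tuned hyperparameters.

```python
# A toy fine-tuning sketch showing the kinds of decisions a tuning
# script encodes: data preparation, label construction, and
# hyperparameters such as the learning rate and batch size.
# The corpus and the gpt2 base model are placeholders for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Toy domain corpus; real fine-tuning needs a much larger, curated dataset.
texts = [
    "Example domain-specific sentence one.",
    "Example domain-specific sentence two.",
]

def tokenize(batch):
    tokens = tokenizer(batch["text"], truncation=True, padding="max_length", max_length=32)
    # For causal LM training, the labels are the input IDs; mask padding
    # positions with -100 so they are ignored by the loss.
    tokens["labels"] = [
        [tok if tok != tokenizer.pad_token_id else -100 for tok in ids]
        for ids in tokens["input_ids"]
    ]
    return tokens

dataset = Dataset.from_dict({"text": texts}).map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="./fine-tuned-model",
    learning_rate=2e-5,              # too high a value can destabilize training
    num_train_epochs=1,
    per_device_train_batch_size=2,
)

trainer = Trainer(model=model, args=training_args, train_dataset=dataset)
trainer.train()
```

Even in this stripped-down form, the script touches tokenization, label masking, and several hyperparameters, which is why fine-tuning demands more specialized expertise than prompt engineering or assembling a RAG pipeline.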
