LLM Engineer's Handbook

You're reading from LLM Engineer's Handbook: Master the art of engineering large language models from concept to production

Product type: Paperback
Published in: Oct 2024
Publisher: Packt
ISBN-13: 9781836200079
Length: 522 pages
Edition: 1st Edition
Authors (3): Maxime Labonne, Paul Iusztin, Alex Vesa

Python ecosystem and project installation

Any Python project needs three fundamental tools: a Python interpreter, a dependency manager, and a task execution tool. The Python interpreter runs your project's code, so its version matters: all the code in this book is tested with Python 3.11.8. You can download the Python interpreter from https://www.python.org/downloads/, but we recommend installing the exact version used in the book (Python 3.11.8) through pyenv, which makes the installation process straightforward.
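As a quick sanity check, you can confirm from inside Python that the active interpreter is the tested release. This is a minimal sketch; the helper name `is_tested_python` and the hard-coded 3.11.8 tuple are our own choices for illustration and are not part of the LLM Twin repository.

```python
import sys
from typing import Tuple

# Version the book's code was tested against (taken from the text above).
TESTED_VERSION: Tuple[int, int, int] = (3, 11, 8)


def is_tested_python(expected: Tuple[int, int, int] = TESTED_VERSION) -> bool:
    """Return True if the running interpreter's major.minor.micro equals `expected`.

    Hypothetical helper for illustration only.
    """
    return tuple(sys.version_info[:3]) == expected


current = ".".join(str(part) for part in sys.version_info[:3])
if is_tested_python():
    print(f"OK: running Python {current}")
else:
    print(f"Warning: running Python {current}; the book was tested with 3.11.8")
```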

Instead of installing multiple Python versions globally, we recommend managing them with pyenv, a Python version management tool that lets you switch between Python versions across projects. You can install it by following the instructions at https://github.com/pyenv/pyenv?tab=readme-ov-file#installation.

After installing pyenv, install Python 3.11.8 as follows:

pyenv install 3.11.8

Now list all installed Python versions to see that it was installed correctly:

pyenv versions

You should see something like this:

# * system
#   3.11.8
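If you want to automate this check, for example in a setup script, the following sketch parses the text printed by pyenv versions. It assumes the format shown above: one version per line, with the active version prefixed by * and optionally followed by a parenthesized note such as (set by ...). The function name and the sample path are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_pyenv_versions(output: str) -> Tuple[List[str], Optional[str]]:
    """Parse `pyenv versions` output into (installed_versions, active_version)."""
    installed: List[str] = []
    active: Optional[str] = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        is_active = line.startswith("*")
        # Drop the "*" marker and keep only the version name, ignoring
        # trailing notes such as "(set by /path/to/version)".
        parts = line.lstrip("*").split()
        if not parts:
            continue
        installed.append(parts[0])
        if is_active:
            active = parts[0]
    return installed, active


sample = """\
* system (set by /home/user/.pyenv/version)
  3.11.8
"""
versions, current = parse_pyenv_versions(sample)
print(versions)              # ['system', '3.11.8']
print(current)               # system
print("3.11.8" in versions)  # True
```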

To make Python 3.11.8 the default version across your entire system (whenever you open a new terminal), use the following command:

pyenv global 3.11.8

However, we want to use Python 3.11.8 only locally, within our repository. To achieve that, first clone the repository and navigate into it:

git clone https://github.com/PacktPublishing/LLM-Engineers-Handbook.git 
cd LLM-Engineers-Handbook

Because we defined a .python-version file within the repository, pyenv will know to pick up the version from that file and use it locally whenever you are working within that folder. To double-check that, run the following command while you are in the repository:

python --version

It should output:

# Python 3.11.8

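The .python-version file is created by running pyenv local 3.11.8 once inside a directory. From then on, pyenv uses that Python version whenever you work within that directory or any of its subdirectories. Because the file is already part of the repository, you don't need to run this command yourself.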
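To make this lookup concrete, here is a simplified model of the order in which pyenv, according to its documentation, selects a version: the PYENV_VERSION environment variable, then the nearest .python-version file in the current directory or its parents, then the global version file written by pyenv global, and finally the system Python. This is our own sketch, not pyenv's code; it ignores details such as multiple colon-separated versions, and the function names and directory layout are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def _read_version_file(path: Path) -> Optional[str]:
    """Return the first non-empty line of a pyenv version file, if any."""
    for line in path.read_text().splitlines():
        if line.strip():
            return line.strip()
    return None


def resolve_pyenv_version(cwd: Path, env: Mapping[str, str], global_version_file: Path) -> str:
    """Simplified model of the order in which pyenv picks a Python version."""
    # 1. The PYENV_VERSION environment variable takes precedence.
    if env.get("PYENV_VERSION"):
        return env["PYENV_VERSION"]

    # 2. Otherwise, use the nearest .python-version file, searching the
    #    current directory first and then each parent directory.
    for directory in (cwd, *cwd.parents):
        local_file = directory / ".python-version"
        if local_file.is_file():
            version = _read_version_file(local_file)
            if version:
                return version

    # 3. Then fall back to the global version file written by `pyenv global`.
    if global_version_file.is_file():
        version = _read_version_file(global_version_file)
        if version:
            return version

    # 4. With nothing configured, pyenv uses the system Python.
    return "system"


# Demo in a throwaway directory tree (all paths are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    repo = root / "LLM-Engineers-Handbook"
    (repo / "src").mkdir(parents=True)
    (repo / ".python-version").write_text("3.11.8\n")
    global_file = root / "version"
    global_file.write_text("3.12.0\n")

    in_repo = resolve_pyenv_version(repo / "src", {}, global_file)
    outside = resolve_pyenv_version(root, {}, global_file)
    overridden = resolve_pyenv_version(repo, {"PYENV_VERSION": "3.10.14"}, global_file)

print(in_repo, outside, overridden)  # 3.11.8 3.12.0 3.10.14
```

Note how the subdirectory src inherits the repository's version: this is why running python --version anywhere inside the cloned repository reports 3.11.8.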

Now that we have installed the correct Python version using pyenv, let’s move on to Poetry, which we will use as our dependency and virtual environment manager.

Poetry: dependency and virtual environment management

Poetry is one of the most popular dependency and virtual environment managers in the Python ecosystem. But let's start by clarifying what a dependency manager is. In Python, a dependency manager allows you to specify, install, update, and manage the external libraries or packages (dependencies) that a project relies on. For example, here is a minimal Poetry configuration (a pyproject.toml file) that requires Python 3.11 and the requests and numpy packages:

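[tool.poetry]
# Hypothetical project metadata; Poetry 1.x requires these fields
name = "example-project"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.11"
requests = "^2.25.1"
numpy = "^1.19.5"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"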
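The ^ (caret) prefix in these constraints allows any release that doesn't change the left-most non-zero version component, so, per Poetry's documentation, ^2.25.1 means >=2.25.1,<3.0.0, while ^0.2.3 means >=0.2.3,<0.3.0. The sketch below is our own simplified illustration of those rules; it handles plain numeric versions only (no pre-releases or other specifier types), and the function names are hypothetical.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse 'X', 'X.Y', or 'X.Y.Z' into a 3-tuple, padding missing parts with 0."""
    parts = [int(part) for part in text.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported version: {text!r}")
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def caret_bounds(spec: str) -> Tuple[Version, Version]:
    """Return (inclusive lower, exclusive upper) bounds for a caret spec like '^1.2.3'."""
    if not spec.startswith("^"):
        raise ValueError(f"expected a caret constraint, got {spec!r}")
    lower = parse_version(spec[1:])
    given = [int(part) for part in spec[1:].split(".")]
    # Bump the left-most non-zero component that was written; if all written
    # components are zero, bump the last one (e.g. ^0.0 -> <0.1.0).
    bump = next((i for i, part in enumerate(given) if part != 0), len(given) - 1)
    upper = given[:bump] + [given[bump] + 1] + [0] * (2 - bump)
    return lower, (upper[0], upper[1], upper[2])


def satisfies(version: str, spec: str) -> bool:
    """Check whether a plain X.Y.Z version falls inside a caret range."""
    lower, upper = caret_bounds(spec)
    return lower <= parse_version(version) < upper


def fmt(version: Version) -> str:
    return ".".join(str(part) for part in version)


for spec in ("^3.11", "^2.25.1", "^1.19.5", "^0.2.3"):
    low, high = caret_bounds(spec)
    print(f"{spec} -> >={fmt(low)},<{fmt(high)}")
# ^3.11 -> >=3.11.0,<4.0.0
# ^2.25.1 -> >=2.25.1,<3.0.0
# ^1.19.5 -> >=1.19.5,<2.0.0
# ^0.2.3 -> >=0.2.3,<0.3.0
```

In other words, numpy = "^1.19.5" also accepts numpy 1.26.x but not 2.0.0. A range alone doesn't say which version actually gets installed, which is why Poetry records the exact resolved versions separately in poetry.lock.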

By using Poetry to pin your dependencies, you ensure that you always install the versions your project is known to work with. By default, Poetry stores these requirements in a pyproject.toml file at the root of your repository, as you can see in the cloned LLM-Engineers-Handbook repository.

Another major advantage of Poetry is that it creates a dedicated virtual environment for your project, based on a Python interpreter that satisfies the project's Python constraint, and installs the specified requirements into it. Poetry 1.8.3, the version used in this book, doesn't download Python interpreters itself, which is why we rely on pyenv for that step. A virtual environment isolates your project's dependencies from your global Python installation and from other projects, so there are no version clashes between projects. For example, suppose Project A needs numpy==1.19.5 and Project B needs numpy==1.26.0. If both projects share the global Python environment, this won't work: installing Project B's dependencies overrides Project A's numpy version, which can break Project A. With Poetry, each project lives in its own Python environment with its own dependencies, avoiding such clashes.
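To see what this isolation looks like on disk, the sketch below uses Python's standard-library venv module (not Poetry itself, which manages its environments for you) to create two independent environments standing in for Project A and Project B. Each gets its own directory, interpreter link, and site-packages, so installing numpy 1.19.5 into one cannot affect the other. The project names are hypothetical, and with_pip=False keeps the example offline and fast.

```python
import os
import tempfile
import venv
from pathlib import Path
from typing import Dict, List


def create_isolated_envs(base_dir: Path, project_names: List[str]) -> Dict[str, Path]:
    """Create one standalone virtual environment per (hypothetical) project."""
    envs: Dict[str, Path] = {}
    for name in project_names:
        env_dir = base_dir / name / ".venv"
        # system_site_packages=False hides the global site-packages, so each
        # environment only sees packages installed into it.
        # with_pip=False skips bootstrapping pip, keeping the demo offline.
        venv.create(
            env_dir,
            system_site_packages=False,
            with_pip=False,
            symlinks=(os.name != "nt"),
        )
        envs[name] = env_dir
    return envs


with tempfile.TemporaryDirectory() as tmp:
    envs = create_isolated_envs(Path(tmp), ["project-a", "project-b"])
    # Each environment records its own configuration in pyvenv.cfg.
    configs = {name: (path / "pyvenv.cfg").read_text() for name, path in envs.items()}

for name, cfg in configs.items():
    print(name, "ignores global packages:", "include-system-site-packages = false" in cfg)
```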

You can install Poetry from here: https://python-poetry.org/docs/. We use Poetry 1.8.3 throughout the book. Once Poetry is installed, navigate to your cloned LLM-Engineers-Handbook repository and run the following command to install all the necessary Python dependencies:

poetry install --without aws

This command installs all the dependencies listed in the repository's pyproject.toml and poetry.lock files, while the --without aws flag skips the dependency group named aws. After the installation, you can activate the Poetry environment by running poetry shell in your terminal, or prefix each CLI command with poetry run, as in poetry run <your command>.

One final note on Poetry: it locks the exact versions of the whole dependency tree in the poetry.lock file, based on the constraints defined in the pyproject.toml file. While pyproject.toml may specify version ranges (e.g., requests = "^2.25.1"), poetry.lock records the exact version that was resolved and installed (e.g., 2.25.1). It also locks the versions of sub-dependencies (the dependencies of your dependencies), which may not be explicitly listed in your pyproject.toml file. By pinning all dependencies and sub-dependencies to specific versions, the poetry.lock file ensures that every installation of the project uses the same version of each package. This consistency leads to predictable behavior and reduces the likelihood of “works on my machine” issues.

Other tools for creating virtual environments include venv and Conda, but they lack Poetry's lock-file-based dependency management. With venv, for example, you typically track dependencies in Python's standard requirements.txt files, which are less powerful than Poetry's lock files. Another option is Pipenv, which is feature-wise closer to Poetry but slower. Finally, there is uv, a Poetry replacement written in Rust that is significantly faster. uv has a lot of potential to replace Poetry, so it is worth testing out: https://github.com/astral-sh/uv.

The final piece of the puzzle is the task execution tool we use to manage all our CLI commands.

Poe the Poet: task execution tool

Poe the Poet is a task runner that can be installed as a Poetry plugin; we use it to manage and execute all the CLI commands required to interact with the project. It helps you define and run tasks within your Python project, simplifying automation and script execution. Other popular options are Makefiles, Invoke, or shell scripts. Poe the Poet, however, removes the need to write separate shell scripts or Makefiles for managing project tasks: the tasks live in the same configuration file that Poetry already uses for dependencies, which makes it an elegant way to manage them.

When working with Poe the Poet, instead of having all your commands documented in a README file or other document, you can add them directly to your pyproject.toml file and execute them in the command line with an alias. For example, using Poe the Poet, we can define the following tasks in a pyproject.toml file:

[tool.poe.tasks]
test = "pytest"
format = "black ."
start = "python main.py"

Because we use Poe the Poet as a Poetry plugin, you can then run these tasks with the poetry poe command:

poetry poe test
poetry poe format
poetry poe start

You can install Poe the Poet as a Poetry plugin, as follows:

poetry self add 'poethepoet[poetry_plugin]'

To conclude, using a tool as a façade over all the CLI commands needed to run your application significantly reduces the project's complexity and enhances collaboration, as the task definitions act as out-of-the-box documentation.

Assuming you have pyenv and Poetry installed, here are all the commands you need to run to clone the repository and install the dependencies and Poe the Poet as a Poetry plugin:

git clone https://github.com/PacktPublishing/LLM-Engineers-Handbook.git
cd LLM-Engineers-Handbook
poetry install --without aws
poetry self add 'poethepoet[poetry_plugin]'
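Because these commands assume that Git, pyenv, and Poetry are already on your PATH, a small preflight check can save a confusing error halfway through. The sketch below uses the standard-library shutil.which to report missing executables; the tool list and the missing_tools name are our own assumptions, not part of the repository. The injectable which parameter exists only to make the function easy to test.

```python
import shutil
from typing import Callable, Iterable, List, Optional

REQUIRED_TOOLS = ("git", "pyenv", "poetry")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in the given order."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Install these before continuing:", ", ".join(missing))
else:
    print("All prerequisites found.")
```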

To make the project fully operational, there are still a few steps to follow, such as filling out a .env file with your credentials and getting tokens from OpenAI and Hugging Face. But this book isn’t an installation guide, so we’ve moved all these details into the repository’s README as they are useful only if you plan to run the repository: https://github.com/PacktPublishing/LLM-Engineers-Handbook.
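The README lists the exact variables, but to illustrate what filling out a .env file involves, the sketch below parses simple KEY=value lines and reports required keys that are still empty. The variable names OPENAI_API_KEY and HUGGINGFACE_ACCESS_TOKEN are hypothetical placeholders; check the repository's README for the real ones. In practice, projects often load such files with a library such as python-dotenv rather than a hand-written parser like this one.

```python
from typing import Dict, Iterable, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching quotes around the value, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(values: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


SAMPLE_ENV = """
# Hypothetical variable names; see the repository README for the real ones
OPENAI_API_KEY="your-openai-key"
HUGGINGFACE_ACCESS_TOKEN=
"""

env = parse_env(SAMPLE_ENV)
print(missing_keys(env, ["OPENAI_API_KEY", "HUGGINGFACE_ACCESS_TOKEN"]))
# ['HUGGINGFACE_ACCESS_TOKEN']
```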

Now that we have installed our Python project, let’s present the MLOps tools we will use in the book. If you are already familiar with these tools, you can safely skip the following tooling section and move on to the Databases for storing unstructured and vector data section.
