Python ecosystem and project installation
Any Python project needs three fundamental tools: the Python interpreter, a dependency manager, and a task execution tool. The Python interpreter runs your project’s code. All the code in the book is tested with Python 3.11.8. You can download the Python interpreter from https://www.python.org/downloads/. However, we recommend installing the exact Python version used in the book (Python 3.11.8) to run the LLM Twin project, and doing so with pyenv, which makes the installation process straightforward.
Instead of installing multiple global Python versions, we recommend managing them with pyenv, a Python version management tool that lets you switch between multiple Python versions across projects. You can install it by following the instructions at https://github.com/pyenv/pyenv?tab=readme-ov-file#installation.
After you have installed pyenv, you can use it to install Python 3.11.8 as follows:
pyenv install 3.11.8
Now list all installed Python versions to see that it was installed correctly:
pyenv versions
You should see something like this:
# * system
# 3.11.8
To make Python 3.11.8 the default version across your entire system (whenever you open a new terminal), use the following command:
pyenv global 3.11.8
However, we aim to use Python 3.11.8 only locally, within our repository. To achieve that, first clone the repository and navigate to it:
git clone https://github.com/PacktPublishing/LLM-Engineers-Handbook.git
cd LLM-Engineers-Handbook
Because we defined a .python-version file within the repository, pyenv will know to pick up the version from that file and use it locally whenever you are working within that folder. To double-check that, run the following command while you are in the repository:
python --version
It should output:
# Python 3.11.8
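The same check can be done from inside Python, for example at the top of a script that should only run on the book’s version. The following is a minimal sketch; the helper name matches_expected_version is hypothetical and not part of the repository.

```python
import platform


def matches_expected_version(expected: str = "3.11.8") -> bool:
    """Return True if the running interpreter's version equals `expected`."""
    # platform.python_version() returns a string such as "3.11.8".
    return platform.python_version() == expected


if __name__ == "__main__":
    running = platform.python_version()
    if matches_expected_version():
        print(f"OK: running Python {running}")
    else:
        print(f"Warning: expected Python 3.11.8, but running {running}")
```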
To create the .python-version file, you must run pyenv local 3.11.8 once from the repository’s root. From then on, pyenv will always use that Python version while you work within that directory or any of its subdirectories.
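pyenv finds the local version by looking for a .python-version file in the current directory and then in each parent directory, which is why the setting also applies inside the repository’s subfolders. The function below is a simplified sketch of that lookup under our own assumptions: it ignores the PYENV_VERSION environment variable and the global version file that pyenv also consults, and find_local_version is a hypothetical name, not a pyenv API.

```python
from pathlib import Path
from typing import Optional


def find_local_version(start: Path) -> Optional[str]:
    """Roughly mimic pyenv's search for a .python-version file.

    Starting at `start`, check each directory up to the filesystem root and
    return the first version string found, or None if no file exists.
    """
    start = start.resolve()
    for directory in (start, *start.parents):
        version_file = directory / ".python-version"
        if version_file.is_file():
            content = version_file.read_text().strip()
            if content:
                # The file may list several versions; take the first one.
                return content.split()[0]
    return None


if __name__ == "__main__":
    print(find_local_version(Path.cwd()))
```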
Now that we have installed the correct Python version using pyenv, let’s move on to Poetry, which we will use as our dependency and virtual environment manager.
Poetry: dependency and virtual environment management
Poetry is one of the most popular dependency and virtual environment managers within the Python ecosystem. But let’s start by clarifying what a dependency manager is. In Python, a dependency manager allows you to specify, install, update, and manage the external libraries or packages (dependencies) that a project relies on. For example, this is an excerpt from a simple Poetry pyproject.toml file that requires Python 3.11 and the requests and numpy Python packages:
[tool.poetry.dependencies]
python = "^3.11"
requests = "^2.25.1"
numpy = "^1.19.5"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
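The ^ (caret) operator in this excerpt allows any update that does not change the left-most non-zero version component, so ^2.25.1 means >=2.25.1,<3.0.0 and ^3.11 means >=3.11,<4.0. As a rough sketch of that rule, handling only plain numeric X.Y.Z versions and using hypothetical helper names, you could compute the allowed range like this:

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Turn "2.25.1" or "3.11" into (major, minor, patch), padding with zeros."""
    parts = [int(p) for p in text.split(".")]
    major, minor, patch = (parts + [0, 0, 0])[:3]
    return major, minor, patch


def caret_range(spec: str) -> tuple[tuple[int, int, int], tuple[int, int, int]]:
    """Return the [lower, upper) bounds of a caret constraint such as "^2.25.1"."""
    raw = spec.removeprefix("^")
    given = [int(p) for p in raw.split(".")][:3]
    lower = parse_version(raw)
    # Bump the left-most non-zero component that was written; if all written
    # components are zero (e.g. "^0.0"), bump the last written one instead.
    nonzero = [i for i, part in enumerate(given) if part != 0]
    bump_at = nonzero[0] if nonzero else len(given) - 1
    upper = list(lower)
    upper[bump_at] += 1
    for i in range(bump_at + 1, 3):
        upper[i] = 0  # everything to the right of the bumped part resets
    return lower, (upper[0], upper[1], upper[2])


def satisfies(version: str, spec: str) -> bool:
    """True if `version` falls inside the caret range described by `spec`."""
    lower, upper = caret_range(spec)
    return lower <= parse_version(version) < upper


if __name__ == "__main__":
    for spec in ("^3.11", "^2.25.1", "^0.2.3"):
        print(spec, caret_range(spec))
    print(satisfies("2.31.0", "^2.25.1"))  # True
    print(satisfies("3.0.0", "^2.25.1"))  # False
```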
By using Poetry to pin your dependencies, you ensure that you always install the versions of the dependencies that your project works with. By default, Poetry stores all its requirements in the pyproject.toml file at the root of your repository, as you can see in the cloned LLM-Engineers-Handbook repository.
Another major advantage of using Poetry is that it creates a new Python virtual environment for the project, based on a Python interpreter that satisfies the specified version constraint, and installs the requirements into it. A virtual environment isolates your project’s dependencies from your global Python dependencies and from other projects. By doing so, you ensure there are no version clashes between projects. For example, let’s assume that Project A needs numpy == 1.19.5 and Project B needs numpy == 1.26.0. If you install both projects into the global Python environment, installing Project B’s requirements will overwrite Project A’s numpy installation, which can break Project A. Using Poetry, you can isolate each project in its own Python environment with its own Python dependencies, avoiding any dependency clashes.
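To see the isolation idea without Poetry, the standard library’s venv module can create a bare environment with its own site-packages directory. This is a minimal sketch under our own assumptions: the helper names are hypothetical, and because Poetry manages its environments for you, you would not normally create them this way in the project.

```python
import os
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the current interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def create_isolated_env(target: Path) -> Path:
    """Create a bare virtual environment (without pip) and return its directory."""
    # Symlink the interpreter on POSIX systems and copy it on Windows,
    # mirroring the default behavior of `python -m venv`.
    venv.create(target, with_pip=False, symlinks=(os.name != "nt"))
    return target


if __name__ == "__main__":
    print("Running inside a virtual environment:", in_virtualenv())
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_isolated_env(Path(tmp) / "project-a-env")
        # Each environment has its own site-packages, so Project A and
        # Project B could each hold a different numpy version.
        print("Created:", env_dir, (env_dir / "pyvenv.cfg").exists())
```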
You can install Poetry from here: https://python-poetry.org/docs/. We use Poetry 1.8.3 throughout the book. Once Poetry is installed, navigate to your cloned LLM-Engineers-Handbook repository and run the following command to install all the necessary Python dependencies:
poetry install --without aws
This command installs all the dependencies listed in the repository’s pyproject.toml and poetry.lock files, while the --without aws flag skips the aws dependency group. After the installation, you can activate your Poetry environment by running poetry shell in your terminal, or you can prefix each CLI command with poetry run, as in poetry run <your command>.
One final note on Poetry is that it locks down the exact versions of the whole dependency tree in the poetry.lock file, based on the definitions in the pyproject.toml file. While the pyproject.toml file may specify version ranges (e.g., requests = "^2.25.1"), the poetry.lock file records the exact version that was resolved and installed (e.g., requests = "2.25.1"). It also locks the versions of sub-dependencies (dependencies of your dependencies), which may not be explicitly listed in your pyproject.toml file. By locking all the dependencies and sub-dependencies to specific versions, the poetry.lock file ensures that every installation of the project uses the same version of each package. This consistency leads to predictable behavior, reducing the likelihood of encountering “works on my machine” issues.
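A practical consequence of locking is that you can detect drift between what is installed and what the lock file pins. This sketch uses importlib.metadata to compare installed versions against a dictionary of pins; parsing poetry.lock itself is left out, so both the example pins and the find_drift helper are hypothetical.

```python
from importlib import metadata
from typing import Callable, Optional


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_drift(
    locked: dict[str, str],
    get_version: Callable[[str], Optional[str]] = installed_version,
) -> dict[str, tuple[str, Optional[str]]]:
    """Map each package whose installed version differs from its pin to (pinned, installed).

    `locked` maps package names to exact versions, like the entries that a
    lock file records for both direct dependencies and sub-dependencies.
    """
    drift = {}
    for name, pinned in locked.items():
        installed = get_version(name)
        if installed != pinned:
            drift[name] = (pinned, installed)
    return drift


if __name__ == "__main__":
    # Hypothetical pins; in practice they would come from poetry.lock.
    print(find_drift({"requests": "2.25.1", "urllib3": "1.26.18"}))
```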
Other tools for creating virtual environments are venv and Conda. Still, they don’t offer Poetry’s lock-file-based dependency management out of the box. With venv, for example, you must track dependencies through Python’s standard requirements.txt files, which are less powerful than Poetry’s lock files. Another option is Pipenv, which is feature-wise closer to Poetry but slower, and uv, a Poetry alternative written in Rust that is blazing fast. uv has lots of potential to replace Poetry, making it worthwhile to test out: https://github.com/astral-sh/uv.
The final piece of the puzzle is the task execution tool we use to manage all our CLI commands.
Poe the Poet: task execution tool
Poe the Poet is a task runner that integrates with Poetry (in this book, as a Poetry plugin) and is used to manage and execute all the CLI commands required to interact with the project. It helps you define and run tasks within your Python project, simplifying automation and script execution. Other popular options are Makefiles, Invoke, or shell scripts, but Poe the Poet eliminates the need to write separate shell scripts or Makefiles for managing project tasks. It is an elegant way to manage tasks using the same configuration file that Poetry already uses for dependencies.
When working with Poe the Poet, instead of having all your commands documented in a README file or other document, you can add them directly to your pyproject.toml file and execute them in the command line with an alias. For example, using Poe the Poet, we can define the following tasks in a pyproject.toml file:
[tool.poe.tasks]
test = "pytest"
format = "black ."
start = "python main.py"
You can then run these tasks using the poe command, which, with Poe the Poet installed as a Poetry plugin, you invoke through poetry poe:
poetry poe test
poetry poe format
poetry poe start
You can install Poe the Poet as a Poetry plugin, as follows:
poetry self add 'poethepoet[poetry_plugin]'
To conclude, using a tool as a façade over all your CLI commands makes running your application much simpler. It hides the complexity of the individual commands and enhances collaboration, as the task definitions act as out-of-the-box documentation.
Assuming you have pyenv and Poetry installed, here are all the commands you need to run to clone the repository, install the dependencies, and add Poe the Poet as a Poetry plugin:
git clone https://github.com/PacktPublishing/LLM-Engineers-Handbook.git
cd LLM-Engineers-Handbook
poetry install --without aws
poetry self add 'poethepoet[poetry_plugin]'
To make the project fully operational, there are still a few steps to follow, such as filling out a .env file with your credentials and getting tokens from OpenAI and Hugging Face. But this book isn’t an installation guide, so we’ve moved all these details into the repository’s README, as they are useful only if you plan to run the repository: https://github.com/PacktPublishing/LLM-Engineers-Handbook.
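The .env file mentioned above is a plain list of KEY=VALUE lines. As a minimal sketch of how such a file can be read, consider the parser below. The variable names in the test data are hypothetical (the repository’s README lists the real ones), and a dedicated library such as python-dotenv handles the full syntax, including escaping and variable expansion.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Drop one pair of matching quotes around the value, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        # Print only the keys so that secrets never end up in the terminal.
        print(sorted(parse_env(env_file.read_text())))
```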
Now that we have installed our Python project, let’s present the MLOps tools we will use in the book. If you are already familiar with these tools, you can safely skip the following tooling section and move on to the Databases for storing unstructured and vector data section.