Technical requirements
The code for this chapter is available here: https://github.com/PacktPublishing/Unlocking-Data-with-Generative-AI-and-RAG/tree/main/Chapter_02
You will need to run this chapter’s code in an environment that has been set up to run Jupyter notebooks. Experience with Jupyter notebooks is a prerequisite for this book, and setting one up is too large a topic to cover in a short amount of text. There are numerous ways to set up a notebook environment: online versions, versions you can download, environments that universities provide to their students, and a variety of interfaces. If you are doing this at a company, it will likely have an environment that you will want to get familiar with. Each of these options requires very different setup instructions, and those instructions change often. If you need to brush up on this type of environment, start with the Jupyter documentation at https://docs.jupyter.org/en/latest/, then ask your favorite LLM for more help getting your environment set up.
What do I use? When I use my Chromebook, which is often when I am traveling, I use a notebook set up in one of the cloud environments. I prefer Google Colab or Google's Colab Enterprise notebooks, which you can find in the Vertex AI section of Google Cloud Platform. But these environments cost money, often exceeding $20 a month if you are active. If you are as active as I am, the cost can exceed $1,000 per month!
As a cost-effective alternative for when I am that active, I use Docker Desktop on my Mac, which hosts a Kubernetes cluster locally, and I set up my notebook environment in that cluster. All of these approaches have environment requirements that change often, so it is best to do a little research and figure out what works best for your situation. There are similar solutions for Windows-based computers.
Ultimately, the primary requirement is to find an environment in which you can run a Jupyter notebook using Python 3. The code we provide indicates which other packages you need to install.
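If you want a quick sanity check of whatever environment you end up with, a cell along the following lines can help. This is only a minimal sketch, not part of the chapter's code: the helper names `python_ok` and `missing_packages` are hypothetical, and the package names passed in at the end are placeholders that you would replace with whatever the chapter's code asks you to install.

```python
import importlib.util
import sys


def python_ok(version_info=sys.version_info, minimum=(3, 0)):
    """Return True if the running interpreter is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version:", sys.version.split()[0])
print("Python 3 available:", python_ok())

# Placeholder names: "json" ships with Python, the second name is made up.
print("Missing packages:", missing_packages(["json", "some_package_you_need"]))
```

Anything reported as missing is something you would install before running the notebook. Note that `find_spec` expects the import name of a package, which is not always the same as the name you pass to the installer.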
Note
All of this code assumes you are working in a Jupyter notebook. You could do this directly in a Python file (.py), but you may have to change some of it. Running this in a notebook gives you the ability to step through it cell by cell and see what happens at each point to better understand the entire process.
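As one illustration of the kind of change a plain Python file can require, notebook cells often use IPython "magic" commands such as `%pip install`, which are not valid Python syntax in a `.py` file. The sketch below is an assumption about what you might run into rather than a description of this chapter's code, and the helper names `running_under_ipython` and `pip_install_command` are hypothetical. It detects whether the code is running under IPython and builds the equivalent installer command for use in a script.

```python
import sys


def running_under_ipython():
    """Best-effort check: True inside an IPython or Jupyter session."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so this cannot be a notebook kernel.
        return False
    return get_ipython() is not None


def pip_install_command(package):
    """Build the command a script could run in place of `%pip install`."""
    return [sys.executable, "-m", "pip", "install", package]


print("Running under IPython:", running_under_ipython())

# The package name here is a placeholder, and the command is only printed.
print("Script equivalent of %pip install:", pip_install_command("some_package"))
```

In a script, you could pass the resulting list to `subprocess.check_call` to perform the install. It is only printed here so that the example has no side effects.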