Exploring local models
We can also run local models through LangChain. The main advantages of running models locally are complete control over the model and the fact that no data ever leaves your machine.
Please note that we don’t need an API token for local models!
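As a quick illustration, one way to run a small model locally is through LangChain's Hugging Face pipeline wrapper. The following is a minimal sketch, assuming the langchain-community and transformers packages are installed; the model name and generation parameters are illustrative choices, not requirements:

```python
from langchain_community.llms import HuggingFacePipeline

# Download a small model from the Hugging Face Hub and wrap the
# transformers pipeline as a LangChain LLM. No API token is needed;
# the weights are stored and executed locally.
llm = HuggingFacePipeline.from_model_id(
    model_id="google/flan-t5-small",   # illustrative small model (~80M parameters)
    task="text2text-generation",
    pipeline_kwargs={"max_new_tokens": 100},
)

print(llm.invoke("Explain in one sentence why local models preserve privacy."))
```

The first call downloads the model weights into the local Hugging Face cache; every run after that executes entirely on your own hardware.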
Let’s preface this with a note of caution: LLMs are big, which means they take up a lot of disk space and system memory. The use cases presented in this section should run even on old hardware, like an old MacBook; however, if you choose a big model, it can take an exceptionally long time to run or may crash the Jupyter notebook. One of the main bottlenecks is the memory requirement. In rough terms, if a model is quantized (roughly, compressed; we’ll discuss quantization in Chapter 8, Customizing LLMs and Their Output), 1 billion parameters correspond to about 1 GB of RAM (please note that not all models come quantized).
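To see how this rule of thumb plays out, here is a small back-of-envelope sketch that estimates the RAM needed just to hold a model's weights at different precisions. The numbers are rough heuristics and ignore activation memory and framework overhead:

```python
def estimate_ram_gb(num_params_billions: float, bytes_per_param: float) -> float:
    """Rough RAM estimate (in GB) for holding the model weights only."""
    return num_params_billions * 1e9 * bytes_per_param / 1e9

# A hypothetical 7-billion-parameter model at different precisions:
for label, bytes_per_param in [("float32", 4), ("float16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    print(f"7B model, {label}: ~{estimate_ram_gb(7, bytes_per_param):.1f} GB")

# 8-bit quantization works out to roughly 1 GB per billion parameters,
# which is where the rule of thumb above comes from.
```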
You can also run these models on hosted resources or services such as Kubernetes...