We have all become familiar with conversational UIs powered by Large Language Models: ChatGPT has been the first and most powerful example of how LLMs can boost our daily productivity. To be precise, LLMs are, by design, “large”, meaning that they are made of billions of parameters and are therefore hosted on powerful infrastructure, typically in the public cloud (for example, OpenAI’s models, including those behind ChatGPT, are hosted in Microsoft Azure). As such, those models are only accessible with an internet connection.
But what if you could run those powerful models on your local PC and still have a ChatGPT-like experience?
GPT4All is an ecosystem for training and deploying powerful, customized large language models that run locally on consumer-grade CPUs – no GPU is required. A GPT4All model is a 3GB - 8GB file that you can download and plug into the open-source GPT4All ecosystem software, which is optimized to host models of between 7 and 13 billion parameters.
To start working with the GPT4All Desktop app, you can download it from the official website: https://gpt4all.io/index.html.
At the moment, there are three main LLM families you can use within GPT4All:
- LLaMA (Meta’s model and its fine-tuned derivatives);
- GPT-J (an open-source model by EleutherAI);
- MPT (by MosaicML).
You can download various versions of these model families directly within the GPT4All app:
Image 1: Download section in GPT4All
Image 2: Download Path
In my case, I downloaded GPT-J with 6 billion parameters. You can select the model you want to use from the menu at the top of the application. Once you have selected the model, you can interact with it via the well-known ChatGPT-like UI, with the difference that it is now running on your local PC:
Image 3: GPT4All response
As you can see, the user experience is almost identical to that of ChatGPT, with the difference that we are running it locally and on top of different underlying LLMs.
Another great thing about GPT4All is its integration with your local documents via plugins (currently in beta). To set it up, go to Settings (in the upper menu bar) and select the LocalDocs plugin. There, you can browse to the folder you want to connect and then “attach” it to the model’s knowledge base via the database icon in the upper right. In this example, I used the SQL licensing documentation in PDF format.
Image 4: SQL documentation
In this case, the model will answer taking into account (but not limited to) the attached documentation, quoting it whenever the answer draws on it:
Image 5 and Image 6: Model responses quoting the attached documentation
The technique used to store and index the knowledge provided by our documents is called Retrieval-Augmented Generation (RAG), an approach to language generation that combines two types of memory:
- Parametric memory: the knowledge encoded in the model’s weights during training;
- Non-parametric memory: an external, searchable knowledge source (in our case, the indexed local documents), retrieved at query time and fed to the model alongside the prompt.
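To make this concrete, here is a minimal, illustrative sketch of the retrieval step in plain Python. The sample documents, the toy bag-of-words embedding, and the prompt template are assumptions for illustration only – GPT4All’s actual LocalDocs implementation uses its own indexing and scoring:

import math
from collections import Counter

# Toy "non-parametric memory": snippets extracted from local documents
documents = [
    "SQL Server licensing is based on per-core or Server+CAL models.",
    "GPT4All runs large language models locally on consumer-grade CPUs.",
    "The LocalDocs plugin indexes a folder of files for retrieval.",
]

def embed(text):
    # Illustrative embedding: bag-of-words counts (real RAG systems use neural embeddings)
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse bag-of-words vectors
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=2):
    # Rank the indexed snippets by similarity to the query and keep the top k
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

# The retrieved snippets are prepended to the prompt, augmenting the model's
# parametric memory with knowledge from the local documents
query = "How is SQL Server licensed?"
context = "\n".join(retrieve(query))
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
print(prompt)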
Finally, the LocalDocs plugin supports a variety of data formats, including txt, docx, pdf, html, and xlsx. For a comprehensive list of the supported formats, you can visit the page https://docs.gpt4all.io/gpt4all_chat.html#localdocs-capabilities.
In addition to the Desktop app mode, GPT4All comes with two additional ways of consumption:
- the Python SDK (the gpt4all package), which lets you embed local models directly in your applications;
- a local API server exposing OpenAI-compatible endpoints, so you can query a local model with the familiar openai Python library.
Once the API server is enabled in the app’s settings, you can call it as follows:
import openai

# Point the openai client (pre-1.0 API) at the local GPT4All API server
openai.api_base = "http://localhost:4891/v1"
openai.api_key = "not needed for a local LLM"

prompt = "What is AI?"

# Choose which locally hosted model to query
# model = "gpt-3.5-turbo"
# model = "mpt-7b-chat"
model = "gpt4all-j-v1.3-groovy"

# Make the API request
response = openai.Completion.create(
    model=model,
    prompt=prompt,
    max_tokens=50,
    temperature=0.28,
    top_p=0.95,
    n=1,
    echo=True,
    stream=False,
)

# Print the generated text
print(response.choices[0].text)
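For the Python SDK route, a minimal sketch might look like the following. Note that the exact API of the gpt4all package has evolved across versions, so treat the model name and the generate signature as illustrative rather than definitive:

from gpt4all import GPT4All

# Load the locally downloaded model file (the bindings can also download it)
model = GPT4All("ggml-gpt4all-j-v1.3-groovy")

# Generate a completion in-process, with no API server involved
output = model.generate("What is AI?", max_tokens=50)
print(output)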
Running LLMs on local hardware opens up a new spectrum of possibilities, especially when we think about disconnected scenarios. Plus, being able to chat with local documents through an easy-to-use interface adds custom non-parametric memory to the model, so that we can already use it as a sort of copilot.
Hence, even though it is still at an early stage, this ecosystem is paving the way for interesting new scenarios.
Valentina Alto graduated in 2021 in Data Science. She has been working at Microsoft since 2020 as an Azure Solution Specialist and, since 2022, has focused on Data & AI workloads within the Manufacturing and Pharmaceutical industries. She works on customer projects closely with system integrators to deploy cloud architectures, with a focus on the data lakehouse and DWH, data integration and engineering, IoT and real-time analytics, Azure Machine Learning, Azure Cognitive Services (including Azure OpenAI Service), and Power BI for dashboarding. She holds a BSc in Finance and an MSc in Data Science from Bocconi University, Milan, Italy. Since her academic years, she has been writing tech articles on Statistics, Machine Learning, Deep Learning, and AI for various publications, and she has also written a book about the fundamentals of Machine Learning with Python.