
ChatGPT on Your Hardware: GPT4All

  • 7 min read
  • 20 Jun 2023


We have all become familiar with conversational UIs powered by Large Language Models: ChatGPT has been the first and most powerful example of how LLMs can boost our daily productivity. But LLMs are, by design, “large”: they are made of billions of parameters, so they are hosted on powerful infrastructure, typically in the public cloud (OpenAI’s models, including the ones behind ChatGPT, are hosted in Microsoft Azure). As such, those models are only accessible with an internet connection.

But what if you could run those powerful models on your local PC, having a ChatGPT-like experience?

Introducing GPT4All

GPT4All is an ecosystem to train and deploy powerful and customized large language models that run locally on consumer-grade CPUs – no GPU is required. A GPT4All model is a 3GB - 8GB file that you can download and plug into the GPT4All open-source ecosystem software, which is optimized to host models of between 7 and 13 billion parameters.

To start working with the GPT4All Desktop app, you can download it from the official website here: https://gpt4all.io/index.html.

At the moment, there are three main LLM families you can use within GPT4All:

  • LLaMA - a collection of foundation language models ranging from 7B to 65B parameters. They were developed by Meta AI, the AI research division of Meta (Facebook’s parent company), and trained on trillions of tokens from 20 languages that use Latin or Cyrillic scripts. LLaMA can generate human-like conversations and outperforms GPT-3 on most benchmarks despite being 10x smaller. LLaMA is designed to run on less computing power and to be versatile and adaptable to many different use cases. However, LLaMA also faces challenges common to large language models, such as bias, toxicity, and hallucinations. Meta AI has released all the LLaMA models to the research community for open science.
  • GPT-J - an open-source artificial intelligence language model developed by EleutherAI. It is a GPT-2-like causal language model trained on the Pile, an open-source 825-gigabyte language-modeling dataset split into 22 smaller datasets. GPT-J has 6 billion parameters and can generate creative text, prove mathematical theorems, predict protein structures, answer reading comprehension questions, and more. GPT-J performs very similarly to similarly-sized OpenAI GPT-3 variants on various zero-shot downstream tasks and can even outperform them on code generation tasks. GPT-J is designed to run on less computing power and to be versatile and adaptable to many different use cases.
  • MPT - a series of open-source, commercially usable large language models developed by MosaicML. MPT-7B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. It is part of the MosaicPretrainedTransformer (MPT) family, which uses a modified transformer architecture optimized for efficient training and inference. These architectural changes include performance-optimized layer implementations and the elimination of context length limits by replacing positional embeddings with Attention with Linear Biases (ALiBi). Thanks to ALiBi, MPT-7B can handle extremely long inputs (up to 84k tokens vs. 2k-4k for other open-source models). MPT-7B also has several finetuned variants for different tasks, such as story writing, instruction following, and dialogue generation.
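To make the ALiBi mechanism concrete, here is a minimal NumPy sketch of the distance penalty it adds to attention scores. The head-slope formula follows ALiBi's geometric sequence for a power-of-two number of heads; the function name is illustrative and not taken from any MPT codebase:

```python
import numpy as np

def alibi_bias(num_heads, seq_len):
    """ALiBi sketch: per-head linear distance penalties added to attention scores."""
    # Head h gets slope m_h = 2**(-8 * (h + 1) / num_heads) (geometric sequence).
    slopes = np.array([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    # distances[i, j] = j - i: zero on the diagonal, negative for past positions.
    positions = np.arange(seq_len)
    distances = positions[None, :] - positions[:, None]
    # Causal attention only looks backwards, so keep j <= i; attending k tokens
    # into the past costs -m_h * k, with no positional embeddings needed.
    return slopes[:, None, None] * np.minimum(distances, 0).astype(float)

bias = alibi_bias(num_heads=8, seq_len=5)  # shape: (8, 5, 5)
```

Because the penalty is a simple linear function of distance, it extrapolates to sequences longer than those seen during training, which is how MPT-7B variants reach such long contexts.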

You can download various versions of these model families directly within the GPT4All app:


Image 1: Download section in GPT4All


Image 2: Download Path

In my case, I downloaded GPT-J with 6 billion parameters. You can select the model you want to use from the menu at the top of the application. Once you have selected the model, you can use it via the well-known ChatGPT-style UI, with the difference that it is now running on your local PC:


Image 3: GPT4All response

As you can see, the user experience is almost identical to the well-known ChatGPT, with the difference that we are running it locally and with different underlying LLMs.

Chatting with your own data

Another great thing about GPT4All is its integration with your local docs via plugins (currently in beta). To set this up, go to Settings (in the upper menu bar) and select the LocalDocs plugin. Here, you can browse to the folder path you want to connect and then “attach” it to the model’s knowledge base via the database icon in the upper right. In this example, I used the SQL licensing documentation in PDF format.



Image 4: SQL documentation

In this case, the model will answer taking into consideration (but not exclusively) the attached documentation, which will be quoted whenever the answer is based on it:

Image 5: GPT4All response based on the attached documentation

Image 6: GPT4All response quoting the local documents

The technique used to store and index the knowledge provided by our documents is called Retrieval-Augmented Generation (RAG), a language generation approach that combines two types of memory:

  • Pre-trained parametric memory: the one stored in the model’s parameters, derived from the training dataset;
  • Non-parametric memory: the one derived from the attached knowledge, which takes the form of a vector database where the documents’ embeddings are stored.
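To make the two-memory idea concrete, the toy sketch below mimics what a RAG pipeline does under the hood: embed the attached documents, store the vectors, and retrieve the best match for a query by similarity. The hashed bag-of-words embedding is a deliberately crude stand-in for the learned embedding model a real system uses, and all names here are illustrative:

```python
import hashlib
import numpy as np

# Toy corpus standing in for the attached documents (the non-parametric memory).
docs = [
    "SQL Server licensing is based on cores or on server plus CAL.",
    "GPT4All runs large language models locally on consumer CPUs.",
    "ALiBi replaces positional embeddings with linear attention biases.",
]

def embed(text, dim=64):
    """Hashed bag-of-words embedding: a crude stand-in for a learned model."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# The "vector database": one normalized embedding per document.
doc_vectors = np.stack([embed(d) for d in docs])

def retrieve(query, k=1):
    """Return the k documents most similar to the query (cosine similarity)."""
    scores = doc_vectors @ embed(query)
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

context = retrieve("how is SQL Server licensed?")
```

A real pipeline then prepends the retrieved passages to the prompt, so the LLM can ground (and quote) its answer in them.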

Finally, the LocalDocs plugin supports a variety of data formats, including txt, docx, pdf, html, and xlsx. For a comprehensive list of the supported formats, you can visit https://docs.gpt4all.io/gpt4all_chat.html#localdocs-capabilities.
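As a small illustration, the snippet below filters a folder down to the file types listed above before indexing; the function name and the exact extension set are assumptions for this sketch, not part of the plugin's API:

```python
from pathlib import Path

# Subset of file types the LocalDocs plugin can index (per the list above).
SUPPORTED = {".txt", ".docx", ".pdf", ".html", ".xlsx"}

def indexable_files(folder):
    """Return the files under `folder` that LocalDocs would be able to index."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
```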

Using the GPT4All API

In addition to the Desktop app mode, GPT4All can also be consumed programmatically, for example via:

  • Server mode - once you have enabled server mode in the Desktop app’s settings, you can call the local GPT4All API at localhost on port 4891 (no API key needed), embedding the following code in your app:
import openai

# Point the OpenAI client at the local GPT4All server instead of the cloud.
openai.api_base = "http://localhost:4891/v1"
openai.api_key = "not needed for a local LLM"

prompt = "What is AI?"

# Pick whichever model you have downloaded locally:
# model = "gpt-3.5-turbo"
# model = "mpt-7b-chat"
model = "gpt4all-j-v1.3-groovy"

# Make the API request
response = openai.Completion.create(
    model=model,
    prompt=prompt,
    max_tokens=50,
    temperature=0.28,
    top_p=0.95,
    n=1,
    echo=True,
    stream=False,
)
print(response)
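Since server mode exposes an OpenAI-compatible REST endpoint, the same call can also be made without the openai package. The sketch below only builds the HTTP request; the /v1/completions path follows the OpenAI-compatible contract, nothing is sent until urlopen is called, and the helper name is illustrative:

```python
import json
import urllib.request

def build_completion_request(prompt, model, base="http://localhost:4891/v1"):
    """Build an HTTP request for GPT4All's OpenAI-compatible completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": 50,
        "temperature": 0.28,
    }
    return urllib.request.Request(
        f"{base}/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_completion_request("What is AI?", "gpt4all-j-v1.3-groovy")
# urllib.request.urlopen(req) would return the completion once server mode is on.
```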

Conclusions

Running LLMs on local hardware opens a new spectrum of possibilities, especially when we think about disconnected scenarios. Plus, the ability to chat with local documents through an easy-to-use interface adds custom non-parametric memory to the model, so that we can already use it as a sort of copilot.

Hence, even though it is still in an early phase, this ecosystem is paving the way for new, interesting scenarios.

Author Bio

Valentina Alto graduated in 2021 in Data Science. Since 2020 she has been working at Microsoft as an Azure Solution Specialist and, since 2022, she has focused on Data & AI workloads within the Manufacturing and Pharmaceutical industries. She has been working on customers’ projects closely with system integrators to deploy cloud architectures with a focus on data lakehouse and DWH, data integration and engineering, IoT and real-time analytics, Azure Machine Learning, Azure Cognitive Services (including Azure OpenAI Service), and Power BI for dashboarding. She holds a BSc in Finance and an MSc in Data Science from Bocconi University, Milan, Italy. Since her academic journey, she has been writing tech articles about statistics, machine learning, deep learning, and AI in various publications, and she has written a book about the fundamentals of machine learning with Python.
