Overview of Auto-GPT
Auto-GPT is, more or less, a category that its own name describes: it automates GPT or ChatGPT. However, in this book, the main focus is on the project named Auto-GPT. If you haven’t heard of it and just grabbed this book out of curiosity, then you’re in the right place!
Auto-GPT started as an experimental self-prompting AI application: an attempt to create an autonomous system capable of spawning “agents” that perform specialized tasks in pursuit of larger objectives with minimal human input. It is based on OpenAI’s GPT and was developed by Toran Bruce Richards, better known by his GitHub handle, Significant Gravitas.
Now, how does Auto-GPT think? Auto-GPT creates prompts that are fed to large language models (LLMs), allowing the model to generate original content and execute commands such as browsing, coding, and more. It represents a significant step forward in the development of autonomous AI and became the fastest-growing open source project in GitHub’s history (at the time of writing).
Auto-GPT strings together multiple instances of OpenAI’s language model – GPT – and by doing so creates so-called “agents”, each assigned a smaller, simplified task. These agents work together to accomplish complex goals, such as writing a blog, with minimal human intervention.
Now, let’s talk about how it rose to fame.
From an experiment to one of the fastest-growing GitHub projects
Auto-GPT was initially named Entrepreneur-GPT and was released on March 16, 2023. The initial goal of the project was to give GPT-4 autonomy to see if it could thrive in the business world and test its capability to make real-world decisions.
For some time, the development of Auto-GPT remained mostly unnoticed until late March 2023. However, on March 30, 2023, Significant Gravitas tweeted about the latest demo of Auto-GPT and posted a demo video, which began to gain traction. The real surge in interest came on April 2, 2023, when computer scientist Andrej Karpathy quoted one of Significant Gravitas’ tweets, saying that the next frontier of prompt engineering was Auto-GPT.
This tweet went viral, and Auto-GPT became a subject of discussion on social media. One of the agents that was created by Auto-GPT, known as ChaosGPT, became particularly famous when it was humorously assigned the task of “destroying humanity,” which contributed to the viral nature of Auto-GPT (https://decrypt.co/126122/meet-chaos-gpt-ai-tool-destroy-humanity).
Of course, we don’t want to destroy humanity; for a reference on what Entrepreneur-GPT could do, take a look at its old logs here:
https://github.com/Significant-Gravitas/Auto-GPT/blob/c6f61db06cde7bd766e521bf7df1dc0c2285ef73/.
The more creative you are with your prompts and configuration, the more creative Auto-GPT will be. This will be covered in Chapter 2 when we run our first Auto-GPT instance together.
LLMs – the core of AI
Although Auto-GPT can be used with other LLMs, it best leverages the power of GPT-4, a state-of-the-art language model by OpenAI.
Using a hosted model such as GPT-4 offers a huge advantage for users who don’t own a graphics card powerful enough to run models of a comparable class locally. Although there are many 7B and 13B LLMs (B stands for billion parameters) that compete with ChatGPT, they either cannot hold enough context in each prompt to be useful or are simply not stable enough.
At the time of writing, GPT-4 and GPT-3.5-turbo are both used with Auto-GPT by default. Depending on the complexity of the situation, Auto-GPT distinguishes between two types of models:
- Smart model
- Fast model
So, when does Auto-GPT use GPT-3.5-turbo rather than GPT-4 all the time?
When Auto-GPT loops through its thought process, it uses the configured fast model; when it summarizes the content of a website or writes code, it decides to use the smart model.
The default for the fast model is GPT-3.5-turbo. Although it isn’t as precise as GPT-4, it responds much faster, leading to a more fluent experience; GPT-4 can seem stuck if it thinks for too long.
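In the versions current at the time of writing, the two models are selected through settings in Auto-GPT’s .env file. The exact variable names have changed between releases, so treat the following snippet as an illustration rather than a definitive reference:

```
# .env (illustrative; variable names may differ between Auto-GPT releases)
SMART_LLM_MODEL=gpt-4
FAST_LLM_MODEL=gpt-3.5-turbo
```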
OpenAI has also added new functionality to assist applications such as Auto-GPT. One of them is the ability to call functions. Before this feature, Auto-GPT had to explain to GPT in plain text what a command is and how to formulate it correctly, which resulted in many errors because GPT would sometimes change the syntax of the expected output. Function calling was a huge step forward, as it reduces the complexity of how commands are communicated and executed and empowers GPT to better understand the context of each task.
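To give you a feel for what function calling looks like at the API level, here is a minimal sketch using the openai Python package (the 0.x interface). The browse_website function definition is a simplified stand-in for illustration, not Auto-GPT’s actual command schema:

```python
import openai

# A simplified, illustrative function definition - not Auto-GPT's real command schema
functions = [
    {
        "name": "browse_website",
        "description": "Browse a website and return a summary of its content",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "The URL to visit"},
                "question": {"type": "string", "description": "What to look for on the page"},
            },
            "required": ["url"],
        },
    }
]

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Find out what Auto-GPT is from its GitHub page."}],
    functions=functions,
    function_call="auto",  # let the model decide whether to call a function
)

# If the model chose to call a function, its name and JSON arguments come back
# in a structured field instead of free-form text
message = response["choices"][0]["message"]
if message.get("function_call"):
    print(message["function_call"]["name"], message["function_call"]["arguments"])
```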
So, why don’t we use an LLM directly? Because LLMs are purely reactive:
- They cannot fulfill any tasks on their own
- Their knowledge is fixed, and they cannot update it themselves
- They don’t remember anything; only the frameworks that run them can do that
How does Auto-GPT make use of LLMs?
Auto-GPT is structured so that it takes an initial prompt from the user via the terminal:
Figure 1.1 – Letting Auto-GPT define its role
Here, you can either define a main task or enter --manual to then answer a series of questions, as shown here:
Figure 1.2 – Setting Auto-GPT’s main goals
The main prompt is then saved as an ai_settings.yaml file that may look like this:
ai_goals:
- Conduct a thorough analysis of the current state of the book and identify areas for improvement.
- Develop a comprehensive plan for creating task lists that will help you structure research, a detailed outline per chapter and individual parts.
- Be sure to ask the user for feedback and improvements.
- Continuously assess the current state of the work and use the speak property to give the user positive affirmations.
ai_name: AuthorGPT
ai_role: An AI-powered author and researcher specializing in creating comprehensive, well-structured, and engaging content on Auto-GPT and its plugins, while maintaining an open line of communication with the user for feedback and guidance.
api_budget: 120.0
Let’s look at some of the AI components in the preceding file:
- First, we have ai_goals, which specifies the main tasks that Auto-GPT must undertake. It will use those to decide which individual steps to take; on each iteration, it will decide to follow one of the goals.
- Then, we have ai_name, which is also taken as a reference and defines parts of the behavior or character of the bot. This means that if you call it AuthorGPT, it will play the role of a GPT-based author, while if you call it Author, it will try to behave like a person. It is generally hard to tell how it will behave because GPT mostly decides what it puts out on its own.
- Finally, we have ai_role, which can be viewed as a more detailed role description. However, in my experience, it only nudges the thoughts slightly; the goals are more potent here.
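If you want to inspect or reuse these settings programmatically, the file is ordinary YAML. Here is a minimal sketch that reads it with PyYAML; the file path and the way it is used here are assumptions for illustration, not part of Auto-GPT’s own code:

```python
import yaml  # pip install pyyaml

# Load the settings Auto-GPT saved for us (path assumed to be the working directory)
with open("ai_settings.yaml", "r", encoding="utf-8") as f:
    settings = yaml.safe_load(f)

print(settings["ai_name"])       # AuthorGPT
print(settings["ai_role"])       # the detailed role description
for goal in settings["ai_goals"]:
    print("-", goal)
print(settings["api_budget"])    # 120.0
```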
Once this is done, it summarizes what it’s going to do and starts its thought process:
Figure 1.3 – Example of Auto-GPT’s thought process
Thinking generally means that it is sending a chat completion request to the LLM.
This process can be slow – the more tokens that are used, the more processing that’s needed. In the Understanding tokens in LLMs section, we will take a look at what this means.
Once Auto-GPT has started “thinking,” it initiates a sequence of AI “conversations.” During these conversations, it forms a query, sends it to the LLM, and then processes the response. This process repeats until it finds a satisfactory solution or reaches the end of its thinking time.
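Conceptually, this query–response loop can be sketched as follows. This is a deliberate simplification written for illustration; the function names, stop condition, and prompts are invented, and the real Auto-GPT code is considerably more involved:

```python
import openai

def think(messages, model="gpt-3.5-turbo"):
    """One 'thinking' step: a single chat completion request to the LLM."""
    response = openai.ChatCompletion.create(model=model, messages=messages)
    return response["choices"][0]["message"]["content"]

# Hypothetical, simplified loop - not Auto-GPT's actual implementation
messages = [{"role": "system", "content": "You are AuthorGPT, an AI author..."}]
for step in range(5):  # a stand-in for "thinking time"
    reply = think(messages + [{"role": "user", "content": "Determine the next command to use."}])
    messages.append({"role": "assistant", "content": reply})
    if "task_complete" in reply:  # a stand-in stop condition
        break
```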
This entire process produces thoughts. These fall into the following categories:
- Reasoning
- Planning
- Criticism
- Speak
- Command
These individual thoughts are then displayed in the terminal and the user is asked whether they want to approve the command or not – it’s that simple.
Of course, a lot more goes on here, including a prompt being built to create that response.
Simply put, Auto-GPT passes the name, role, goals, and some background information. You can see an example here: https://github.com/PacktPublishing/Unlocking-the-Power-of-Auto-GPT-and-Its-Plugins/blob/main/Auto-GPT_thoughts_example.md.
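To make that concrete, a system prompt assembled from the ai_settings.yaml values shown earlier might be built roughly like this. The exact wording Auto-GPT uses differs; this is only a hedged illustration of the idea:

```python
def build_system_prompt(ai_name, ai_role, ai_goals):
    """Assemble a system prompt from the name, role, and goals (illustrative only)."""
    goals_text = "\n".join(f"{i + 1}. {goal}" for i, goal in enumerate(ai_goals))
    return (
        f"You are {ai_name}, {ai_role}\n\n"
        "GOALS:\n"
        f"{goals_text}\n\n"
        "Constraints: respond only in the JSON format described below."
    )
```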
Auto-GPT’s thought process – understanding the one-shot action
Let’s understand the thought process behind this one-shot action:
- Overview of the thought process: Auto-GPT operates on a one-shot action basis. Each block of data sent to OpenAI is processed as a single chat completion action, and the outcome is a response text from GPT that is crafted according to a specified structure.
- Structure and task definition for GPT: The structure that’s provided to GPT encompasses both the task at hand and the format for the response. This dual-component structure ensures that GPT’s responses are not only relevant but also adhere to the expected conversational format.
- Role assignment in Auto-GPT: There are two role assignments here:
- System role: The “system” role is crucial in providing context. It functions as a vessel for information delivery and maintains the historical thread of the conversation with the LLM.
- User role: Toward the end of the process, a “user” role is assigned. This role is pivotal in guiding GPT to determine the subsequent command to execute. It adheres to a predefined format, ensuring consistency in interactions.
- Command options and decision-making: GPT is equipped with various command options, including the following:
  - Ask the user (ask_user)
  - Sending messages (send_message)
  - Browsing (browse)
  - Executing code (execute_code)
In some instances, Auto-GPT may opt not to select any command. This typically occurs in situations of confusion, such as when the provided task is unclear or when Auto-GPT completes a task and requires user feedback for further action.
Either way, each response is a single piece of autocompleted text, meaning the LLM responds only once per request.
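Putting the system and user roles described above together, the single chat completion request behind one “thought” might look roughly like the following sketch. The message texts are abbreviated placeholders, not the literal prompts Auto-GPT sends:

```python
import openai

messages = [
    # "system" messages carry context: the identity prompt and the conversation history
    {"role": "system", "content": "You are AuthorGPT, ... GOALS: ..."},
    {"role": "system", "content": "This reminds you of these events from your past: ..."},
    # a single "user" message at the end asks for the next command in a fixed format
    {"role": "user", "content": "Determine exactly one command to use, and respond using the JSON schema specified previously."},
]

response = openai.ChatCompletion.create(model="gpt-4", messages=messages)
# The reply is one block of text that must parse into the thoughts/command JSON structure
print(response["choices"][0]["message"]["content"])
```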
In the following example, I have the planner plugin activated; more on plugins later:
{ "thoughts": { "text": "I need to start the planning cycle to create a plan for the book.", "reasoning": "Starting the planning cycle will help me outline the steps needed to achieve my goals.", "plan": "- run_planning_cycle - research Auto-GPT and its plugins - collaborate with user - create book structure - write content - refine content based on feedback", "criticism": "I should have started the planning cycle earlier to ensure a smooth start.", "speak": "I'm going to start the planning cycle to create a plan for the book." }, "command": { "name": "run_planning_cycle", "args": {} } }
Each thought property is then displayed to the user and the “speak” output is read aloud if text-to-speech is enabled:
"I am going to start the planning cycle to create a plan for the book. I want to run planning cycle."
The user can now respond in one of the following ways:
- y: To accept the execution.
- n: To decline the execution and close Auto-GPT.
- s: To let Auto-GPT re-evaluate its decisions.
- y -N: To tell Auto-GPT to keep going for the given number of steps (for example, enter y -5 to allow it to run on its own for 5 steps). Here, N is always a number.
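A hypothetical sketch of how such console input could be interpreted is shown below; this is not Auto-GPT’s actual input handling, just an illustration of the options listed above:

```python
def parse_authorisation(user_input: str) -> tuple[str, int]:
    """Interpret the console reply; returns an action and a step count (illustrative only)."""
    text = user_input.strip().lower()
    if text == "y":
        return "execute", 1          # run this single command
    if text == "n":
        return "exit", 0             # decline and close Auto-GPT
    if text == "s":
        return "re-evaluate", 0      # let Auto-GPT rethink its decision
    if text.startswith("y -"):
        return "execute", int(text.split("-", 1)[1])  # e.g. "y -5" runs 5 steps
    raise ValueError(f"Unrecognized input: {user_input!r}")
```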
If the user confirms, the command is executed and the result of that command is added as system content:
# Check if there is a result from the command; append it to the message
# history
if result is not None:
    self.history.add("system", result, "action_result")
At this point, you’re probably wondering what history is in this context and why it’s accessed through self.
Auto-GPT uses agents, and each agent instance has its own history that acts as short-term memory. It contains the context of the previous messages and results.
The history is trimmed down on every run cycle of the agent to make sure it doesn’t reach its token limit.
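A hedged sketch of what such trimming might look like is shown below. Auto-GPT’s real implementation counts tokens with the tiktoken library and is more nuanced, so treat the function name and the token budget here as assumptions for illustration:

```python
import tiktoken

def trim_history(messages, model="gpt-3.5-turbo", max_tokens=3000):
    """Drop the oldest messages until the history fits the token budget (illustrative only)."""
    encoding = tiktoken.encoding_for_model(model)

    def count(msgs):
        return sum(len(encoding.encode(m["content"])) for m in msgs)

    trimmed = list(messages)
    # Keep the first (system) message and discard the oldest entries after it
    while len(trimmed) > 1 and count(trimmed) > max_tokens:
        trimmed.pop(1)
    return trimmed
```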
So, why not directly ask the LLM for a solution? There are several reasons for this:
- While LLMs are incredibly sophisticated, they cannot solve complex, multi-step problems in a single query. Instead, they need to be asked a series of interconnected questions that guide them toward a final solution. This is where Auto-GPT shines – it can strategically ask these questions and digest the responses.
- LLMs can’t maintain their context. They don’t remember previous queries or answers, which means they cannot build on past knowledge to answer future questions. Auto-GPT compensates for this by maintaining a history of the conversation, allowing it to understand the context of previous queries and responses and use that information to craft new queries.
- While LLMs are powerful tools for generating human-like text, they cannot take initiative. They respond to prompts but don’t actively seek out new tasks or knowledge. Auto-GPT, on the other hand, is designed to be more proactive. It not only responds to the tasks that have been assigned to it but also proactively explores diverse ways to accomplish those tasks, making it a true autonomous agent.
Before we delve deeper into how Auto-GPT utilizes LLMs, it’s important to understand a key component of how these models process information: tokens.