GPT-4 feels like Einstein's theory of general relativity: just as physicists today are still extracting useful insights from that theory, programmers will take a long time to uncover all the new possibilities this model opens up.
Here is one example. Big problems can be solved by breaking them down into small tasks. Our jobs usually boil down to performing one problem-solving or productive activity after the next. So, what if you could discern clearly what those smaller tasks were and give them to your side-kick brain to figure out and run for you ‘auto-magically’?
Note that the above says, what if you could discern. Already we bump up against a lesson in using LLMs: why do you need to discern it? Can’t GPT-4 be given the job of discerning the tasks required? If so, could it then orchestrate other GPT-4 “agents” to break the task down further? This is exactly where we can make use of a tool like Auto-GPT.
Let’s look at an example I am developing right now: an Auto-GPT written in Rust.
Say you build websites for a living. Websites need a web server. Web servers take a request (usually through an API), run some code in line with that request, and send back a response. Let’s break the goal of building a web server down into the actors, or agents, an Auto-GPT might require:
You may have guessed that we will not be hiring 7 members of staff to build web servers. However, planning out this structure is necessary to help us organise a bot that can itself build, test, and deploy any backend server you ask it to.
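As a rough sketch of what this decomposition can look like in code, the roles could be modelled in Rust as nothing more than an enum plus a small struct. The role names below are illustrative examples rather than the exact set my project uses:

// Illustrative agent roles an Auto-GPT might coordinate when building a web server.
// These names are examples only, not a fixed or exhaustive list.
#[derive(Debug, Clone, Copy)]
enum AgentRole {
    ProjectManager,     // breaks the overall goal into tasks
    SolutionsArchitect, // decides the shape of the server and its endpoints
    BackendDeveloper,   // writes the actual server code
    QualityAgent,       // tests what the developer agent produced
}

// A basic agent: the role it plays plus the objective it has been handed.
struct Agent {
    role: AgentRole,
    objective: String,
}

fn main() {
    let architect = Agent {
        role: AgentRole::SolutionsArchitect,
        objective: "Design the endpoints for a to-do list web server".to_string(),
    };
    println!("{:?}: {}", architect.role, architect.objective);
}

The interesting work is not in this structure but in the instructions and tools each role is handed, which is what the rest of this article is about.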
Our agents will need tools to work with. They need to be given a set of instructions and functions that will get the subtask done without the usual response of “I am just an AI language model”. While developing such a tool, it became clear to me that a new paradigm has arrived for developers: “AI Functions”. What we are about to discuss exposes potential gaps in the guard-rails of GPT-4, which is unlikely to be news to the OpenAI team but may come as a surprise to you.
The fact that AI hallucinates would annoy most people, but not programmers, and not the folk who developed Marvin. Right now, when you ask ChatGPT for a response, it sends a whole bunch of jargon along with it.
What if I told you there was a hack: a way to get distilled, relevant information out of ChatGPT automatically through the OpenAI API, as a structured response that can drive actions in your code? Here is an example. Suppose you give an AI the following “function”, which is just text disguised as a programming function:
struct Sentiment {
    happy_rating: f32,
    sad_rating: f32,
}

fn convert_comments_into_sentiment(comments: String) -> Sentiment {
    // Function takes in comments from a YouTube video
    // Considers all aspects of what is said in the comments
    // Returns a Sentiment struct with a happy rating and a sad rating
    // Does not return anything else. No commentary.
    return sentiment;
}
You then ask the AI to print only what the function would return and pass it the YouTube comments as an input. The AI (GPT-4) would then hallucinate a response, even without there being any actual code within the function. None. There is no code to perform this task.
No jargon like “I am an AI, I cannot blah blah blah” is returned. Nothing like that; you extract the AI’s pure sentiment analysis in a format you can then build into your application.
The hack here is that we have built a pretend function, without any code in it. We described exactly what this function (written in Rust syntax in this example) does and the structure of its output using comments and a struct.
GPT-4 then interprets what it thinks the function does and assesses it against the input provided (YouTube comments in this example), structuring its output accordingly in the response. Therefore, we can have LLMs supply responses structured in such a way that they can be used by other agents (themselves also backed by the GPT-4 LLM).
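To make that concrete, here is a minimal sketch of how the trick can be wired up in Rust, under a couple of assumptions: the pretend function above is stored as plain text, and send_to_llm is a placeholder name for whatever OpenAI API client call you use, not a real library function. The prompt embeds the pretend function, instructs the model to print only what it would return, and the reply is deserialised straight into a typed Sentiment value with serde:

use serde::Deserialize;

// Mirrors the struct described to the model, so its reply can be parsed
// directly into typed data (requires the serde and serde_json crates).
#[derive(Debug, Deserialize)]
struct Sentiment {
    happy_rating: f32,
    sad_rating: f32,
}

// Wrap the pretend function and the input into a single instruction.
fn build_ai_function_prompt(function_text: &str, comments: &str) -> String {
    format!(
        "{function_text}\n\nPrint ONLY what this function would return, \
         as JSON matching the Sentiment struct. No commentary.\n\n\
         Input comments:\n{comments}"
    )
}

// Turn the model's raw reply back into a Rust value.
fn parse_sentiment(llm_reply: &str) -> Result<Sentiment, serde_json::Error> {
    serde_json::from_str(llm_reply)
}

// Usage, with send_to_llm standing in for your OpenAI API call:
//
//   let prompt = build_ai_function_prompt(AI_FUNCTION_TEXT, &youtube_comments);
//   let reply = send_to_llm(&prompt)?;
//   let sentiment: Sentiment = parse_sentiment(&reply)?;

fn main() {
    // Example of parsing a reply the model might produce.
    let reply = r#"{ "happy_rating": 0.8, "sad_rating": 0.2 }"#;
    println!("{:?}", parse_sentiment(reply).unwrap());
}

The point is that the shape of the output is pinned down by the struct, so whatever answer GPT-4 hallucinates arrives in a form the rest of the program, or another agent, can consume.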
The entire process would look something like this:
This is an oversimplified view of what is happening. Note that agents are not always needed. Sometimes the function can just be called as regular code, no LLM needed.
Why stop there? Auto-GPTs will likely then become like plugins. With Auto-GPTs popping up everywhere, most software-related tasks can be taken care of. Rather than calling an API, your Auto-GPT could call another Auto-GPT. Rather than writing AI Functions, a primary agent would know which Auto-GPTs should be assigned to which tasks. It could then write its own program that connects to them.
This sort of recursion, or self-improvement, is where growth in AI is predicted to take off, whether exponentially or in sigmoid-like jumps. My own version of this self-programming task is already done. It was surprisingly trivial to do with GPT-4’s help. It works well, but it doesn’t replace human developers yet.
GPT-4 is not without some annoying limitations; these include:
These limitations could technically be worked around by breaking down tasks into even smaller chunks, but this is where I draw a line. What is the point of trying to automate anything if you end up having to code everything? However, it has become clear through this exercise that just a few improvements on the above limitations will lead to a significant amount of task automation with Auto-GPTs.
One algorithm which has worked very well with the current technology is having GPT-4 write code, then having our program execute and test that code and feed back to GPT-4 what is not working. An AI Function then prompts GPT-4, acting as a “Senior Developer Agent”, to rewrite the code and return it to the “Quality Agent” for more unit testing.
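In outline, that loop is little more than the following sketch. The three helper functions here are placeholders rather than a real API: in practice each one wraps a prompt to GPT-4 or a local compile-and-test run.

// Sketch of the write -> test -> feedback loop. Helper bodies are omitted;
// in the real tool they wrap GPT-4 prompts and a local test run.
fn write_code(task: &str) -> String {
    unimplemented!("prompt GPT-4 for a first draft that handles: {task}")
}

fn run_unit_tests(_code: &str) -> Result<(), String> {
    unimplemented!("compile the generated code and run its unit tests")
}

fn rewrite_code(_code: &str, _failure: &str) -> String {
    unimplemented!("prompt the Senior Developer Agent with the failing test output")
}

fn build_until_green(task: &str, max_attempts: u32) -> Option<String> {
    let mut code = write_code(task);
    for _ in 0..max_attempts {
        match run_unit_tests(&code) {
            // All tests pass: accept the generated code.
            Ok(()) => return Some(code),
            // Tests fail: hand the failure output back and ask for a rewrite.
            Err(failure) => code = rewrite_code(&code, &failure),
        }
    }
    None // give up after max_attempts to keep OpenAI API costs bounded
}

Capping the number of attempts matters mostly for cost: every pass around the loop is another round of OpenAI API calls.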
What you end up with is a program that can write code, test it, and confirm all is working. This works now and is useful. The downsides are still the cost of making too many calls to the OpenAI API and some quality limitations with GPT-4. Therefore, until GPT-5, I keep working directly in my development environment, or in the ChatGPT interface, extracting quick code snippets.
It sounds clichéd, but I’m learning something new every day working with LLMs. A recurring realisation is that we should not only be trying to solve new problems in less time, but also learning what the right questions to ask our current LLMs even are. They are capable of more than I believe we realise. When open-source LLMs reach a GPT-5-like level, I suspect Auto-GPTs will be everywhere.
Shaun McDonogh is the lead developer and researcher at Crypto Wizards. He has 6 years’ experience in developing high performance trading software solutions with ML and 15 years’ experience working with frontend and backend web technologies.
In his prior career, Shaun worked as a Finance Director, Data Scientist and Analytics Lead in various multi-million-dollar entities before making the transition to developing pioneering solutions for developers in emerging tech industries. His favourite programming languages to teach include Rust, TypeScript and Python.
Shaun’s mission is to support developers in navigating the ever-changing landscape of emerging technologies in computer software engineering through a new coding arm, Code Raiders.