Introducing Davinci, Babbage, Curie, and Ada
The massive dataset used to train GPT-3 is the primary reason it's so powerful. However, bigger is only better when it's necessary, and more power comes at a cost. For those reasons, OpenAI provides multiple models to choose from. Today there are four primary models available, along with a content filtering model and a set of instruct models.
The available models, or engines as they're also called, are named Davinci, Babbage, Curie, and Ada. Of the four, Davinci is the largest and most capable, and it can perform any task that any other engine can perform. Curie is the next most capable engine and can do anything that Babbage or Ada can do. Ada is the least capable engine, but it is also the fastest and lowest-cost engine.
When you're getting started, and when you're first testing new prompts, you'll usually want to begin with Davinci and then try Ada, Babbage, or Curie to see whether one of them can complete the task faster or more cost-effectively. The following is an overview of each engine and the types of tasks that might best suit each. Keep in mind, however, that you'll want to test your own prompts against each engine: even though the smaller engines might not be trained with as much data, they are all still general-purpose models.
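The workflow described above (validate a prompt on Davinci, then check whether a smaller engine is good enough) can be sketched as a small helper. This is a minimal sketch under stated assumptions, not OpenAI's API: `complete` is a hypothetical callable standing in for whatever function sends a prompt to a given engine, `is_acceptable` is a hypothetical quality check you supply yourself, and the lowercase engine strings are used only for illustration.

```python
from typing import Callable, Optional

# Engines ordered from smallest to largest, per the overview above.
ENGINES_SMALLEST_FIRST = ["ada", "babbage", "curie", "davinci"]


def smallest_acceptable_engine(
    prompt: str,
    complete: Callable[[str, str], str],
    is_acceptable: Callable[[str], bool],
) -> Optional[str]:
    """Return the smallest engine whose completion passes the check.

    `complete(engine, prompt)` is a hypothetical stand-in for a real API call;
    `is_acceptable(text)` is a caller-supplied quality check.
    """
    for engine in ENGINES_SMALLEST_FIRST:
        if is_acceptable(complete(engine, prompt)):
            return engine
    return None  # no engine produced an acceptable completion


if __name__ == "__main__":
    # Fake completion function so the sketch runs without network access:
    # pretend only Curie and Davinci handle this prompt well.
    def fake_complete(engine: str, prompt: str) -> str:
        return "positive" if engine in ("curie", "davinci") else "???"

    choice = smallest_acceptable_engine(
        "Classify the sentiment: I loved it.",
        fake_complete,
        lambda text: text == "positive",
    )
    print(choice)
```

In practice the acceptance check is the hard part; a single prompt rarely settles the question, so it is worth running a handful of representative prompts before committing to a smaller engine.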
Davinci
Davinci is the most capable model and can do anything that any other model can do, and much more, often with fewer instructions. Davinci is able to solve logic problems, determine cause and effect, understand the intent of text, produce creative content, explain character motives, and handle complex summarization tasks.
Curie
Curie tries to balance power and speed. It can do anything that Ada or Babbage can do, but it's also capable of handling more complex classification tasks and more nuanced tasks such as summarization, sentiment analysis, chatbot applications, and question answering.
Babbage
Babbage is a bit more capable than Ada but not quite as fast. It can perform all the same tasks as Ada, but it can also handle somewhat more involved classification tasks, and it's well suited for semantic search tasks that rank how well documents match a search query.
Ada
Ada is usually the fastest and least costly model. It's best for less nuanced tasks such as parsing text, reformatting text, and simpler classification tasks. The more context you provide Ada, the better it will likely perform.
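One way to keep these descriptions at hand is a small lookup table that maps a task type to a suggested starting engine. The task labels and the `suggest_engine` helper below are hypothetical names chosen for illustration; the mapping only restates the guidance in this section and should be treated as a starting point for testing, not a rule.

```python
# Suggested starting engine per task type, restating the descriptions above.
# The task labels are illustrative, not an official taxonomy.
SUGGESTED_ENGINE = {
    "parsing text": "ada",
    "reformatting text": "ada",
    "simple classification": "ada",
    "moderate classification": "babbage",
    "semantic search": "babbage",
    "complex classification": "curie",
    "summarization": "curie",
    "sentiment analysis": "curie",
    "chatbot": "curie",
    "question answering": "curie",
    "logic problems": "davinci",
    "cause and effect": "davinci",
    "creative content": "davinci",
    "complex summarization": "davinci",
}


def suggest_engine(task: str) -> str:
    """Return a suggested engine for a task label.

    Unknown tasks fall back to Davinci, matching the advice to begin
    with the most capable engine and step down from there.
    """
    return SUGGESTED_ENGINE.get(task.strip().lower(), "davinci")
```

Falling back to Davinci for unlisted tasks is a deliberate choice here: it is the engine least likely to fail a new prompt, which makes it the safer default while you are still testing.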
Content filtering model
To help prevent inappropriate completions, OpenAI provides a content filtering model that is fine-tuned to recognize potentially offensive or hurtful language.
Instruct models
These are models that are built on top of the Davinci and Curie models. Instruct models are tuned to make it easier to tell the API what you want it to do. Given clear instructions, they can often produce better results than the associated core model.
A snapshot in time
A final note to keep in mind about all of the engines is that each one is a snapshot in time: the data used to train a model cuts off on the date the model was built. So, GPT-3 is not working with up-to-the-minute or even up-to-the-day data; its data is likely weeks or months old. OpenAI is planning to add more continuous training in the future, but today this is a consideration to keep in mind.
All of the GPT-3 models are extremely powerful and capable of generating text that is indistinguishable from human-written text. This holds tremendous potential for all kinds of applications. In most cases, that's a good thing. However, not all potential use cases are good.