To tell the story of how we arrived at AI tools like ChatGPT, which are powered by large language models (LLMs), let’s first cover natural language processing (NLP).
NLP is a field at the intersection of computer science, artificial intelligence, and computational linguistics. It’s concerned with the interactions between computers and human language, and with how to program computers to process and analyze large amounts of natural language data. NLP is a hugely interesting area with a range of useful real-world applications. Here are some examples:
- Speech recognition: If you have a modern smartphone, you’ve likely interacted with voice assistants such as Siri or Alexa.
- Machine translation: The ability to automatically translate text from one language to another. Google Translate is perhaps the first example that comes to mind.
- Sentiment analysis: Understanding the sentiment expressed in text, such as social media posts, is very useful. Companies want to know how their brands are perceived, and e-commerce businesses want to quickly gauge product reviews to boost their business.
- Chatbots and virtual assistants: You’ve likely seen chatbots integrated into web pages even before the advent of ChatGPT. Companies deploy them so you can quickly get answers to simpler questions, providing a more natural experience than an FAQ page, among other uses.
- Text summarization: Search engines come to mind again here. When you use a search engine like Bing or Google, it can summarize a page and show that summary alongside the link on the results page, giving you a better sense of which link to click.
- Content recommendation: This is another important area used across a variety of domains. E-commerce sites use it to present products you’re likely to be interested in, Xbox uses it to recommend games to play and buy, and video streaming services surface content you might want to watch next.
As you can see, both companies and end users benefit greatly from adopting NLP.
The rise of LLMs
How did we evolve from NLP to LLMs, then? Initially, NLP relied on rule-based systems and statistical methods. This approach, although it worked well for some tasks, struggled with the ambiguity and nuance of human language.
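To make this concrete, here’s a minimal sketch of what that earlier era looked like: VADER, a lexicon-based sentiment analyzer that ships with the NLTK library. It assumes the nltk package is installed; the sample review text is made up for illustration.

```python
# A lexicon-based sentiment analyzer from the rule-based/statistical era of
# NLP. Assumes the nltk package is installed; the VADER lexicon is
# downloaded on first run.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

sia = SentimentIntensityAnalyzer()
scores = sia.polarity_scores("The battery life is great, but the screen is dim.")
print(scores)  # a dict of neg/neu/pos/compound scores

# VADER looks words up in a hand-curated sentiment lexicon and applies
# heuristic rules. That handles simple cases well, but it stumbles on
# sarcasm, negation across clauses, and context: exactly the kind of
# nuance that motivated the move to learned models.
```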
Things changed for the better when deep learning, a subset of machine learning, was introduced to NLP, giving us models like recurrent neural networks (RNNs) and, later, transformer-based models, which are capable of learning patterns directly from data. The result was a considerable improvement in performance. With transformer-based models, the foundations of large language models were laid.
LLMs are a type of transformer model. They can generate human-like text and, unlike earlier NLP models, they perform well on a variety of tasks without needing task-specific training data. How is this possible, you ask? The answer is a combination of improved architecture, a vast increase in computational power, and gigantic datasets.
LLMs rest on the idea that a large enough neural network, given enough data and compute, can learn to do almost anything. This is a paradigm shift in how we program computers: instead of writing code, we write prompts and let the model do the rest.
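Here’s a minimal sketch of this prompt-driven style, using OpenAI’s Python library. The model name is an assumption (any chat-capable model would do), and it presumes the openai package is installed with an API key set in the OPENAI_API_KEY environment variable.

```python
# A minimal sketch of the prompt-as-program paradigm, using OpenAI's Python
# library. Assumes the openai package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # picks up the API key from the environment

# Instead of writing a summarization algorithm, we describe the task.
response = client.chat.completions.create(
    model="gpt-4",  # assumed model name; any chat-capable model works
    messages=[
        {"role": "system", "content": "You summarize text in one sentence."},
        {"role": "user", "content": "The meeting covered Q3 results, a new "
                                    "hiring plan, and the upcoming office move."},
    ],
)
print(response.choices[0].message.content)
```

Notice that the “program” here is the prompt itself; changing the system message changes the behavior without touching any code.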
GPT models
There are many different types of LLMs out there, but let’s focus on GPT for a moment, the type of LLM on which this book’s chosen tools are based (even if GitHub Copilot uses a code-focused variant known as Codex).
Several versions have been developed over the last few years. Here are some models developed by the company OpenAI:
- GPT-1: The first version, with 117 million parameters, built on the transformer architecture.
- GPT-2: This model has 1.5 billion parameters and is able to generate coherent and relevant text.
- GPT-3: This model has 175 billion parameters and is considerably more capable than its predecessor, able to answer questions, generate fiction, and even write code.
- GPT-4: This model has been reported to have 1.76 trillion parameters, although OpenAI hasn’t officially disclosed its size.
The number of parameters is what allows a model to produce more nuanced and coherent text. It should also be said that the larger the model, the more computational resources are needed to train it.
ChatGPT recently switched to GPT-4, and the difference compared to GPT-3 is significant.
How LLMs are better
Now that we have a better understanding of where LLMs came from and how they came to be, what makes them great? Why should we adopt AI assistants based on LLMs?
Because LLMs are bigger and more advanced, there are some areas in which they clearly outperform traditional NLP models:
- Context: LLMs understand not just the most recent input; they can produce responses grounded in a longer conversation.
- Few-shot learning: To perform a task, an LLM usually needs only a few examples to produce a correct response (see the sketch after this list). Contrast this with traditional NLP models, which typically require a large amount of task-specific training data to perform well.
- Performance: LLMs outperform traditional NLP models in areas like translation, question answering, and summarization.
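To illustrate few-shot learning, here’s a hedged sketch of a few-shot prompt: two labeled examples are enough to teach the task inline, with no task-specific training data. It reuses the same assumed OpenAI setup and model name as the earlier sketch.

```python
# Few-shot prompting: the task is taught with two inline examples rather
# than a fine-tuned, task-specific model. Assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

few_shot_prompt = """Classify each review as positive or negative.

Review: Arrived quickly and works perfectly.
Sentiment: positive

Review: Stopped working after a week.
Sentiment: negative

Review: The sound quality exceeded my expectations.
Sentiment:"""

response = client.chat.completions.create(
    model="gpt-4",  # assumed model name
    messages=[{"role": "user", "content": few_shot_prompt}],
)
print(response.choices[0].message.content)  # expected: positive
```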
It’s worth mentioning that LLMs aren’t perfect; they do generate incorrect responses and can sometimes make things up, a phenomenon known as hallucination. It’s our hope, though, that by reading this book, you will see the advantages of LLM-based AI assistants and feel that the pros clearly outweigh the cons.