Road to ChatGPT: the math of the model behind it
Since its founding in 2015, OpenAI has invested in the research and development of a class of models called Generative Pre-trained Transformers (GPTs), which have captured widespread attention as the engine behind ChatGPT.
GPT models are built on the transformer architecture, introduced in the 2017 paper Attention Is All You Need by researchers at Google.
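Since this section is about the math behind the model, it is worth stating that paper's core operation up front: scaled dot-product attention, reproduced here as given in Attention Is All You Need, which replaces recurrence with direct pairwise comparisons between sequence positions:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
$$

where $Q$, $K$, and $V$ are the query, key, and value matrices and $d_k$ is the dimensionality of the keys.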
The transformer architecture was introduced to overcome the limitations of traditional Recurrent Neural Networks (RNNs). RNNs were first proposed in the 1980s by researchers at the Los Alamos National Laboratory, but they did not gain much attention until the 1990s. The original idea behind RNNs was to process sequential or time-series data while carrying information forward across time steps.
Indeed, up to that point, the classic Artificial Neural Network (ANN) architecture was the feedforward ANN, where the output of each hidden layer is simply the input to the next one, with no memory of earlier inputs.
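To make the contrast between the two architectures concrete, here is a minimal NumPy sketch (the dimensions, weights, and function names are illustrative assumptions, not from the text): a plain feedforward layer computes its output from the current input alone, while a vanilla Elman-style RNN cell also folds in the previous hidden state, so information persists across time steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative values, not from the text)
input_dim, hidden_dim = 4, 8

# Weights for a feedforward layer and a vanilla (Elman-style) RNN cell
W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden (RNN only)
b = np.zeros(hidden_dim)

def feedforward_step(x):
    # A feedforward layer sees only the current input: no memory of the past
    return np.tanh(W_x @ x + b)

def rnn_step(x, h_prev):
    # An RNN cell mixes the current input with the previous hidden state,
    # so information is carried across time steps
    return np.tanh(W_x @ x + W_h @ h_prev + b)

# Process a short sequence with both
sequence = rng.normal(size=(5, input_dim))
h = np.zeros(hidden_dim)
for x_t in sequence:
    ff_out = feedforward_step(x_t)   # depends on x_t alone
    h = rnn_step(x_t, h)             # depends on the whole history so far
```

Note that the only structural difference is the extra W_h @ h_prev term: that single recurrent connection is what gives RNNs memory of earlier inputs, and it is also what forces them to process a sequence one step at a time, the limitation the transformer was designed to remove.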