Origins of LLMs
Earlier neural networks that could read a sentence and predict the next word processed text one word at a time; these were called recurrent neural networks (RNNs). RNNs attempted to mimic the sequential way humans process words and sentences, but their very limited memory capacity made it hard for them to handle long-term dependencies between words and sentences.
In 1925, the groundwork was laid by Wilhelm Lenz and Ernst Ising with their non-learning Ising model, considered an early RNN architecture [Brush, Gemini].
In 1972, Shun’ichi Amari made this architecture adaptive, paving the way for learning RNNs. This work was later popularized by John Hopfield in 1982 [Amari, Gemini].
Because of this limited memory, there has been a fair amount of research into ways of stretching it so the network can take in more text and therefore more context. One important refinement of the RNN is the LSTM, or long short-term memory network, which is based on gated memory cells that decide what to remember and what to forget over long sequences.
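As a rough illustration of the one-word-at-a-time processing described above, here is a minimal sketch of next-word prediction with an LSTM cell in PyTorch. The vocabulary size, dimensions, and token ids are arbitrary placeholders for this example, not values from any particular model.

```python
# Minimal sketch: an LSTM reads a sequence one token at a time and
# predicts the next token. All sizes and ids below are placeholders.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 64, 128

embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTMCell(embed_dim, hidden_dim)      # processes one token per step
to_logits = nn.Linear(hidden_dim, vocab_size)  # scores for the next token

tokens = torch.tensor([4, 17, 250, 3])         # toy input sequence of token ids
h = torch.zeros(1, hidden_dim)                 # hidden state: the network's "memory"
c = torch.zeros(1, hidden_dim)                 # cell state, managed by the LSTM gates

for t in tokens:                               # read the sequence one word at a time
    x = embedding(t).unsqueeze(0)
    h, c = lstm(x, (h, c))                     # gates decide what to keep or forget

next_token_logits = to_logits(h)               # prediction for the word that follows
print(next_token_logits.shape)                 # torch.Size([1, 1000])
```

The key limitation the text describes is visible here: everything the model knows about earlier words must be squeezed into the fixed-size hidden and cell states, which is why long-range context is hard for RNNs to retain.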