A very brief history of NLP
If you research the history of NLP, you will not find one conclusive answer as to its origins. As I was planning the outline for this chapter, I realized that I knew quite a bit about the uses and implementation of NLP but that I had a blind spot regarding its history and origins. I knew that it was tied to computational linguistics, but I did not know the history of that field, either. The earliest conceptualization of Machine Translation (MT) supposedly took place in the seventeenth century; however, I am deeply skeptical that this was the origin of the idea of MT or NLP, as I suspect people have been puzzling over the relationships between words and characters for as long as language has existed. That seems unavoidable to me, as people thousands of years ago were not simpletons. They were every bit as clever and inquisitive as we are, if not more. So, let me share some interesting information I have dug up on the origins of NLP. Please understand that this is not the complete history; an entire book could be written about the origins and history of NLP. To keep things brief, I will just list some of the highlights that I found. If you want to know more, this is a rich topic for research.
One thing that puzzles me is that I rarely see cryptology (cryptography and cryptanalysis) mentioned as part of the origins of NLP or even MT. After all, cryptography is the act of translating a message into gibberish, and cryptanalysis is the act of reversing that secret gibberish back into a useful message. So, to me, any automation, even hundreds or thousands of years ago, that could assist in carrying out cryptography or cryptanalysis should be part of the conversation. It might not be MT in the same way that modern translation is, but it is a form of translation, nonetheless. So, I would suggest that MT goes back at least to the Caesar cipher invented by Julius Caesar, and probably much earlier than that. The Caesar cipher encoded a message by shifting each letter of the text by a certain number of positions. As an example, let’s take the sentence:
I really love NLP.
First, we should probably remove the spaces and convert everything to lowercase so that any eavesdropper can’t get hints from word boundaries or capitalization. The string is now as follows:
ireallylovenlp
If we do a shift-1, we shift each letter one position to the right in the alphabet, so we get:
jsfbmmzmpwfomq
The number that we shift by is arbitrary, and we could also shift in the reverse direction. Wooden sticks, such as the Spartan scytale, were used for converting text into code, so I would consider those translation tools as well.
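The shift described above is easy to sketch in code. The following is a minimal illustrative implementation, not a historical artifact; the function name caesar_shift is my own:

```python
def caesar_shift(text: str, shift: int = 1) -> str:
    """Shift each lowercase letter by `shift` positions, wrapping past 'z'.

    Non-lowercase characters are passed through unchanged. A negative
    `shift` reverses the direction, which also serves as decryption.
    """
    return "".join(
        chr((ord(c) - ord("a") + shift) % 26 + ord("a")) if c.islower() else c
        for c in text
    )

message = "I really love NLP."
# Strip spaces, punctuation, and casing, as discussed above.
prepared = "".join(c for c in message.lower() if c.isalpha())
print(prepared)                      # ireallylovenlp
print(caesar_shift(prepared, 1))     # jsfbmmzmpwfomq
print(caesar_shift("jsfbmmzmpwfomq", -1))  # ireallylovenlp
```

Decrypting is just shifting back by the same amount, which is exactly why the Caesar cipher is so weak: there are only 25 possible shifts to try.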
After the Caesar cipher, many, many other techniques were invented for encrypting human text, some of which were quite sophisticated. There is an outstanding book called The Code Book by Simon Singh that covers the several-thousand-year history of cryptology. With that said, let’s move on to what people typically think of with regard to NLP and MT.
In the seventeenth century, philosophers began to submit proposals for codes that could be used to relate words between languages. These proposals were all theoretical, and none of them led to an actual machine, but this is how ideas such as MT come about: someone first considers a future possibility, and implementation follows later. A few hundred years later, in the early 1900s, Ferdinand de Saussure, a Swiss linguistics professor, developed an approach for describing language as a system. His death in 1913 almost deprived the world of the concept of language as a science, but, realizing the importance of his ideas, two of his colleagues compiled his lectures into the Cours de linguistique générale, published in 1916. This book laid the foundation for the structuralist approach that started with linguistics and eventually expanded to other fields, including computing.
Finally, in the 1930s, the first patents for MT were filed.
Later, World War II began, and this is what caused me to consider the Caesar cipher and cryptology as early forms of MT. During World War II, Germany used the Enigma machine to encrypt its military messages. The sophistication of the technique made the codes nearly unbreakable, with devastating effects. In 1939, Alan Turing, working with other British cryptanalysts, designed the bombe, building on the Polish bomba that had been used to decrypt Enigma messages over the previous seven years. Eventually, the bombe was able to break the German codes, taking away the advantage of secrecy that German U-boats had been enjoying and saving many lives. This is a fascinating story in itself, and I encourage readers to learn more about the effort to decrypt the messages encrypted by the Enigma machines.
After the war, research into MT and NLP really took off. In 1950, Alan Turing published Computing Machinery and Intelligence, which proposed the Turing Test as a way of assessing intelligence. To this day, the Turing Test is frequently mentioned as a criterion of intelligence for Artificial Intelligence (AI) to be judged by.
In 1954, the Georgetown experiment fully automated the translation of more than sixty Russian sentences into English. In 1957, Noam Chomsky’s Syntactic Structures revolutionized linguistics with a rule-based theory of syntax, transformational grammar, which would later feed into his idea of a Universal Grammar (UG).
To evaluate the progress of MT and NLP research, the US National Research Council (NRC) created the Automatic Language Processing Advisory Committee (ALPAC) in 1964. Around the same time, at MIT, Joseph Weizenbaum created ELIZA, the world’s first chatbot. Based on reflection techniques and simple grammar rules, ELIZA could rephrase a user’s sentence and pose it back as a response, most famously while mimicking a Rogerian psychotherapist.
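ELIZA’s reflection trick can be sketched in a few lines of Python. To be clear, this is not Weizenbaum’s original implementation (which was written in MAD-SLIP); it is a minimal illustrative sketch with one hypothetical pattern rule of my own invention:

```python
import re

# Swap first- and second-person words so a statement can be
# reflected back at the user, ELIZA-style.
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your",
    "am": "are", "you": "I", "your": "my",
}

def reflect(fragment: str) -> str:
    """Replace each reflectable word; leave everything else alone."""
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(sentence: str) -> str:
    """Apply one hard-coded pattern rule, in the spirit of ELIZA's rule lists."""
    cleaned = sentence.lower().rstrip(".!?")
    match = re.match(r"i feel (.*)", cleaned)
    if match:
        return f"Why do you feel {reflect(match.group(1))}?"
    return f"You said: {reflect(cleaned)}."

print(respond("I feel sad about my code"))
# Why do you feel sad about your code?
```

The real ELIZA had many such pattern-and-template rules per script, plus ranked keywords and memory, but this captures the core idea: no understanding at all, just matching and reflection.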
Then winter struck. In 1966, following a critical report by ALPAC, funding for NLP and MT was discontinued. As a result, AI and NLP research came to be seen as a dead end by many people, but not all. This freeze lasted until the late 1980s, when a new revolution in NLP began, driven by a steady increase in computational power and a shift to Machine Learning (ML) algorithms rather than hard-coded rules.
In the 1990s, statistical models for NLP rose in popularity. In 1997, the Long Short-Term Memory (LSTM) architecture, a variant of the Recurrent Neural Network (RNN), was introduced, and by 2007 such models had found their niche in voice and text processing. In 2001, Yoshua Bengio and his team proposed the first feed-forward neural language model. In 2011, Apple’s Siri became known as one of the world’s first successful AI and NLP assistants to reach general consumers.
Since 2011, NLP research and development has exploded, so this is as far as I will go into history. I am positive that there are many gaps in the history of NLP and MT, so I encourage you to do your own research and really dig into the parts that fascinate you. I have spent much of my career working in cyber security, so I am fascinated by almost anything having to do with the history of cryptology, especially old techniques for cryptography.