The Basics of Natural Language Processing
To understand what natural language processing is, let's break the term into two:
- Natural language is a form of written and spoken communication that has developed organically and naturally.
- Processing means analyzing and making sense of input data with computers.
Figure 1.1: Natural language processing
Therefore, natural language processing is the machine-based processing of human communication. It aims to teach machines how to process and understand human language, thereby opening an easy channel of communication between humans and machines.
For example, the personal voice assistants found in our phones and smart speakers, such as Alexa and Siri, are products of natural language processing. They are built not only to understand what we say to them but also to act on it and respond with feedback. Natural language processing algorithms are what allow these technologies to communicate with humans.
The key point in this definition of natural language processing is that the communication must occur in the natural language of humans. We've been communicating with machines for decades by writing programs to perform certain tasks and executing them. However, these programs are written in languages that are not natural languages: they are not forms of spoken communication, and they did not develop organically. Languages such as Java, Python, C, and C++ were created with machines in mind, and the guiding question was always, "What will the machine be able to understand and process easily?"
Python is more user-friendly than most of these languages, so it is easier for humans to learn and write code in, but the basic point remains the same: to communicate with a machine, humans must learn a language that the machine is able to understand.
Figure 1.2: Venn diagram for natural language processing
The purpose of natural language processing is the opposite of this. Rather than having humans conform to the ways of a machine and learn how to communicate with it effectively, natural language processing enables machines to conform to humans and learn their way of communicating. This makes more sense, since the aim of technology is to make our lives easier.
To clarify this with an example, your first ever program was probably a piece of code that asked the machine to print 'hello world'. This was you conforming to the machine and asking it to execute a task in a language that it understood. Asking your voice assistant to say 'hello world' by voicing this command to it, and having it say 'hello world' back to you, is an example of the application of natural language processing, because you are communicating with a machine in your natural language (in this case, English). The machine is conforming to your form of communication, understanding what you're saying, processing what you're asking it to do, and then executing the task.
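As a minimal illustration of the first half of this example, the "hello world" program might look like the following in Python. The variable name `message` is chosen here purely for illustration; the point is that the request has to be phrased in the machine's syntax rather than in everyday English.

```python
# Conforming to the machine: the request is written in Python syntax,
# not in the natural language we would use with another person.
message = "hello world"
print(message)
```

With a voice assistant, the same request would simply be spoken aloud, and it would be the machine's job to work out what was meant.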
Importance of natural language processing
The following figure illustrates the various sections of the field of artificial intelligence:
Figure 1.3: Artificial intelligence and some of its subfields
Along with machine learning and deep learning, natural language processing is a subfield of artificial intelligence, and because it deals with natural language, it's actually at the intersection of artificial intelligence and linguistics.
As mentioned, natural language processing is what enables machines to understand the language of humans, thus allowing an efficient channel of communication between the two. However, there is another reason natural language processing is necessary: machine learning and deep learning models, like the machines they run on, work best with numerical data. Numerical data is hard for humans to produce naturally; imagine us talking in numbers rather than words. So, natural language processing takes textual data and converts it into numerical data, so that machine learning and deep learning models can be fitted to it. In this way, it bridges the communication gap between humans and machines by taking the spoken and written forms of human language and converting them into data that machines can work with. Thanks to natural language processing, a machine is able to make sense of a natural language, answer questions based on it, solve problems using it, and communicate in it, among other things.
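To make the idea of converting text into numbers more concrete, here is a minimal sketch of one simple approach, counting how often each known word appears in a sentence. This technique (often called a bag-of-words representation) is brought in here only as an illustration and is not necessarily the method used later in the text. The function names `build_vocabulary` and `to_count_vector` and the sample sentences are hypothetical.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Map every distinct lowercase word to a column index (alphabetical order)."""
    words = sorted({word for s in sentences for word in s.lower().split()})
    return {word: index for index, word in enumerate(words)}


def to_count_vector(sentence, vocabulary):
    """Return one number per vocabulary word: how often it occurs in the sentence."""
    counts = Counter(sentence.lower().split())
    # Dictionaries preserve insertion order, so columns follow the vocabulary order.
    return [counts.get(word, 0) for word in vocabulary]


# Hypothetical sample text, split on whitespace only (no punctuation handling).
sentences = ["the cat sat", "the dog sat down"]
vocabulary = build_vocabulary(sentences)
vectors = [to_count_vector(s, vocabulary) for s in sentences]

print(list(vocabulary))  # ['cat', 'dog', 'down', 'sat', 'the']
print(vectors)           # [[1, 0, 0, 1, 1], [0, 1, 1, 1, 1]]
```

Each sentence becomes a fixed-length list of numbers, which is the kind of input a machine learning model can be fitted to. Real systems typically use more careful tokenization and richer representations, but the underlying step of turning words into numbers is the same.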