Why use NLP?
NLP is used in a wide variety of disciplines to solve many different types of problems. Text analysis is performed on text that ranges from a few words of user input for an Internet query to multiple documents that need to be summarized. The amount and availability of unstructured data has grown considerably in recent years, in forms such as blogs, tweets, and other social media posts. NLP is well suited to analyzing this type of information.
Machine learning and text analysis are used frequently to enhance an application's utility. A brief list of application areas follows:
- Searching: This identifies specific elements of text. It can be as simple as finding the occurrence of a name in a document, or it might involve the use of synonyms and alternate spellings or misspellings to find entries that are close to the original search string.
- Machine translation: This typically involves the translation of one natural language into another.
- Summarization: Paragraphs, articles, documents, or collections of documents may need to be summarized. NLP has been used successfully for this purpose.
- Named Entity Recognition (NER): This involves extracting names of locations, people, and things from text. Typically, this is used in conjunction with other NLP tasks such as processing queries.
- Information grouping: This is an important activity that takes textual data and creates a set of categories that reflect the content of the document. You have probably encountered numerous websites that organize data based on your needs and have categories listed on the left-hand side of the website.
- Part-of-Speech (POS) tagging: In this task, text is split into tokens, and each token is labeled with its grammatical category, such as noun or verb. This is useful in analyzing the text further.
- Sentiment analysis: People's feelings and attitudes regarding movies, books, and other products can be determined using this technique. This is useful in providing automated feedback on how well a product is perceived.
- Answering queries: This type of processing was illustrated when IBM's Watson won a Jeopardy! competition. However, its use is not restricted to winning game shows; it has been applied in a number of other fields, including medicine.
- Speech recognition: Human speech is difficult to analyze. Many of the advances that have been made in this field are the result of NLP efforts.
- Natural Language Generation: This is the process of generating text from a data or knowledge source, such as a database. It can automate reporting of information such as weather reports, or summarize medical reports.
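The searching item in the list above mentions matching alternate spellings and misspellings. The following is a minimal sketch of that idea using Python's standard difflib module. The function name fuzzy_find, the sample sentence, and the 0.7 similarity cutoff are assumptions made for illustration; real search systems typically use more sophisticated indexing and matching.

```python
from difflib import get_close_matches

def fuzzy_find(query, text, cutoff=0.7):
    """Return words in text that closely resemble query (hypothetical helper)."""
    # Crude tokenization: split on whitespace and strip common punctuation.
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    # get_close_matches ranks candidates by similarity ratio, best first,
    # and drops any candidate scoring below the cutoff.
    return get_close_matches(query.lower(), words, n=5, cutoff=cutoff)

# Made-up sample text containing two misspellings.
document = "The reciept was sent to Jonh Smith after the meeting."

print(fuzzy_find("receipt", document))
print(fuzzy_find("john", document))
```

Raising the cutoff makes the search stricter and lowering it admits looser matches, so the value usually needs tuning for the text at hand.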
NLP tasks frequently use different machine learning techniques. A common approach starts with training a model to perform a task, then verifying that the model performs well enough on data it has not seen, and finally applying the model to a problem. We will examine this process further in Understanding NLP models later in the chapter.
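The train, verify, and apply steps can be sketched with a toy sentiment classifier. This is only an illustration under stated assumptions: the technique (a word-count naive Bayes classifier with add-one smoothing) was chosen for brevity, the tiny training and held-out sets are invented, and the function names are hypothetical. It is not the approach of any particular NLP library.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Simplistic whitespace tokenizer, sufficient for this toy data.
    return text.lower().split()

def train(examples):
    """Step 1: count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Step 3: pick the label with the highest log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        # Add-one smoothing so unseen words do not zero out the score.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, examples):
    """Step 2: fraction of held-out examples classified correctly."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)

# Invented data for illustration only.
training = [
    ("a wonderful and moving film", "positive"),
    ("great acting and a great story", "positive"),
    ("i loved this book", "positive"),
    ("a dull and boring film", "negative"),
    ("terrible acting and a weak story", "negative"),
    ("i hated this book", "negative"),
]
held_out = [
    ("a great film", "positive"),
    ("a boring story", "negative"),
]

model = train(training)                                     # train
print("held-out accuracy:", accuracy(model, held_out))      # verify
print(predict(model, "i loved the story"))                  # apply
```

Keeping the held-out examples separate from the training examples is what makes the verification step meaningful; a model checked only against its own training data can look better than it is.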