Chapter 6. Named Entity Recognition in Text
The next text mining tool we are going to add to our toolbox actually comes from the domain of information extraction. When we talk about information extraction, we typically mean text mining techniques that use natural language processing to pull out key pieces of desired information from a large amount of unstructured text. I like to think of information extraction as a gold miner's sifting pan: using these tools, we keep only the good stuff (the gold nuggets) and let the rest of the dirt fall away. In this chapter, the gold nuggets we will be sifting for are called named entities. Given a semi-structured or unstructured body of text, can we locate and extract all the named entities, such as people, places, or organizations, and leave the rest of the text behind?
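To make the sifting-pan idea concrete, here is a deliberately naive sketch of what the input and output of named entity recognition look like. It assumes a small hand-built lookup table (the hypothetical `GAZETTEER` dictionary) and a hypothetical helper, `extract_entities`. Neither comes from any NER library. Real systems recognize entities they have never seen before, which a fixed list cannot do. The sketch only illustrates the goal: raw text goes in, and labeled entities come out while everything else is discarded.

```python
import re

# Hypothetical toy gazetteer: a hand-built table mapping known names to
# entity types. Real NER systems do not rely on a fixed list like this.
GAZETTEER = {
    "Maria Lopez": "PERSON",
    "Madrid": "LOCATION",
    "Chicago": "LOCATION",
    "Acme Corporation": "ORGANIZATION",
}


def extract_entities(text, gazetteer=GAZETTEER):
    """Return (entity, label) pairs for every gazetteer name found in text,
    in order of appearance. All other text is ignored."""
    found = []
    for name, label in gazetteer.items():
        # \b word boundaries stop "Madrid" from matching inside "Madridista".
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((match.start(), match.group(), label))
    found.sort()  # order results by their position in the text
    return [(entity, label) for _, entity, label in found]


sample = "Maria Lopez flew from Madrid to Chicago for a meeting at Acme Corporation."
for entity, label in extract_entities(sample):
    print(f"{label:<12} {entity}")
```

Running this prints one labeled line for each of the four names. The rest of the sentence ("flew from", "for a meeting at") is left behind in the pan.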
In this chapter, we will learn:
- What named entities are and why they are useful to search for
- What the different techniques are for finding named entities,...