Getting Started with Information Extraction
In this chapter, we will cover the basics of information extraction. Information extraction is the task of pulling specific pieces of information from text. For example, you might want to know which companies are mentioned in a news article. Instead of spending time reading the whole article, you can use information extraction techniques to retrieve the company names almost instantly.
We will start by extracting email addresses and URLs from job announcements. Then, we will use Levenshtein distance, a measure of how many edits separate two strings, to find similar strings. Next, we will extract important keywords from text. After that, we will use spaCy to find named entities in text, and later, we will train our own named entity recognition model in spaCy. We will then do basic sentiment analysis, and, finally, we will train two custom sentiment analysis models.
You will learn how to use existing tools and train your own models for information extraction tasks.
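As a small preview of the string-similarity idea mentioned above, here is a minimal sketch of Levenshtein distance in plain Python. The function name `levenshtein` and the sample strings are our own illustrative choices, not code from the later sections, which may rely on an existing library instead.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # previous[j] is the distance between the already-processed prefix
    # of a and the first j characters of b (initially the empty prefix).
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        # Turning the first i characters of a into an empty string
        # takes i deletions.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


# Hypothetical sample strings, chosen only for illustration.
print(levenshtein("kitten", "sitting"))  # prints 3
```

A smaller distance means the two strings are closer, so a misspelled word can be matched to its nearest candidate by picking the candidate with the lowest distance.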
We will cover...