A regular expression is a sequence of characters that defines a search pattern. Natural language processing and text mining are two areas where regular expressions are used heavily, although there are other application areas as well. In this recipe, you will pre-process text data using regular expressions instead of the tm library.
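As a rough illustration of the idea, the following sketch cleans and tokenizes text using only base R functions (`tolower()`, `gsub()`, `strsplit()`, and `table()`). The `docs` vector is made-up sample text, not the data used in this recipe, and the patterns shown are only one reasonable choice:

```r
# Hypothetical sample text; stands in for the recipe's real data.
docs <- c("The quick brown fox, the lazy dog!",
          "A dog barks; the fox runs away.")

# Convert to lower case, then replace every character that is not
# a lower-case letter or whitespace with a space.
clean <- gsub("[^a-z[:space:]]", " ", tolower(docs))

# Split on runs of whitespace and drop any empty tokens.
tokens <- unlist(strsplit(clean, "[[:space:]]+"))
tokens <- tokens[nzchar(tokens)]

# Count how often each word occurs, most frequent first.
freq <- sort(table(tokens), decreasing = TRUE)
print(freq)
```

Replacing punctuation with a space rather than deleting it keeps words that are separated only by punctuation from being fused together.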
Using regular expressions in text processing
Getting ready
Suppose you have a corpus of documents and your objective is to find the most frequent words in it. The first step is to pre-process the text and then create a term-document matrix. In this recipe, you will use a regular expression on the text data retrieved from a web page using the readLines(...