Packages
While the standard R system has a number of features and functions available, one of the better aspects of R is the use of packages to add functionality. A package contains a number of functions (and sometimes sample data) that can be used to solve a particular problem in R. Packages are developed by interested groups for the general good of all R developers. In this chapter, we will be using the following packages:
- tm: This contains text mining tools
- XML: This contains XML processing tools
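As a minimal sketch of how these two packages might be made available in a session, the following installs each one only if it is missing and then loads it. It assumes a working CRAN mirror and network access; the loop variable `pkg` is just an illustrative name.

```r
# Install each package only if it is not already available, then load it.
# Assumes a CRAN mirror is configured for install.packages().
for (pkg in c("tm", "XML")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the name stored in pkg
  library(pkg, character.only = TRUE)
}
```

Once a package is installed, a plain `library(tm)` or `library(XML)` call at the top of a script is enough.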
Text processing
R has built-in functions for manipulating text. These include adjustments that make the text easier to analyze (such as reducing words to their stems or removing punctuation) and building a document matrix that shows how often each word is used throughout a document. Once these steps are done, we can then submit our documents to analysis and clustering.
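The cleanup and document matrix steps described above can be sketched with the tm package. This is only an illustration under stated assumptions: the two sample sentences in `docs` are made up for the example, and the particular choice and order of cleanup steps is one reasonable option, not necessarily the one used later in this chapter.

```r
library(tm)

# Two made-up sample documents (hypothetical data for illustration only)
docs <- c("R makes text mining approachable.",
          "Text mining in R uses the tm package!")

# Build a corpus with one entry per character string
corpus <- VCorpus(VectorSource(docs))

# Clean up: lower-case, drop punctuation, drop common English words,
# and collapse the extra whitespace left behind
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Rows are documents, columns are terms, cells are term counts
dtm <- DocumentTermMatrix(corpus)
inspect(dtm)
```

Stemming can be added with `tm_map(corpus, stemDocument)`, which additionally requires the SnowballC package to be installed.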
Example
In this example, we will perform the following steps:
- We will take an HTML document from the Internet.
- We will clean up the document using text processing...