What this book covers
Chapter 1, Statistical Linguistics with R, covers the basics of statistical analysis, which form the foundation of computational linguistics. This chapter also discusses various R packages for text mining and their uses.
Chapter 2, Processing Text, guides readers through handling textual data from scratch. It covers accessing data from various sources and cleansing text using regular expressions and stop words, and it helps readers develop the skills to process raw text effectively in R.
Chapter 3, Categorizing and Tagging Text, shows readers how to categorize the words in a text into different word classes, or lexical categories.
Chapter 4, Dimensionality Reduction, covers in detail the various dimensionality reduction methods that can be applied to text data. The next chapter extends these concepts to extract context from data.
Chapter 5, Text Summarization and Clustering, deals with text summarization and clustering methods that can be applied to textual documents.
Chapter 6, Text Classification, deals with pattern recognition in text data using classification mechanisms. It covers the statistical and mathematical aspects along with their implementation on public datasets in R.
Chapter 7, Entity Recognition, deals with named entity recognition using R and extends these concepts to ontology learning and expansion.