Natural Language Processing with Java: Explore various approaches to organize and extract useful text from unstructured data using Java

Product type Paperback

Published in Mar 2015

Publisher

ISBN-13 9781784391799

Length 262 pages

Edition 1st Edition

Languages

Java

Concepts

Data Analysis

Authors (2):

Richard M. Reese

Table of Contents (10) Chapters

Preface

1. Introduction to NLP

2. Finding Parts of Text

3. Finding Sentences

4. Finding People and Things

5. Detecting Part of Speech

6. Classifying Texts and Documents

7. Using Parser to Extract Relationships

8. Combined Approaches

Index

What this book covers

Chapter 1, Introduction to NLP, explains the importance and uses of NLP. The NLP techniques used in this chapter are explained with simple examples illustrating their use.

Chapter 2, Finding Parts of Text, focuses primarily on tokenization. This is the first step in more advanced NLP tasks. Both core Java and Java NLP tokenization APIs are illustrated.
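As a rough illustration of tokenization with core Java only, the sketch below compares two approaches: splitting on whitespace and walking word boundaries with `java.text.BreakIterator`. The class and method names (`SimpleTokenizer`, `whitespaceTokens`, `wordTokens`) are invented for this example and are not taken from the book.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, named for illustration only.
class SimpleTokenizer {

    // Splits on runs of whitespace; punctuation stays attached to the words.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Walks BreakIterator word boundaries and keeps only the spans that
    // start with a letter or digit, which drops spaces and punctuation.
    static List<String> wordTokens(String text) {
        List<String> tokens = new ArrayList<>();
        BreakIterator boundaries = BreakIterator.getWordInstance(Locale.US);
        boundaries.setText(text);
        int start = boundaries.first();
        for (int end = boundaries.next();
             end != BreakIterator.DONE;
             start = end, end = boundaries.next()) {
            String span = text.substring(start, end);
            if (Character.isLetterOrDigit(span.codePointAt(0))) {
                tokens.add(span);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The quick fox jumped.";
        System.out.println(whitespaceTokens(text)); // [The, quick, fox, jumped.]
        System.out.println(wordTokens(text));       // [The, quick, fox, jumped]
    }
}
```

The whitespace split is the simplest option but leaves the final period glued to "jumped.", while the boundary-based version separates it. Dedicated NLP tokenizers typically handle contractions, hyphenation, and similar cases with more care than either approach.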

Chapter 3, Finding Sentences, shows that sentence boundary disambiguation is an important NLP task. This step is a precursor to many downstream NLP tasks in which text elements should not be split across sentence boundaries, such as keeping every phrase within a single sentence and supporting part-of-speech analysis.
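To make sentence boundary detection concrete, here is a minimal sketch that uses the JDK's `BreakIterator.getSentenceInstance`. The class name `SentenceSplitter` and the method `split` are assumptions made for this example, and the approach is only one possible baseline, not necessarily the one the chapter uses.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, named for illustration only.
class SentenceSplitter {

    // Returns the sentences found by the JDK's rule-based sentence iterator,
    // with surrounding whitespace trimmed from each one.
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.US);
        boundaries.setText(text);
        int start = boundaries.first();
        for (int end = boundaries.next();
             end != BreakIterator.DONE;
             start = end, end = boundaries.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        for (String sentence : split("This is one. This is two.")) {
            System.out.println(sentence);
        }
    }
}
```

A rule-based iterator like this may split incorrectly after abbreviations or inside quoted text, which is one reason the task is described as disambiguation rather than simple splitting on periods.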

Chapter 4, Finding People and Things, covers what is commonly referred to as Named Entity Recognition. This task is concerned with identifying people, places, and similar entities in text. This technique is a preliminary step for processing queries and searches.
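One very simple way to picture entity identification is a lookup against a list of known names (a gazetteer). The sketch below is an assumption-laden toy: the class `GazetteerTagger`, its entries, and the `PERSON`/`LOCATION` labels are all invented for illustration, and real NER systems generally rely on trained models rather than fixed lists.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class GazetteerTagger {

    // Toy lookup table; the entries and labels are made up for this example.
    private static final Map<String, String> GAZETTEER = new LinkedHashMap<>();
    static {
        GAZETTEER.put("Ada Lovelace", "PERSON");
        GAZETTEER.put("London", "LOCATION");
        GAZETTEER.put("Paris", "LOCATION");
    }

    // Returns "name/LABEL" for each gazetteer entry that occurs in the text
    // as a whole word or phrase, in the order the entries were registered.
    static List<String> findEntities(String text) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, String> entry : GAZETTEER.entrySet()) {
            Pattern wholePhrase =
                Pattern.compile("\\b" + Pattern.quote(entry.getKey()) + "\\b");
            if (wholePhrase.matcher(text).find()) {
                found.add(entry.getKey() + "/" + entry.getValue());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findEntities("Ada Lovelace was born in London."));
        // [Ada Lovelace/PERSON, London/LOCATION]
    }
}
```

The word-boundary check keeps "Paris" from matching inside "Parisian". A list-based lookup cannot recognize names it has never seen, which is the main limitation that statistical approaches aim to address.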

Chapter 5, Detecting Parts of Speech, shows you how to detect parts of speech, which are grammatical elements of text, such as nouns and verbs. Identifying these elements is a significant step in determining the meaning of text and detecting relationships within text.

Chapter 6, Classifying Texts and Documents, shows that classifying text is useful for tasks such as spam detection and sentiment analysis. The NLP techniques that support this process are investigated and illustrated.

Chapter 7, Using Parser to Extract Relationships, demonstrates parse trees. A parse tree is used for many purposes, including information extraction, because it holds information about the relationships between the elements of a text. An example implementing a simple query is presented to illustrate this process.

Chapter 8, Combined Approaches, contains techniques for extracting data from various types of documents, such as PDF and Word files. This is followed by an examination of how the previous NLP techniques can be combined into a pipeline to solve larger problems.
