
You're reading from Natural Language Processing with Java: Techniques for building machine learning and neural network models for NLP

Product type: Paperback

Published: Jul 2018

ISBN-13: 9781788993494

Length: 318 pages

Edition: 2nd Edition

Authors: Ashish Bhatia, Richard M. Reese


Training a sentence-detector model

We will use OpenNLP's SentenceDetectorME class to illustrate the training process. This class has a static train method that uses sample sentences found in a file. The method returns a model that is usually serialized to a file for later use.

Training a model requires specially annotated data that marks exactly where each sentence ends. A large file is frequently used to provide a representative sample: part of it is used for training, and the rest is held out to evaluate the model after it has been trained.
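The train/evaluate split can be sketched in plain Java. This is a minimal illustration under stated assumptions, not OpenNLP code: the class `HoldoutSplit`, its nested `Split` type, the `split` method, and the 80/20 ratio used in `main` are all hypothetical choices. The sketch keeps the file's order rather than shuffling, so the held-out sentences are simply the tail of the file.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits annotated sentences into training and held-out evaluation sets. */
public class HoldoutSplit {

    /** Holds the two partitions produced by split(). */
    public static final class Split {
        public final List<String> training;
        public final List<String> evaluation;

        Split(List<String> training, List<String> evaluation) {
            this.training = training;
            this.evaluation = evaluation;
        }
    }

    /**
     * Puts the first trainFraction of the lines into the training set and the
     * remainder into the evaluation set. Order is preserved (no shuffling), so
     * the evaluation set is the tail of the input.
     */
    public static Split split(List<String> lines, double trainFraction) {
        if (trainFraction <= 0.0 || trainFraction >= 1.0) {
            throw new IllegalArgumentException("trainFraction must be in (0, 1)");
        }
        int cut = (int) Math.round(lines.size() * trainFraction);
        List<String> training = new ArrayList<>(lines.subList(0, cut));
        List<String> evaluation = new ArrayList<>(lines.subList(cut, lines.size()));
        return new Split(training, evaluation);
    }

    public static void main(String[] args) {
        // Twenty placeholder sentences, one per "line" as in the training file format.
        List<String> sentences = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            sentences.add("Sample sentence number " + i + ".");
        }
        Split s = split(sentences, 0.8);
        System.out.println("training=" + s.training.size() + " evaluation=" + s.evaluation.size());
    }
}
```

Running `main` prints `training=16 evaluation=4`. If the source file is ordered by chapter or topic, shuffling before splitting would give a less biased evaluation set; that is a design choice left open here.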

The training file used by OpenNLP contains one sentence per line. Usually, at least 10 to 20 sample sentences are needed to avoid processing errors. To demonstrate this process, we will use a file called sentence.train, which consists of Chapter 5 of Twenty Thousand Leagues Under the Sea by Jules Verne. The text of the book can be found...
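A rough first draft of a one-sentence-per-line file can be produced from running text with the JDK's `java.text.BreakIterator`. This is an assumption for illustration, not the procedure used to build sentence.train: `TrainingFilePrep`, `toSentences`, `toTrainingFormat`, and the `MIN_SENTENCES` constant (set to the lower end of the 10 to 20 range mentioned above) are hypothetical names. BreakIterator uses its own rule-based splitting, which can mis-split around abbreviations, so its output should be reviewed by hand before being used as training data.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper that turns running text into the one-sentence-per-line
 * training format using the JDK's BreakIterator (not OpenNLP).
 */
public class TrainingFilePrep {

    /** Lower end of the 10 to 20 sentence range; fewer samples tend to cause training errors. */
    static final int MIN_SENTENCES = 10;

    /** Splits text into trimmed, non-empty sentences using java.text.BreakIterator. */
    public static List<String> toSentences(String text, Locale locale) {
        // Collapse line wraps and repeated whitespace so only punctuation drives the breaks.
        String normalized = text.replaceAll("\\s+", " ").trim();
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(normalized);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Each segment includes its trailing space; trim it off.
            String sentence = normalized.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    /** Joins sentences with newlines: one sentence per line, as the trainer expects. */
    public static String toTrainingFormat(List<String> sentences) {
        return String.join("\n", sentences);
    }

    public static void main(String[] args) {
        String text = "The sea was calm. We sailed on\nfor hours! Did you see it?";
        List<String> sentences = toSentences(text, Locale.US);
        if (sentences.size() < MIN_SENTENCES) {
            System.err.println("Warning: only " + sentences.size()
                    + " sentences; consider at least " + MIN_SENTENCES + ".");
        }
        System.out.println(toTrainingFormat(sentences));
    }
}
```

Running `main` prints the three sentences on separate lines and, because three is below `MIN_SENTENCES`, writes a warning to standard error. The wrapped line "We sailed on / for hours!" comes out as a single sentence because whitespace is normalized first.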



Richard M. Reese

Richard Reese has worked in industry and academia for the past 29 years. For 10 years, he provided software development support at Lockheed, and at one point he developed a C-based network application. He was a contract instructor providing software training to industry for 5 years. Richard is currently an Associate Professor at Tarleton State University in Stephenville, Texas. He is the author of various books and video courses, including Natural Language Processing with Java, Java for Data Science, and Getting Started with Natural Language Processing in Java.
