Natural Language Processing with Java

You're reading from Natural Language Processing with Java: Techniques for building machine learning and neural network models for NLP

Product type Paperback

Published in Jul 2018

Publisher

ISBN-13 9781788993494

Length 318 pages

Edition 2nd Edition

Languages

Java

Tools

Processing

Concepts

Machine Learning

Authors (2):

Ashish Bhatia

Richard M. Reese

Table of Contents (14) Chapters

Preface

1. Introduction to NLP

2. Finding Parts of Text

3. Finding Sentences

4. Finding People and Things

5. Detecting Part of Speech

6. Representing Text with Features

7. Information Retrieval

8. Classifying Texts and Documents

9. Topic Modeling

10. Using Parsers to Extract Relationships

11. Combined Pipeline

12. Creating a Chatbot

13. Other Books You May Enjoy

Using boilerpipe to extract text from HTML

There are several libraries available for extracting text from HTML documents. We will demonstrate how to use boilerpipe (https://code.google.com/p/boilerpipe/) to perform this operation. Its API is flexible: it can extract the entire text of an HTML document, or only selected parts of it, such as the title and individual text blocks. We will use the HTML page at http://en.wikipedia.org/wiki/Berlin to illustrate the use of boilerpipe.

[Screenshot: part of the Wikipedia page for Berlin]

To use boilerpipe, you will also need to download the binary for the Xerces parser, which can be found at http://xerces.apache.org/index.html.

We start by creating a URL object that represents this page. We will use two classes to extract text. The first is the HTMLDocument class that represents the HTML document. The second is the TextDocument class that represents the text within an HTML document. It consists of one or more...
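
The difference between an HTML document and a text-only view of it can be illustrated with a small, library-free sketch. The code below is not boilerpipe's API or its algorithm: `TextBlockSketch`, `SimpleTextDocument`, and `parse` are hypothetical names introduced here, and the regex-based tag stripping is a deliberately naive stand-in that only recognizes `<title>`, `<p>`, `<h1>` to `<h6>`, and `<li>` elements. It works on an inline HTML string rather than fetching the Berlin page, so it runs without network access.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextBlockSketch {

    /** Hypothetical text-only view of a page: a title plus a list of text blocks. */
    static final class SimpleTextDocument {
        final String title;
        final List<String> blocks;

        SimpleTextDocument(String title, List<String> blocks) {
            this.title = title;
            this.blocks = blocks;
        }
    }

    // Assumption: the title is the content of the first <title> element.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Assumption: only paragraphs, headings, and list items count as text blocks.
    private static final Pattern BLOCK = Pattern.compile(
            "<(p|h[1-6]|li)\\b[^>]*>(.*?)</\\1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Builds the text-only view from raw HTML, skipping blocks that contain no text. */
    static SimpleTextDocument parse(String html) {
        Matcher titleMatcher = TITLE.matcher(html);
        String title = titleMatcher.find() ? clean(titleMatcher.group(1)) : "";

        List<String> blocks = new ArrayList<>();
        Matcher blockMatcher = BLOCK.matcher(html);
        while (blockMatcher.find()) {
            String text = clean(blockMatcher.group(2));
            if (!text.isEmpty()) {
                blocks.add(text);
            }
        }
        return new SimpleTextDocument(title, blocks);
    }

    /** Strips any remaining inline tags and collapses runs of whitespace. */
    private static String clean(String fragment) {
        return fragment.replaceAll("<[^>]+>", "").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Berlin</title></head><body>"
                + "<h1>Berlin</h1>"
                + "<p>Berlin is the <b>capital</b> of Germany.</p>"
                + "<p>  </p>"
                + "</body></html>";

        SimpleTextDocument doc = parse(html);
        System.out.println("Title: " + doc.title);
        for (String block : doc.blocks) {
            System.out.println("Block: " + block);
        }
    }
}
```

Regular expressions are not a reliable way to parse real-world HTML. That is one reason to use a dedicated library backed by a proper parser, such as boilerpipe with Xerces, for pages like the Wikipedia article used here.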

Authors (1)

Richard M. Reese

Richard Reese has worked in industry and academia for the past 29 years. For 10 years, he provided software development support at Lockheed and at one point developed a C-based network application. He was a contract instructor providing software training to industry for 5 years. Richard is currently an Associate Professor at Tarleton State University in Stephenville, Texas. Richard is the author of various books and video courses, including Natural Language Processing with Java, Java for Data Science, and Getting Started with Natural Language Processing in Java.
