You're reading from Machine Learning in Java: Helpful techniques to design, build, and deploy powerful machine learning applications in Java

Product type Paperback

Published in Nov 2018

Publisher Packt

ISBN-13 9781788474399

Length 300 pages

Edition 2nd Edition

Languages

Java

Tools

JAVA-ML

Concepts

Machine Learning

Authors (2):

Ashish Bhatia

Bostjan Kaluza

Table of Contents (13) Chapters

Preface

1. Applied Machine Learning Quick Start

2. Java Libraries and Platforms for Machine Learning

3. Basic Algorithms - Classification, Regression, and Clustering

4. Customer Relationship Prediction with Ensembles

5. Affinity Analysis

6. Recommendation Engines with Apache Mahout

7. Fraud and Anomaly Detection

8. Image Recognition with Deeplearning4j

9. Activity Recognition with Mobile Phone Sensors

10. Text Mining with Mallet - Topic Modeling and Spam Detection

11. What Is Next?

12. Other Books You May Enjoy

Working with text data

One of the main challenges in text mining is transforming unstructured written natural language into structured, attribute-based instances. The process involves several steps:

First, we extract text from the internet, existing documents, or databases. At the end of this step, the text may still be in XML or some other proprietary format. The second step extracts the actual text and segments it into parts of the document, for example, title, headline, abstract, and body. The third step normalizes the text encoding so that characters are represented consistently; for example, documents encoded in ASCII, ISO 8859-1, or Windows-1250 are converted to Unicode. Next, tokenization splits the document into individual words, while the following step removes frequent words that...
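The encoding, tokenization, and frequent-word steps described above can be sketched in plain Java. This is a minimal illustration using only the standard library, not the approach the chapter itself takes (the excerpt ends before showing any code). The class name `TextPipelineSketch`, the method names, the regular expression used for tokenization, and the tiny stop list are all assumptions made for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class TextPipelineSketch {

    // Hypothetical, deliberately tiny list of frequent words; a real list would be much longer.
    static final Set<String> STOP_WORDS =
            Set.of("the", "a", "an", "of", "and", "is", "in", "to");

    // Encoding normalization: decode bytes from a known source encoding into a
    // Java String (Unicode), then apply NFC so equivalent characters share one form.
    static String toUnicode(byte[] raw, Charset sourceEncoding) {
        String decoded = new String(raw, sourceEncoding);
        return Normalizer.normalize(decoded, Normalizer.Form.NFC);
    }

    // Tokenization: lowercase the text and split on any run of characters
    // that are neither letters nor decimal digits (an assumed, simple rule).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Frequent-word removal: drop every token that appears in the stop list.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Simulate a document stored as ISO 8859-1 bytes.
        byte[] raw = "The caf\u00e9 is in the centre of Ljubljana"
                .getBytes(StandardCharsets.ISO_8859_1);

        String text = toUnicode(raw, StandardCharsets.ISO_8859_1);
        List<String> tokens = removeStopWords(tokenize(text));
        System.out.println(tokens);
    }
}
```

Keeping each step in its own method mirrors the staged process in the text: each stage takes the previous stage's output, so a stage can be replaced (for example, with a library tokenizer) without touching the others. The sketch requires Java 9 or later because of `Set.of`.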

Bostjan Kaluza

Bostjan Kaluza is a researcher in artificial intelligence and machine learning with extensive experience in Java and Python. Bostjan is the chief data scientist at Evolven, a leading IT operations analytics company. He works with machine learning, predictive analytics, pattern mining, and anomaly detection to turn data into relevant information. Prior to Evolven, Bostjan served as a senior researcher in the department of intelligent systems at the Jozef Stefan Institute and led research projects involving pattern and anomaly detection, ubiquitous computing, and multi-agent systems. In 2013, Packt Publishing released his first book, Instant Weka How-To, which explores how to leverage machine learning using Weka.