Mastering Java for Data Science: Analytics and more for production-ready applications

Product type Paperback

Published in Apr 2017

Publisher Packt

ISBN-13 9781782174271

Length 364 pages

Edition 1st Edition

Languages

Java

Tools

Hadoop

Concepts

Data Science

Author (1):

Alexey Grigorev

Table of Contents (11) Chapters

Preface

1. Data Science Using Java

2. Data Processing Toolbox

3. Exploratory Data Analysis

4. Supervised Learning - Classification and Regression

5. Unsupervised Learning - Clustering and Dimensionality Reduction

6. Working with Text - Natural Language Processing and Information Retrieval

7. Extreme Gradient Boosting

8. Deep Learning with DeepLearning4J

9. Scaling Data Science

10. Deploying Data Science Models

Exploratory data analysis in Java

Exploratory Data Analysis (EDA) is about taking a dataset and extracting the most important information from it, so that we get an idea of what the data looks like. It has two main parts: summarization and visualization.

The summarization step is very helpful for understanding data. For numerical variables, in this step we calculate the most important sample statistics:

The extremes (the minimum and maximum values)
The mean value, or the sample average
The standard deviation, which describes the spread of the data

Often we also consider other statistics, such as the median and the first and third quartiles (the 25th and 75th percentiles).
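To make these statistics concrete, here is a minimal sketch that computes them using only the JDK. The class name `SummaryStats`, its helper methods, and the sample values are made up for illustration and are not taken from the book. The percentile helper uses linear interpolation between the closest ranks, which is only one of several common conventions, so other libraries may report slightly different quartiles for the same data.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

// Hypothetical helper class for illustration; assumes at least two values.
final class SummaryStats {

    /** Sample standard deviation (divides by n - 1). */
    static double sampleStd(double[] values) {
        double mean = Arrays.stream(values).average().orElse(Double.NaN);
        double sumOfSquares = 0.0;
        for (double v : values) {
            sumOfSquares += (v - mean) * (v - mean);
        }
        return Math.sqrt(sumOfSquares / (values.length - 1));
    }

    /** Percentile for p in [0, 100], interpolating linearly between closest ranks. */
    static double percentile(double[] values, double p) {
        double[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        double rank = p / 100.0 * (sorted.length - 1);
        int lo = (int) Math.floor(rank);
        int hi = (int) Math.ceil(rank);
        return sorted[lo] + (rank - lo) * (sorted[hi] - sorted[lo]);
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9}; // made-up sample

        // The JDK's built-in summary covers the extremes and the mean.
        DoubleSummaryStatistics s = Arrays.stream(data).summaryStatistics();
        System.out.println("min    = " + s.getMin());
        System.out.println("max    = " + s.getMax());
        System.out.println("mean   = " + s.getAverage());

        // Spread and order statistics need the helpers above.
        System.out.println("std    = " + sampleStd(data));
        System.out.println("q25    = " + percentile(data, 25));
        System.out.println("median = " + percentile(data, 50));
        System.out.println("q75    = " + percentile(data, 75));
    }
}
```

`DoubleSummaryStatistics` does not provide the standard deviation or percentiles, which is why those two are computed by hand here. A dedicated statistics library would normally replace both helpers.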

As we have already seen in the previous chapter, Java offers a great set of tools for data preparation. The same set of tools can be used for EDA, and especially for creating summaries.

Authors (1)

Alexey Grigorev

Alexey Grigorev is a skilled data scientist, machine learning engineer, and software developer with more than 8 years of professional experience. He started his career as a Java developer working at a number of large and small companies, but after a while he switched to data science. Right now, Alexey works as a data scientist at Simplaex, where, in his day-to-day job, he actively uses Java and Python for data cleaning, data analysis, and modeling. His areas of expertise are machine learning and text mining.