Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Mastering Java for Data Science

You're reading from   Mastering Java for Data Science Analytics and more for production-ready applications

Arrow left icon
Product type Paperback
Published in Apr 2017
Publisher Packt
ISBN-13 9781782174271
Length 364 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Alexey Grigorev Alexey Grigorev
Author Profile Icon Alexey Grigorev
Alexey Grigorev
Arrow right icon
View More author details
Toc

Preface

Data science has become a quite important tool for organizations nowadays: they have collected large amounts of data, and to be able to put it into good use, they need data science--the discipline about methods for extracting knowledge from data. Every day more and more companies realize that they can benefit from data science and utilize the data that they produce more effectively and more profitably.

It is especially true for IT companies, they already have the systems and the infrastructure for generating and processing the data. These systems are often written in Java--the language of choice for many large and small companies across the world. It is not a surprise, Java offers a very solid and mature ecosystem of libraries that are time proven and reliable, so many people trust Java and use it for creating their applications.

Thus, it is also a natural choice for many data processing applications. Since the existing systems are already in Java, it makes sense to use the same technology stack for data science, and integrate the machine learning model directly in the application's production code base.

This book will cover exactly that. We will first see how we can utilize Java’s toolbox for processing small and large datasets, then look into doing initial exploration data analysis. Next, we will review the Java libraries that implement common Machine Learning models for classification, regression, clustering, and dimensionality reduction problems. Then we will get into more advanced techniques and discuss Information Retrieval and Natural Language Processing, XGBoost, deep learning, and large scale tools for processing big datasets such as Apache Hadoop and Apache Spark. Finally, we will also have a look at how to evaluate and deploy the produced models such that the other services can use them.

We hope you will enjoy the book. Happy reading!

lock icon The rest of the chapter is locked
Next Section arrow right
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at R$50/month. Cancel anytime