Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Practical Big Data Analytics

You're reading from   Practical Big Data Analytics Hands-on techniques to implement enterprise analytics and machine learning using Hadoop, Spark, NoSQL and R

Arrow left icon
Product type Paperback
Published in Jan 2018
Publisher Packt
ISBN-13 9781783554393
Length 412 pages
Edition 1st Edition
Languages
Concepts
Arrow right icon
Author (1):
Arrow left icon
Nataraj Dasgupta Nataraj Dasgupta
Author Profile Icon Nataraj Dasgupta
Nataraj Dasgupta
Arrow right icon
View More author details
Toc

Table of Contents (13) Chapters Close

Preface 1. Too Big or Not Too Big FREE CHAPTER 2. Big Data Mining for the Masses 3. The Analytics Toolkit 4. Big Data With Hadoop 5. Big Data Mining with NoSQL 6. Spark for Big Data Analytics 7. An Introduction to Machine Learning Concepts 8. Machine Learning Deep Dive 9. Enterprise Data Science 10. Closing Thoughts on Big Data 11. External Data Science Resources 12. Other Books You May Enjoy

Preface

This book introduces the reader to a broad spectrum of topics related to big data as used in the enterprise. Big data is a vast area that encompasses elements of technology, statistics, visualization, business intelligence, and many other related disciplines. To get true value from data that oftentimes remains inaccessible, either due to volume or technical limitations, companies must leverage proper tools both at the software as well as the hardware level.

To that end, the book not only covers the theoretical and practical aspects of big data, but also supplements the information with high-level topics such as the use of big data in the enterprise, big data and data science initiatives and key considerations such as resources, hardware/software stack and other related topics. Such discussions would be useful for IT departments in organizations that are planning to implement or upgrade the organizational big data and/or data science platform.

The book focuses on three primary areas:

1. Data mining on large-scale datasets

Big data is ubiquitous today, just as the term data warehouse was omnipresent not too long ago. There are a myriad of solutions in the industry. In particular, Hadoop and products in the Hadoop ecosystem have become both popular and increasingly common in the enterprise. Further, more recent innovations such as Apache Spark have also found a permanent presence in the enterprise - Hadoop clients, realizing that they may not need the complexity of the Hadoop framework have shifted to Spark in large numbers. Finally, NoSQL solutions, such as MongoDB, Redis, Cassandra and commercial solutions such as Teradata, Vertica and kdb+ have provided have taken the place of more conventional database systems.

This book will cover these areas with a fair degree of depth. Hadoop and related products such as Hive, HBase, Pig Latin and others have been covered. We have also covered Spark and explained key concepts in Spark such as Actions and Transformations. NoSQL solutions such as MongoDB and KDB+ have also been covered to a fair extent and hands-on tutorials have also been provided.

2. Machine learning and predictive analytics

The second topic that has been covered is machine learning, also known by various other names, such as Predictive Analytics, Statistical Learning and others. Detailed explanations with corresponding machine learning code written using R and machine learning packages in R have been provided. Algorithms, such as random forest, support vector machines, neural networks, stochastic gradient boosting, decision trees have been discussed. Further, key concepts in machine learning such as bias and variance, regularization, feature section, data pre-processing have also been covered.

3. Data mining in the enterprise

In general, books that cover theoretical topics seldom discuss the more high-level aspects of big data - such as the key requirements for a successful big data initiative. The book includes survey results from IT executives and highlights the shared needs that are common across the industry. The book also includes a step-by-step guide on how to select the right use cases, whether it is for big data or for machine learning based on lessons learned from deploying production solutions in large IT departments.

We believe that with a strong foundational knowledge of these three areas, any practitioner can deliver successful big data and/or data science projects. That is the primary intention behind the overall structure and content of the book.

lock icon The rest of the chapter is locked
Next Section arrow right
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime