Big Data Analytics with Hadoop 3

Build highly effective analytics solutions to gain valuable insight into your big data

Product type Paperback
Published in May 2018
Publisher Packt
ISBN-13 9781788628846
Length 482 pages
Edition 1st Edition
Author: Sridhar Alla

Table of Contents

  Preface
  1. Introduction to Hadoop
  2. Overview of Big Data Analytics
  3. Big Data Processing with MapReduce
  4. Scientific Computing and Big Data Analysis with Python and Hadoop
  5. Statistical Big Data Computing with R and Hadoop
  6. Batch Analytics with Apache Spark
  7. Real-Time Analytics with Apache Spark
  8. Batch Analytics with Apache Flink
  9. Stream Processing with Apache Flink
  10. Visualizing Big Data
  11. Introduction to Cloud Computing
  12. Using Amazon Web Services

To get the most out of this book

The examples have been implemented using Scala, Java, R, and Python on 64-bit Linux. You will also need, or be prepared to install, the following on your machine (preferably the latest versions):

  • Spark 2.3.0 (or higher)
  • Hadoop 3.1 (or higher)
  • Flink 1.4
  • Java (JDK and JRE) 1.8+
  • Scala 2.11.x (or higher)
  • Python 2.7+/3.4+
  • R 3.1+ and RStudio 1.0.143 (or higher)
  • Eclipse Mars or IntelliJ IDEA (latest)
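
As a quick sanity check before working through the examples, the list above can be turned into a small script that reports which tools are reachable on your PATH. This is only a sketch: the executable names (spark-submit, hadoop, flink, and so on) are assumptions about a typical installation and are not taken from the book, and the script checks for presence only, not for version numbers.

```python
import shutil

# Executable names are assumptions about a typical installation; adjust them
# to match how the tools are installed on your machine.
REQUIRED_TOOLS = {
    "Spark": "spark-submit",
    "Hadoop": "hadoop",
    "Flink": "flink",
    "Java": "java",
    "Scala": "scala",
    "Python": "python3",
    "R": "Rscript",
}


def find_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Map each tool name to the path of its executable, or None if not on PATH."""
    return {name: which(exe) for name, exe in tools.items()}


def missing_tools(found):
    """Return the sorted names of tools whose executable was not found."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name:8s} {path or 'NOT FOUND'}")
    if missing_tools(found):
        print("Missing:", ", ".join(missing_tools(found)))
```

The lookup function is passed in as a parameter so that the check can be exercised without the tools actually being installed.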

Regarding the operating system: Linux distributions are preferable (including Debian, Ubuntu, Fedora, RHEL, and CentOS). For Ubuntu specifically, a complete 14.04 (LTS) 64-bit (or later) installation is recommended, along with VMware Player 12 or VirtualBox if you run it as a virtual machine. You can also run the code on Windows (XP/7/8/10) or Mac OS X (10.4.7+).

Regarding hardware configuration: a Core i3 or Core i5 (recommended) processor, up to a Core i7 for the best results. Multicore processing provides faster data processing and scalability. You will need at least 8 GB of RAM (recommended) for standalone mode, and at least 32 GB of RAM for a single VM, with more for a cluster. You will also need enough storage to run heavy jobs, depending on the size of the datasets you handle: preferably at least 50 GB of free disk space (for standalone mode and the SQL warehouse).
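
The hardware figures above can be compared against your own machine with a short script. The following is a minimal sketch, assuming you want to check the filesystem of the current directory; the function names are hypothetical, and the RAM lookup relies on os.sysconf, which is not available on every platform, so it falls back to None.

```python
import os
import shutil

MIN_FREE_DISK_GB = 50  # from the text: at least 50 GB of free disk storage


def free_disk_gb(path="."):
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


def meets_disk_requirement(free_gb, minimum_gb=MIN_FREE_DISK_GB):
    """True if the free space reaches the recommended minimum."""
    return free_gb >= minimum_gb


def total_ram_gb():
    """Total physical RAM in GiB, or None where os.sysconf cannot report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None


if __name__ == "__main__":
    print(f"Logical CPU cores: {os.cpu_count()}")
    ram = total_ram_gb()
    print("Total RAM:", "unknown" if ram is None else f"{ram:.1f} GiB")
    free = free_disk_gb()
    status = "ok" if meets_disk_requirement(free) else "below the recommended 50 GB"
    print(f"Free disk: {free:.1f} GiB ({status})")
```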

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packtpub.com.
  2. Select the SUPPORT tab.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux
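
If you prefer not to install a separate archive tool, a .zip bundle can also be extracted with Python's standard library. This is a sketch under the assumption that the download is a .zip file; the archive name and target folder used in the usage comment are hypothetical placeholders.

```python
import zipfile
from pathlib import Path


def extract_bundle(archive, target_dir):
    """Extract a .zip code bundle into target_dir and return its member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    with zipfile.ZipFile(archive) as bundle:
        bundle.extractall(target)
        return bundle.namelist()


# Example usage (hypothetical file names):
# extract_bundle("code-bundle.zip", "code-bundle")
```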

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Big-Data-Analytics-with-Hadoop-3. If the code is updated, the changes will be made in the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "This file, temperatures.csv, is available as a download and once downloaded, you can move it into hdfs by running the command, as shown in the following code."

A block of code is set as follows:

hdfs dfs -copyFromLocal temperatures.csv /user/normal

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

Map-Reduce Framework -- output average temperature per city name
Map input records=35
Map output records=33
Map output bytes=208
Map output materialized bytes=286

Any command-line input or output is written as follows:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys

Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Clicking on the Datanodes tab shows all the nodes."

Warnings or important notes appear like this.
Tips and tricks appear like this.