You're reading from Hadoop Cluster Deployment Construct a modern Hadoop data platform effortlessly and gain insights into how to manage clusters efficiently

Product type Paperback

Published in Nov 2013

Publisher Packt

ISBN-13 9781783281718

Length 126 pages

Edition 1st Edition

Languages

Java

Tools

Hadoop

Concepts

System Administration

Author (1):

Danil Zburivsky

Table of Contents (13) Chapters

Hadoop Cluster Deployment

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

1. Setting Up Hadoop Cluster – from Hardware to Distribution

2. Installing and Configuring Hadoop

3. Configuring the Hadoop Ecosystem

4. Securing Hadoop Installation

5. Monitoring Hadoop Cluster

6. Deploying Hadoop to the Cloud

Index

A

Apache Bigtop project
- URL / Setting up NameNode
Authentication Server (AS) / Kerberos overview
automatic failover option / Setting up NameNode

B

bigtop-jsvc package / Setting up NameNode
bigtop-utils package / Setting up NameNode

C

-chmod command / HDFS security
CapacityTaskScheduler
- about / CapacityTaskScheduler
CDH 4.1 / Setting up NameNode
CDH HA guide
- URL / JobTracker configuration
CDH repositories
- setting up / Setting up the CDH repositories
CDH repository
- used, for installing Sqoop / Installing and configuring Sqoop
check_ping plugin / NameNode checks
CLI / Installing the EMR command-line interface
client, Hive
- installing / Installing the Hive client
clients, Kerberos
- configuring / Configuring Kerberos clients
Cloudera documentation
- on Impala, URL / Installing Impala state store
Cloudera Hadoop distribution
- about / Cloudera Hadoop distribution
cluster administrator
- about / MapReduce security
core-site.xml file / Hadoop configuration files, NameNode HA configuration, core-site.xml
CorruptedBlocks variable / NameNode checks

D

--describe option / Launching the EMR cluster
--driver option / Sqoop import example
DataNode
- hardware, selecting / Choosing the DataNode hardware
- about / Kerberos in Hadoop
DataNode configuration
- about / DataNode configuration
- TaskTracker configuration / TaskTracker configuration
- Hadoop tuning / Advanced Hadoop tuning
DataNode metrics
- URL / JMX Metrics
dfs.client.failover.proxy.provider.sample-cluster variable / NameNode HA configuration
dfs.data.dir variable / DataNode configuration
dfs.datanode.balance.bandwidthPerSec variable / hdfs-site.xml
dfs.ha.fencing.methods variable / NameNode HA configuration
dfs.journalnode.edits.dir variable / NameNode HA configuration
dfs.namenode.replication.min setting / DataNode configuration
dfs.namenode.shared.edits.dir variable / NameNode HA configuration
dfs.nameservices variable / NameNode HA configuration

E

elastic-mapreduce CLI / Choosing the Hadoop version
EMR
- about / Amazon Elastic MapReduce
- command-line interface, installing / Installing the EMR command-line interface
EMR cluster
- launching / Launching the EMR cluster
- master instance / Launching the EMR cluster
- terminating / Launching the EMR cluster
- temporary EMR clusters / Temporary EMR clusters
- input and output locations, preparing / Preparing input and output locations
EMR Web console
- URL / Launching the EMR cluster
EXT4 filesystem / Choosing and setting up the filesystem

F

Failover Controller
- installing / JournalNode, ZooKeeper, and Failover Controller
FairScheduler
- about / FairScheduler, MapReduce security
filesystem
- setting up / Choosing and setting up the filesystem
flex_bg option / Choosing and setting up the filesystem

G

Ganglia
- Hadoop, monitoring with / Monitoring Hadoop with Ganglia
Gateway servers
- about / Gateway and other auxiliary services

H

--hadoop-version option / Choosing the Hadoop version
ha.zookeeper.quorum variable / NameNode HA configuration
Hadoop
- cluster hardware, selecting / Choosing Hadoop cluster hardware
- hardware, summary / Hadoop hardware summary
- distributions / Hadoop distributions
- versions / Hadoop versions
- distribution, selecting / Choosing Hadoop distribution
- Cloudera Hadoop distribution / Cloudera Hadoop distribution
- Hortonworks Hadoop distribution / Hortonworks Hadoop distribution
- MapR / MapR
- configuration files / Hadoop configuration files
- table, importing from MySQL / Sqoop import example
- security, overview / Hadoop security overview
- Service Level Authorization / Hadoop Service Level Authorization
- and Kerberos / Hadoop and Kerberos
- Kerberos / Kerberos in Hadoop
- metrics / Hadoop Metrics
- monitoring, with Nagios / Monitoring Hadoop with Nagios
- monitoring, with Ganglia / Monitoring Hadoop with Ganglia
hadoop-hdfs-datanode package / DataNode configuration
hadoop-hdfs package / Setting up NameNode
hadoop-metrics.properties file / Hadoop configuration files, Hadoop Metrics
hadoop-metrics2.properties file / Hadoop configuration files
Hadoop cluster
- hardware, selecting / Choosing Hadoop cluster hardware
- data sources, identifying / Choosing the DataNode hardware
- data growth rate, estimating / Choosing the DataNode hardware
- estimated storage requirements, multiplying by replication factor / Choosing the DataNode hardware
- MapReduce temporary files and system data, factoring in / Choosing the DataNode hardware
- low storage density cluster / Low storage density cluster
- high storage density cluster / High storage density cluster
- NameNode hardware / The NameNode hardware
- JobTracker hardware / The JobTracker hardware
- Gateway servers / Gateway and other auxiliary services
- network, considerations / Network considerations
- OS, selecting for / Choosing OS for the Hadoop cluster
- OS, configuring for / Configuring OS for Hadoop cluster
- monitoring strategy / Monitoring strategy overview
Hadoop cluster hardware
- selecting / Choosing Hadoop cluster hardware
Hadoop Distributed File System (HDFS) / Setting up NameNode
Hadoop ecosystem
- hosting / Hosting the Hadoop ecosystem
hadoop jar command / TaskTracker configuration
hadoop package / Setting up NameNode
Hadoop tuning
- hdfs-site.xml / hdfs-site.xml
- mapred-site.xml / mapred-site.xml
- core-site.xml / core-site.xml
Hadoop version
- selecting / Choosing the Hadoop version
HDFS
- security / HDFS security
- Kerberos, enabling for / Enabling Kerberos for HDFS
- monitoring / Monitoring HDFS
hdfs-site.xml file / Hadoop configuration files, hdfs-site.xml
hdfs balancer command / hdfs-site.xml
hdfs command-line client tool / DataNode configuration
high storage density cluster
- about / High storage density cluster
Hive
- about / Hive
- architecture / Hive architecture, Installing Hive Metastore
- Metastore, installing / Installing Hive Metastore
- client, installing / Installing the Hive client
- Server, installing / Installing Hive Server
HiveQL / Hive
Hortonworks Hadoop distribution
- about / Hortonworks Hadoop distribution

I

Impala
- about / Impala
- architecture / Impala architecture
- state store, installing / Installing Impala state store
- server, installing / Installing the Impala server
- server, starting / Installing the Impala server
- using, in command line / Installing the Impala server
- server, connecting to / Installing the Impala server
import command / Sqoop export example

J

Java versions, Hadoop
- URL / Setting up Java Development Kit
JBOD (Just a Bunch of Disks) / Choosing the DataNode hardware
JMX metrics
- about / JMX Metrics
JobQueueTaskScheduler
- about / JobQueueTaskScheduler
job scheduler
- configuring / Configuring the job scheduler
JobTracker
- hardware / The JobTracker hardware
- package, installing / JobTracker configuration
JobTracker checks
- host-level checks / JobTracker checks
- service-level checks / JobTracker checks
JobTracker configuration
- about / JobTracker configuration
- job scheduler, configuring / Configuring the job scheduler
- FairScheduler / FairScheduler
- CapacityTaskScheduler / CapacityTaskScheduler
JournalNode
- about / Setting up NameNode
- installing, on server / JournalNode, ZooKeeper, and Failover Controller
JournalNode checks
- host-level resources / JournalNode checks

K

Kerberos
- about / Hadoop security overview, Kerberos overview
- and Hadoop / Hadoop and Kerberos
- principal, example / Kerberos overview
- in Hadoop / Kerberos in Hadoop
- clients, configuring / Configuring Kerberos clients
- principals, generating / Generating Kerberos principals
- enabling, for HDFS / Enabling Kerberos for HDFS
- enabling, for MapReduce / Enabling Kerberos for MapReduce
Key Distribution Center (KDC) / Kerberos overview

L

Linux Alternatives
- URL / Hadoop configuration files
log4j.properties configuration file / Hadoop configuration files
low storage density cluster
- about / Low storage density cluster

M

-m 0 option / Choosing and setting up the filesystem
manual failover option / Setting up NameNode
MapR
- about / MapR
mapred-site.xml file / Hadoop configuration files, mapred-site.xml
mapred.child.java.opts variable / TaskTracker configuration
mapred.job.tracker.handler.count variable / mapred-site.xml
mapred.local.dir directory / JobTracker configuration
MapReduce
- about / JobTracker configuration
- security / MapReduce security
- Kerberos, enabling for / Enabling Kerberos for MapReduce
- monitoring / Monitoring MapReduce
MasterPublicDnsName field / Launching the EMR cluster
Metastore, Hive
- installing / Installing Hive Metastore
- starting / Installing Hive Metastore
Metastore service / Hive architecture
Metrics2 / Hadoop Metrics
MissingBlocks status variable / NameNode checks
mntr command / ZooKeeper checks
MySQL
- table, importing to Hadoop / Sqoop import example
MySQL JDBC driver
- URL, for downloading / Installing and configuring Sqoop

N

--num-mappers option / Sqoop import example
Nagios
- Hadoop, monitoring with / Monitoring Hadoop with Nagios
- documentation, URL / Monitoring Hadoop with Nagios
NameNode
- hardware / The NameNode hardware
- setting up / Setting up NameNode
- manual failover option / Setting up NameNode
- automatic failover option / Setting up NameNode
- HA configuration / NameNode HA configuration
- about / Kerberos in Hadoop
NameNode checks
- about / NameNode checks
NumDeadDataNodes status variable / NameNode checks

O

-O extent,sparse_super,flex_bg option / Choosing and setting up the filesystem
Oracle JDK
- URL, for downloading / Setting up Java Development Kit
OS
- selecting, for Hadoop cluster / Choosing OS for the Hadoop cluster
OS configuration
- for Hadoop cluster / Configuring OS for Hadoop cluster
- filesystem, setting up / Choosing and setting up the filesystem
- filesystem, selecting / Choosing and setting up the filesystem
- Java Development Kit, setting up / Setting up Java Development Kit
- other settings / Other OS settings
- CDH repositories, setting up / Setting up the CDH repositories

P

principal, components
- primary component / Kerberos overview
- secondary component / Kerberos overview
- realm component / Kerberos overview
principals, Kerberos
- generating / Generating Kerberos principals

Q

queue administrator
- about / MapReduce security
quorum / Monitoring strategy overview
Quorum Journal Manager / Setting up NameNode

R

RAID / Choosing the DataNode hardware
repository
- adding / Setting up the CDH repositories

S

S3 documentation
- URL / Preparing input and output locations
SELECT statement / Sqoop export example
server, Hive
- installing / Installing Hive Server
server, Impala
- installing / Installing the Impala server
service level authorization
- about / Hadoop Service Level Authorization
shell. sshfence / NameNode HA configuration
sink / Hadoop Metrics
slaves file / Hadoop configuration files
source / Hadoop Metrics
sparse_super option / Choosing and setting up the filesystem
split-brain / NameNode HA configuration
Sqoop
- about / Sqoop
- installing / Installing and configuring Sqoop
- configuring / Installing and configuring Sqoop
- installing, CDH repository used / Installing and configuring Sqoop
- import, example / Sqoop import example
- export, example / Sqoop export example
sshfence / NameNode HA configuration
State field / Launching the EMR cluster
state store, Impala
- about / Impala architecture
- installing / Installing Impala state store

T

TaskTracker
- configuring / TaskTracker configuration
Ticket-Granting Service (TGS) / Kerberos overview
Ticket-Granting Ticket (TGT) / Kerberos overview

V

--version option / Installing the EMR command-line interface

W

--warehouse-dir option / Sqoop import example
Whirr
- about / Using Whirr
- installing / Installing and configuring Whirr
- configuration files / Installing and configuring Whirr

Y

yum command / Setting up NameNode
yum package
- setting up / Setting up the CDH repositories

Z

ZooKeeper
- about / Setting up NameNode, JournalNode, ZooKeeper, and Failover Controller
- service, starting / JournalNode, ZooKeeper, and Failover Controller
ZooKeeper checks
- about / ZooKeeper checks
zookeeper package / Setting up NameNode
