Choosing an OS for the Hadoop cluster
Choosing an operating system for your future Hadoop cluster is a relatively simple task. Hadoop core and its ecosystem components are all written in Java, with a few exceptions. While Java code itself is cross-platform, Hadoop currently runs only on Linux-like systems, because too many design decisions were made with Linux in mind. As a result, the code surrounding the core Hadoop components, such as the start/stop scripts and the permissions model, depends on the Linux environment.
When it comes to Linux, Hadoop is fairly indifferent to the specific distribution and runs well on Red Hat, CentOS, Debian, Ubuntu, SUSE, and Fedora; none of these impose special requirements for running Hadoop. In general, nothing prevents Hadoop from working successfully on any other POSIX-style OS, such as Solaris or BSD, as long as you make sure that all dependencies are resolved properly and all supporting shell scripts work. Still, most production installations of Hadoop run on Linux, and this is the OS we will focus on in our further discussions. Specifically, the examples in this book will focus on CentOS, since it is one of the popular choices for production systems, as is its twin, Red Hat.
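If you are unsure which distribution and release a given server is running, a quick check from the shell will tell you. The commands below are a minimal sketch for Red Hat-style systems; the release file path differs on Debian-based distributions:

# On Red Hat/CentOS, the release string lives in /etc/redhat-release:
$ cat /etc/redhat-release
# A distribution-neutral alternative, if the lsb_release tool is installed:
$ lsb_release -a
# Confirm you are running a 64-bit kernel, which you will want for Hadoop:
$ uname -m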
Apache Hadoop provides source and binary tarballs, as well as RPM and DEB packages, for stable releases; currently, this is the 1.0 branch. Building Hadoop from source, while still an option, is not recommended for most users, since it requires experience in assembling large Java-based projects and careful dependency resolution. Both the Cloudera and Hortonworks distributions provide an easy way to set up a package repository on your servers and install all the required packages from there.
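As an illustration, the following is a minimal sketch of installing Hadoop packages from a vendor yum repository on CentOS. The repository file name and the package name shown are assumptions for illustration only; the exact repository URL and package set depend on the distribution and version you choose, so consult the vendor's installation guide:

# Drop the vendor-provided repository definition into yum's config directory
# (the actual .repo file is supplied by Cloudera or Hortonworks):
$ sudo cp vendor-hadoop.repo /etc/yum.repos.d/
# Refresh metadata and confirm the new repository is visible:
$ sudo yum repolist
# Install Hadoop packages from the repository (package names vary between
# distributions; hadoop-hdfs-namenode is a CDH-style example):
$ sudo yum install hadoop-hdfs-namenode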
Tip
There is no strict requirement to run the same operating system across all Hadoop nodes, but common sense suggests that the less the node configurations deviate from each other, the easier the cluster is to administer and manage.
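One simple way to keep an eye on such drift is to poll the OS release string from every node and compare the results. The loop below is a minimal sketch; it assumes passwordless SSH access to each node and a hypothetical nodes.txt file listing the hostnames, one per line:

# Print each node's OS release so differing versions stand out at a glance:
$ for host in $(cat nodes.txt); do echo -n "$host: "; ssh "$host" 'cat /etc/redhat-release'; done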