Log in to the machine/host as the root user and install the JDK:
Once Java is installed, make sure Java is in the PATH for execution. This can be done by setting JAVA_HOME and exporting it as an environment variable. The following screenshot shows the contents of the directory where Java is installed:
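As a sketch, assuming the JDK was installed under /usr/java/jdk1.8.0_45 (the exact path depends on the JDK version on your host), the variables can be set as follows:

```shell
# JDK path is an assumption; adjust to the version installed on your host.
export JAVA_HOME=/usr/java/jdk1.8.0_45
export PATH=$JAVA_HOME/bin:$PATH
```

Adding these lines to /etc/profile or the root user's .bash_profile makes the setting persistent across logins.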
Now we need to copy the tarball hadoop-2.7.3.tar.gz, which was built in the Build Hadoop section earlier in this chapter, to the home directory of the root user. For this, log in to the node where Hadoop was built and execute the following command:
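The copy can be done with scp; the target hostname below is a placeholder for the node being set up, and the tarball is assumed to be in the current directory on the build node:

```
scp hadoop-2.7.3.tar.gz root@<target-node>:/root/
```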
Create a directory named /opt/cluster to be used for Hadoop:
Then untar hadoop-2.7.3.tar.gz into the newly created directory:
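These two steps can be sketched as follows, run as root and assuming the tarball was copied to /root:

```
mkdir -p /opt/cluster
tar -xzvf /root/hadoop-2.7.3.tar.gz -C /opt/cluster
```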
Create a user named hadoop, if you haven't already, and set its password to hadoop:
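A minimal sketch of this step on a typical Linux host:

```
useradd hadoop
passwd hadoop    # enter hadoop twice when prompted
```

On RHEL-based systems, the password can also be set non-interactively with echo hadoop | passwd --stdin hadoop.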
As step 6 was performed by the root user, the directories and files under /opt/cluster will be owned by root. Change the ownership to the hadoop user:
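The ownership change must be recursive so that the entire extracted tree is covered:

```
chown -R hadoop:hadoop /opt/cluster
```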
If you list the directory structure under /opt/cluster, you will see it as follows:
The directory structure under /opt/cluster/hadoop-2.7.3 will look like the one shown in the following screenshot:
The listing shows the etc, bin, sbin, and other directories.
The etc/hadoop directory contains the configuration files for the various Hadoop daemons. Some of the key files are core-site.xml, hdfs-site.xml, hadoop-env.sh, and mapred-site.xml, among others; these will be explained in later sections:
The bin and sbin directories contain executable binaries, which are used to start and stop the Hadoop daemons and to perform other operations such as filesystem listing, copying, and deleting:
To execute the command /opt/cluster/hadoop-2.7.3/bin/hadoop, its complete path needs to be specified. This is cumbersome, and can be avoided by setting the HADOOP_HOME environment variable. Similarly, there are other variables that need to be set to point to the binaries and the configuration file locations:
The environment file is set up system-wide so that any user can use the commands. Once the hadoopenv.sh file is in place, execute the following command to export the variables defined in it:
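The paths below are assumptions illustrating what such a system-wide environment file might contain; the HADOOP_HOME value relies on the /opt/cluster/hadoop symlink created in a later step:

```shell
# Possible contents of hadoopenv.sh (paths are assumptions):
export JAVA_HOME=/usr/java/jdk1.8.0_45
export HADOOP_HOME=/opt/cluster/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```

If the file is placed under /etc/profile.d/ (one common choice), it can be loaded into the current shell with . /etc/profile.d/hadoopenv.sh.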
Switch to the hadoop user using the command su - hadoop:
Change to the /opt/cluster directory and create a symlink:
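The symlink gives the installation a version-independent path, so environment and configuration files do not need to change when Hadoop is upgraded:

```
cd /opt/cluster
ln -s hadoop-2.7.3 hadoop
```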
To verify that the preceding changes are in place, execute either the which hadoop or the which java command, or run the hadoop command directly without specifying its complete path.
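For example, any of the following should now resolve without a full path:

```
which hadoop
which java
hadoop version
```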
In addition to setting up the environment as discussed, the user has to set the JAVA_HOME variable in the hadoop-env.sh file.
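In the hadoop-env.sh file under the configuration directory, the line would look like the following (the JDK path is an assumption):

```shell
# In /opt/cluster/hadoop/etc/hadoop/hadoop-env.sh; JDK path is an assumption.
export JAVA_HOME=/usr/java/jdk1.8.0_45
```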
The next step is to set up the Namenode address, which specifies the host:port on which it will listen. This is done in the core-site.xml file:
The important property to keep in mind here is fs.defaultFS.
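A minimal core-site.xml might look like the following; the hostname is a placeholder, and 8020 is the conventional Namenode RPC port:

```xml
<configuration>
  <!-- namenode.example.com is a placeholder hostname -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
```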
The next thing to configure is the location where the Namenode will store its metadata. This can be any location, but it is recommended that you always use a dedicated disk for it. This is configured in the hdfs-site.xml file:
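A sketch of the relevant property; /data/namenode stands in for whatever dedicated disk is mounted for the purpose:

```xml
<configuration>
  <!-- /data/namenode is an assumed mount point for the dedicated disk -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/namenode</value>
  </property>
</configuration>
```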
The next step is to format the Namenode. This will create an HDFS file system:
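The format is run once, as the hadoop user, before the daemons are first started:

```
hdfs namenode -format
```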
Similarly, we have to add the property for the Datanode directory in hdfs-site.xml. Nothing needs to be changed in the core-site.xml file:
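The corresponding Datanode property in hdfs-site.xml, again with an assumed path:

```xml
<configuration>
  <!-- /data/datanode is an assumed mount point; list several
       comma-separated paths to spread blocks across multiple disks -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/datanode</value>
  </property>
</configuration>
```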
Then the Namenode and Datanode services need to be started:
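With Hadoop 2.x, the per-daemon start script under sbin can be used; run these as the hadoop user:

```
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
```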
The jps command can be used to check for running daemons: