Command-line access to the HDFS filesystem
The Hadoop distribution includes a command-line utility called hdfs, which is the primary way to interact with the filesystem from the command line. Run it without any arguments to see the available subcommands. There are many of them, and several are used for tasks such as starting or stopping various HDFS components. The general form of the hdfs command is:
hdfs <sub-command> <command> [arguments]
The two main subcommands we will use in this book are:
dfs: This is used for general filesystem access and manipulation, including reading, writing, and accessing files and directories.
dfsadmin: This is used for administration and maintenance of the filesystem. We will not cover this command in detail, but have a look at the -report command, which gives a listing of the state of the filesystem and all DataNodes:
$ hdfs dfsadmin -report
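As a rough illustration of the dfs subcommand, the following sketch creates a directory in HDFS, copies a local file into it, lists it, prints the file, and cleans up. The scratch paths (/tmp/hdfs-cli-demo and /tmp/hdfs-cli-demo.txt) and the run_hdfs helper are assumptions made for this example, not part of Hadoop. The helper echoes each command and runs it only if the hdfs utility is found on the PATH, so the script can be read and tried without a cluster.

```shell
#!/bin/sh
# Sketch only: a few common `hdfs dfs` commands.
# The paths below are hypothetical; replace them with ones valid on your cluster.

HDFS_DIR="/tmp/hdfs-cli-demo"          # assumed scratch directory in HDFS
LOCAL_FILE="/tmp/hdfs-cli-demo.txt"    # assumed local file to upload

# Hypothetical helper: print the command, then run it only if hdfs is installed.
run_hdfs() {
    echo "+ hdfs $*"
    if command -v hdfs >/dev/null 2>&1; then
        hdfs "$@"
    else
        echo "  (hdfs not found on PATH; command not executed)" >&2
    fi
}

# Create a small local file to copy into HDFS.
echo "hello hdfs" > "$LOCAL_FILE"

run_hdfs dfs -mkdir -p "$HDFS_DIR"                # create a directory
run_hdfs dfs -put -f "$LOCAL_FILE" "$HDFS_DIR"    # copy the local file into HDFS
run_hdfs dfs -ls "$HDFS_DIR"                      # list the directory
run_hdfs dfs -cat "$HDFS_DIR/hdfs-cli-demo.txt"   # print the file contents
run_hdfs dfs -rm -r "$HDFS_DIR"                   # remove the scratch directory
```

Each call follows the general form shown earlier: dfs is the subcommand, -mkdir, -put, -ls, -cat, and -rm are the commands, and the paths are the arguments.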
Note
Note that the dfs and dfsadmin commands can also be used with the main Hadoop command-line...