Managing HDFS
As we saw when killing and restarting nodes in Chapter 6, When Things Break, Hadoop automatically manages many of the availability concerns that would consume a lot of effort on a more traditional filesystem. There are some things, however, that we still need to be aware of.
Where to write data
Just as the NameNode can have multiple locations for storage of fsimage specified via the dfs.name.dir property, there is a similar-looking property, dfs.data.dir, which we touched on earlier and which allows HDFS to use multiple data locations on a host. We will look at it now.
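As a rough illustration, a dfs.data.dir entry that lists two storage locations might look like the fragment below. This is only a sketch: it assumes the property is set in hdfs-site.xml as a comma-separated list, and the two mount points (/disk1 and /disk2) are hypothetical paths chosen for the example, not values taken from any cluster described in this chapter.

```xml
<!-- hdfs-site.xml (fragment) -->
<!-- Assumption: /disk1 and /disk2 are hypothetical mount points for two separate physical disks. -->
<property>
  <name>dfs.data.dir</name>
  <!-- Comma-separated list; each entry is a separate location for block storage on this host. -->
  <value>/disk1/hadoop/data,/disk2/hadoop/data</value>
</property>
```

Each directory listed would need to exist and be writable by the user running the DataNode, and the DataNode would typically need a restart before a changed value takes effect.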
This is a useful mechanism that works very differently from the NameNode property. If multiple directories are specified in dfs.data.dir, Hadoop will view these as a series of independent locations that it can use in parallel. This is useful if you have multiple physical disks or other storage devices mounted at distinct points on the filesystem. Hadoop will use these multiple devices intelligently, maximizing...