Run and monitor HDFS

This section will walk through starting HDFS on all the nodes and monitoring that everything works properly.

Start and Stop HDFS

  • Start HDFS by running the following script from node-master:

start-dfs.sh

It’ll start NameNode and SecondaryNameNode on node-master, and a DataNode on each of node1 and node2, according to the slaves configuration file.

  • Check the running processes with the jps command on each node.
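
On node-master, the output should look something like the following; the process IDs are illustrative and will differ on your machines:

21422 NameNode
21512 SecondaryNameNode
21729 Jps

On node1 and node2, jps should list a DataNode process instead.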

  • To stop HDFS on the master and slave nodes, run the following command from node-master:

stop-dfs.sh
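
Rather than logging in to each node, you can check all of them from node-master in one pass, assuming the same passwordless SSH access that start-dfs.sh itself relies on:

for node in node-master node1 node2; do
    echo "--- $node ---"
    ssh "$node" jps
done

After running stop-dfs.sh, only the Jps process itself should remain in each listing.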

Monitor HDFS Cluster

  • Get useful information about the HDFS cluster with the dfsadmin command:

hdfs dfsadmin -report

This will print information such as the capacity and usage of all running DataNodes. To get help for all available commands, use:

hdfs dfsadmin -help
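
The full report is fairly long. As a quick check, you can filter it for the DataNode summary lines; note that the exact wording of the report varies between Hadoop versions:

hdfs dfsadmin -report | grep -i 'datanodes'
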
  • You can also use the friendlier web user interface. Point your browser to http://node-master-IP:50070, where node-master-IP is the IP address of node-master, and you’ll get a user-friendly monitoring console.
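
The same information is exposed as JSON through the NameNode’s JMX servlet, which is convenient for scripting. For example, assuming curl is installed, the following queries the NameNodeInfo bean:

curl -s 'http://node-master-IP:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo'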

Put and Get Data to HDFS

Writing to and reading from HDFS is done with the hdfs dfs command. First, manually create your home directory. All other commands will use a path relative to this default home directory:

hdfs dfs -mkdir -p /user/hadoop
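
You can verify that the directory was created by listing its parent:

hdfs dfs -ls /user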

  • Create a test directory in HDFS. The following command will create it in the home directory, as /user/hadoop/test:

hdfs dfs -mkdir test
  • Copy three local files into the test directory:

hdfs dfs -put a123.txt b123.txt c123.txt test
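
If three files with these names don’t already exist on the local filesystem, you can create small placeholders before running the -put command above; the contents here are arbitrary:

echo "sample text a" > a123.txt
echo "sample text b" > b123.txt
echo "sample text c" > c123.txt
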
  • List the contents of the test directory:

hdfs dfs -ls test
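
The -ls command also accepts -R to list a directory tree recursively, for example everything under your home directory:

hdfs dfs -ls -R /user/hadoop
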
  • Move one of the files back to the local filesystem:

hdfs dfs -get test/a123.txt 
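
By default, the file is copied into the current local working directory under its own name. You can also pass an explicit local destination; the path below is just an example:

hdfs dfs -get test/a123.txt /tmp/a123-copy.txt
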
  • You can also print the files directly from HDFS:

hdfs dfs -cat test/a123.txt 
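
For large files, you can pipe the output through standard shell tools rather than printing everything, e.g. only the first few lines:

hdfs dfs -cat test/a123.txt | head -n 5
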
  • There are many more commands for managing HDFS. For a complete list, see the Apache HDFS shell documentation, or print help with:

hdfs dfs -help
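
Help is also available for an individual command, for example:

hdfs dfs -help put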
