Run and monitor HDFS

This section will walk through starting HDFS on all the nodes and monitoring that everything works properly.

Start and Stop HDFS

  • Start HDFS by running the following script from node-master:

start-dfs.sh

It’ll start NameNode and SecondaryNameNode on node-master, and a DataNode on each of node1 and node2, according to the slaves configuration file.

  • Check the running processes with the jps command on each node.
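
On node-master, the output should look something like the following; the process IDs are illustrative and will differ on your machines:

21422 NameNode
21512 SecondaryNameNode
21729 Jps

On node1 and node2, jps should list a DataNode process instead.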

  • To stop HDFS on the master and slave nodes, run the following command from node-master:

stop-dfs.sh
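
Rather than logging in to each node, you can check all of them from node-master in one pass, assuming the same passwordless SSH access that start-dfs.sh itself relies on:

for node in node-master node1 node2; do
    echo "--- $node ---"
    ssh "$node" jps
done

After running stop-dfs.sh, only the Jps process itself should remain in each listing.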

Monitor HDFS Cluster

  • Get useful information about the HDFS cluster with the dfsadmin command:

hdfs dfsadmin -report

This will print information such as the capacity and usage of all running DataNodes. To get help for all available commands, use:

hdfs dfsadmin -help
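
The full report is fairly long. As a quick check, you can filter it for the DataNode summary lines; note that the exact wording of the report varies between Hadoop versions:

hdfs dfsadmin -report | grep -i 'datanodes'
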
  • You can also use the friendlier web user interface. Point your browser to http://node-master-IP:50070, where node-master-IP is the IP address of node-master, and you’ll get a user-friendly monitoring console.
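
The same information is exposed as JSON through the NameNode’s JMX servlet, which is convenient for scripting. For example, assuming curl is installed, the following queries the NameNodeInfo bean:

curl -s 'http://node-master-IP:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo'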

Put and Get Data to HDFS

Writing to and reading from HDFS is done with the hdfs dfs command. First, manually create your home directory. All other commands will use a path relative to this default home directory:

hdfs dfs -mkdir -p /user/hadoop
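
You can verify that the directory was created by listing its parent:

hdfs dfs -ls /user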

  • Create a test directory in HDFS. The following command will create it in the home directory, as /user/hadoop/test:

hdfs dfs -mkdir test
  • Copy three local files into the test directory:

hdfs dfs -put a123.txt b123.txt c123.txt test
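
If three files with these names don’t already exist on the local filesystem, you can create small placeholders before running the -put command above; the contents here are arbitrary:

echo "sample text a" > a123.txt
echo "sample text b" > b123.txt
echo "sample text c" > c123.txt
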
  • List the contents of the test directory:

hdfs dfs -ls test
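
The -ls command also accepts -R to list a directory tree recursively, for example everything under your home directory:

hdfs dfs -ls -R /user/hadoop
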
  • Move one of the files back to the local filesystem:

hdfs dfs -get test/a123.txt 
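
By default, the file is copied into the current local working directory under its own name. You can also pass an explicit local destination; the path below is just an example:

hdfs dfs -get test/a123.txt /tmp/a123-copy.txt
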
  • You can also print the files directly from HDFS:

hdfs dfs -cat test/a123.txt 
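
For large files, you can pipe the output through standard shell tools rather than printing everything, e.g. only the first few lines:

hdfs dfs -cat test/a123.txt | head -n 5
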
  • There are many more commands for managing HDFS. For a complete list, see the Apache HDFS shell documentation, or print help with:

hdfs dfs -help
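
Help is also available for an individual command, for example:

hdfs dfs -help put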
