Run and monitor HDFS
This section walks through starting HDFS on all the nodes and verifying that everything is working properly.
Start and Stop HDFS
Start HDFS by running the following script from node-master:
start-dfs.sh
This starts NameNode and SecondaryNameNode on node-master, and a DataNode on node1 and node2, according to the configuration in the slaves config file.
Check that each process is running with the jps command on each node.
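On node-master, for example, jps should list something like the following (the PIDs here are illustrative and will differ on your machine):

21603 NameNode
21787 SecondaryNameNode
21922 Jps

On node1 and node2, jps should list a DataNode process instead.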
To stop HDFS on the master and slave nodes, run the following command from node-master:
stop-dfs.sh
Monitor HDFS Cluster
Get useful information about the HDFS cluster with the dfsadmin command:
hdfs dfsadmin -report
This will print out useful information such as resource capacity and usage for all running DataNodes. To get help for all available commands, use:
hdfs dfsadmin -help
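As a quick sanity check, you can filter the report for its live-DataNode count (a minimal sketch using grep; the "Live datanodes" label matches the Hadoop 2.x report format):

hdfs dfsadmin -report | grep -i "live datanodes"

With both workers up, this should report two live DataNodes.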
You can also use the web user interface. Point your browser to http://node-master-IP:50070 (where node-master-IP is the IP address of your node-master) and you’ll get a user-friendly monitoring console.
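To verify from the command line that the web interface is up, you can request the page with curl (a quick check; replace node-master-IP with your NameNode’s actual address):

curl -s -o /dev/null -w "%{http_code}\n" http://node-master-IP:50070/

A response code of 200 means the NameNode web UI is serving.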
Put and Get Data in HDFS
Writing and reading to HDFS is done with the hdfs dfs command. First, manually create your home directory. All other commands will use a path relative to this default home directory:

hdfs dfs -mkdir -p /user/hadoop
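You can confirm the directory exists by listing /user:

hdfs dfs -ls /user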
Create a test directory in HDFS. The following command will create it in the home directory, as /user/hadoop/test:
hdfs dfs -mkdir test
Copy three local files into the test directory with the -put command:

hdfs dfs -put a123.txt b123.txt c123.txt test
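If you don’t have three files at hand, you can first create small samples locally (the names and contents below are just placeholders to match the commands in this section):

echo "sample content a" > a123.txt
echo "sample content b" > b123.txt
echo "sample content c" > c123.txt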
List the contents of the test directory:
hdfs dfs -ls test
Move one of the files back to the local filesystem:
hdfs dfs -get test/a123.txt
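By default, -get copies the file into your current local working directory under the same name; you can also pass an explicit local destination (the path below is just an example):

hdfs dfs -get test/a123.txt /tmp/a123.txt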
You can also print a file directly from HDFS:
hdfs dfs -cat test/a123.txt
There are many more commands for managing HDFS. For a complete list, you can look at the Apache HDFS shell documentation, or print help with:
hdfs dfs -help
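For instance, once you’re done experimenting, you can check overall HDFS capacity and remove the test directory (both are standard dfs subcommands):

hdfs dfs -df -h
hdfs dfs -rm -r test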