Multi-Node Cluster Install
Setting up a multi-node Hadoop cluster involves configuring several machines to work together. In this guide we'll build a three-node cluster: one node acts as the NameNode (master) and the other two act as DataNodes (workers). We'll use a Linux-based OS (Ubuntu or CentOS) for the installation.
Prerequisites
Java 8 or higher installed on all nodes.
Passwordless SSH set up between all nodes.
All nodes should have static IP addresses.
Network configuration:
Ensure that all nodes can communicate with each other via their IP addresses.
All nodes should be able to resolve hostnames (each hostname should resolve to an IP address).
Step 1: Configure SSH on All Nodes
Generate an SSH key pair on the master node:
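A typical sequence (the defaults below create an RSA key with no passphrase):

```bash
# On the master node
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
```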
Copy the public key to all nodes (master and workers):
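For example, with ssh-copy-id; the hostnames here are illustrative and should match your /etc/hosts entries, and user is whatever account will run Hadoop:

```bash
ssh-copy-id user@master-node
ssh-copy-id user@worker-node1
ssh-copy-id user@worker-node2
```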
This will enable passwordless SSH access between the nodes.
Test SSH between the nodes:
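For example, these should print each worker's hostname without prompting for a password:

```bash
ssh worker-node1 hostname
ssh worker-node2 hostname
```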
Set up a hostname for each node in /etc/hosts (on all nodes): Open /etc/hosts and add an entry for every node, as in the example below. This allows each node to resolve the names of the others.
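A sketch with illustrative private addresses (substitute your nodes' actual static IPs and hostnames):

```
192.168.1.10   master-node
192.168.1.11   worker-node1
192.168.1.12   worker-node2
```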
Step 2: Install Java on All Nodes
Install Java 8 or higher (if not already installed) on all nodes:
On Ubuntu:
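For example (the OpenJDK package name can vary by release):

```bash
sudo apt update
sudo apt install -y openjdk-8-jdk
```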
On CentOS:
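For example (likewise, the package name may differ by version):

```bash
sudo yum install -y java-1.8.0-openjdk-devel
```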
Set JAVA_HOME on all nodes:
Add the following to ~/.bashrc or ~/.profile, then reload the profile:
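A sketch assuming OpenJDK 8 installed under /usr/lib/jvm (the exact JVM path is distribution-specific; verify it on your system):

```bash
# Append to ~/.bashrc or ~/.profile
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
```

```bash
# Apply the change to the current shell
source ~/.bashrc
```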
Step 3: Install Hadoop on All Nodes
Download Hadoop and extract it on all nodes:
On the master node:
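For example, for Hadoop 3.3.6 (substitute whichever stable release you intend to run, and verify the download URL against the Apache mirrors). Moving the extracted tree to /usr/local/hadoop matches the paths used in the configuration steps below:

```bash
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz
sudo mv hadoop-3.3.6 /usr/local/hadoop
```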
On each worker node: download and extract Hadoop in the same way as on the master node.
Set Hadoop environment variables on all nodes: Add the following to the end of ~/.bashrc, then reload ~/.bashrc:
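A sketch, assuming Hadoop lives in /usr/local/hadoop as above:

```bash
# Append to ~/.bashrc
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

```bash
source ~/.bashrc
```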
Step 4: Configure Hadoop on All Nodes
1. Configure hadoop-env.sh
Edit hadoop-env.sh on all nodes (/usr/local/hadoop/etc/hadoop/hadoop-env.sh) and add the following line to set JAVA_HOME:
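Using the same OpenJDK 8 path assumed earlier (adjust to your installation):

```bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```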
2. Configure core-site.xml
Edit core-site.xml on all nodes (/usr/local/hadoop/etc/hadoop/core-site.xml) and add the following configuration:
3. Configure hdfs-site.xml
Edit hdfs-site.xml on all nodes (/usr/local/hadoop/etc/hadoop/hdfs-site.xml) and add the following configuration:
4. Configure mapred-site.xml
Edit mapred-site.xml on all nodes (/usr/local/hadoop/etc/hadoop/mapred-site.xml) and add the following configuration:
5. Configure yarn-site.xml
Edit yarn-site.xml on all nodes (/usr/local/hadoop/etc/hadoop/yarn-site.xml) and add the following configuration:
Step 5: Set Up Directories for HDFS
On the master node: Create the directories for HDFS and give appropriate permissions:
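Assuming the hdfs-site.xml paths sketched above, with ownership given to the account that runs Hadoop:

```bash
sudo mkdir -p /usr/local/hadoop/hdfs/namenode
sudo chown -R $USER:$USER /usr/local/hadoop/hdfs
```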
On each worker node: Create the directories for HDFS:
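Likewise, assuming the DataNode path from hdfs-site.xml:

```bash
sudo mkdir -p /usr/local/hadoop/hdfs/datanode
sudo chown -R $USER:$USER /usr/local/hadoop/hdfs
```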
Step 6: Format the Hadoop Filesystem (Only on Master Node)
Run the following command on the master node to format the NameNode:
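With $HADOOP_HOME/bin on the PATH:

```bash
hdfs namenode -format
```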
Step 7: Start the Hadoop Cluster
Start HDFS: On the master node:
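```bash
start-dfs.sh
```

Note that start-dfs.sh launches DataNodes on the hosts listed in $HADOOP_HOME/etc/hadoop/workers (named slaves in Hadoop 2.x), so make sure that file names your worker nodes.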
Start YARN: On the master node:
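```bash
start-yarn.sh
```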
Check the status of your Hadoop cluster:
NameNode: http://master-node:9870
ResourceManager: http://master-node:8088
Step 8: Stop the Hadoop Cluster
To stop Hadoop, use the following commands:
Stop HDFS:
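```bash
stop-dfs.sh
```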
Stop YARN:
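```bash
stop-yarn.sh
```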
Troubleshooting Tips
SSH issues: Make sure passwordless SSH works correctly between nodes.
Permissions: Ensure that the Hadoop user has appropriate permissions for directories.
Firewall settings: Check firewall settings if the nodes cannot communicate.
Logs: Look into Hadoop log files (located in $HADOOP_HOME/logs) for detailed error messages if something goes wrong.
This multi-node setup allows Hadoop to operate efficiently across multiple machines, enabling scalable and distributed data processing.