Single Node Install

Here is a step-by-step guide for installing Hadoop on a single-node setup on a Linux-based operating system (e.g., Ubuntu or CentOS), as Hadoop runs best on Linux.

Pre-requisites

  1. Java should be installed on the machine (Java 8 is the safest choice for Hadoop 3.x; Hadoop 3.3 and later also support Java 11 at runtime).

  2. SSH should be configured for passwordless login to localhost (required even on a single node); a minimal setup is sketched after this list.

  3. Linux distribution (Ubuntu, CentOS, etc.)
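
A minimal way to set up passwordless SSH to localhost, assuming OpenSSH is installed and you run Hadoop as your current login user:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost   # should log in without prompting for a password

If ssh localhost still asks for a password, check that the sshd service is running and that ~/.ssh has 700 permissions.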

Step 1: Install Java

Hadoop requires Java to be installed. To install Java, follow these steps:

  1. Check Java version:

    java -version

    If Java is already installed, you’ll see the version. If not, proceed with installation.

  2. Install Java (OpenJDK 8):

    • On Ubuntu:

      sudo apt update
      sudo apt install openjdk-8-jdk
    • On CentOS:

      sudo yum install java-1.8.0-openjdk-devel
  3. Set the JAVA_HOME environment variable: Find the Java path using:

    update-alternatives --config java

    Copy the installation directory from the path shown (drop the trailing /jre/bin/java or /bin/java part, e.g. /usr/lib/jvm/java-8-openjdk-amd64), and add the following to your ~/.bashrc or ~/.profile file:

    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
    export PATH=$PATH:$JAVA_HOME/bin
  4. Reload the bash profile:

    source ~/.bashrc
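
As a quick sanity check after reloading the profile (assuming the exports above went into the file you sourced), both the variable and the Java binary should now resolve:

echo $JAVA_HOME
java -version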

Step 2: Download Hadoop

  1. Go to the official Apache Hadoop releases page and download a stable version. This example uses Hadoop 3.3.0.

    Alternatively, use the wget command to download it directly (releases that have dropped off downloads.apache.org are kept at archive.apache.org):

    wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
  2. Extract the Hadoop archive:

    tar -xzvf hadoop-3.3.0.tar.gz
  3. Move the extracted folder to /usr/local/hadoop:

    sudo mv hadoop-3.3.0 /usr/local/hadoop
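
Because the archive was extracted by your user but moved with sudo, it is worth confirming that your user (not root) still owns the installation; the daemons in this guide run as a regular user. A hedged fix if anything under /usr/local/hadoop ended up owned by root:

sudo chown -R $USER:$USER /usr/local/hadoop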

Step 3: Set Hadoop Environment Variables

You need to set environment variables to tell the system where Hadoop is located.

  1. Open ~/.bashrc (or ~/.profile depending on your system):

    nano ~/.bashrc
  2. Add the following lines at the end of the file:

    # Set Hadoop-related environment variables
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_INSTALL=$HADOOP_HOME
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
  3. Apply the changes:

    source ~/.bashrc
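
With the new variables loaded, the hadoop command should resolve from any directory; a quick check:

hadoop version

If the command is not found, open a new terminal or re-run source ~/.bashrc.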

Step 4: Configure Hadoop

  1. Navigate to the Hadoop configuration directory:

    cd /usr/local/hadoop/etc/hadoop
  2. Edit hadoop-env.sh: Open hadoop-env.sh for editing:

    nano hadoop-env.sh

    Add (or uncomment and edit) the following line so Hadoop can find your Java installation:

    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
  3. Edit core-site.xml: Open core-site.xml for editing:

    nano core-site.xml

    Add the following configuration:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>
  4. Edit hdfs-site.xml: Open hdfs-site.xml for editing:

    nano hdfs-site.xml

    Add the following configuration:

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///usr/local/hadoop/hadoop_data/hdfs/namenode</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///usr/local/hadoop/hadoop_data/hdfs/datanode</value>
      </property>
    </configuration>
  5. Edit mapred-site.xml: In Hadoop 3.x this file already exists in the configuration directory; only older 2.x releases ship just a template, which must be copied first (cp mapred-site.xml.template mapred-site.xml). Open it for editing:

    nano mapred-site.xml

    Add the following configuration:

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>
  6. Edit yarn-site.xml: Open yarn-site.xml for editing:

    nano yarn-site.xml

    Add the following configuration:

    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:8025</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8032</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:8030</value>
      </property>
    </configuration>
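
Note: when you later run MapReduce jobs on YARN with Hadoop 3.x, you may also need the MapReduce classpath and an environment whitelist configured, as shown in the official single-node setup guide. The snippet below is a sketch of those two extra properties (the first goes inside the <configuration> block of mapred-site.xml, the second inside yarn-site.xml); treat it as optional until a job fails with class-not-found errors.

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>

<!-- yarn-site.xml -->
<property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_MAPRED_HOME,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ</value>
</property>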

Step 5: Format HDFS

  1. Create the directories where Hadoop will store its data, and make sure your user owns them (they are created as root, but the Hadoop daemons run as your user):

    sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
    sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
    sudo chown -R $USER:$USER /usr/local/hadoop/hadoop_data
  2. Format the Hadoop filesystem (do this only once on a fresh install; reformatting wipes the NameNode metadata):

    hdfs namenode -format
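
If the format succeeded, the NameNode directory configured in hdfs-site.xml should now contain a current/ subdirectory with a VERSION file:

ls /usr/local/hadoop/hadoop_data/hdfs/namenode/current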

Step 6: Start Hadoop

  1. Start the Hadoop Distributed File System (HDFS):

    start-dfs.sh
  2. Start YARN:

    start-yarn.sh
  3. Verify the installation: Run jps and confirm that NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager are all running. You can also check the web UIs: the HDFS NameNode at http://localhost:9870 and the YARN ResourceManager at http://localhost:8088 (the default ports in Hadoop 3.x).
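
Once the daemons are up, a small smoke test confirms that HDFS accepts writes (the /user/$USER directory used here is just an illustrative choice):

hdfs dfs -mkdir -p /user/$USER
hdfs dfs -put /usr/local/hadoop/etc/hadoop/core-site.xml /user/$USER/
hdfs dfs -ls /user/$USER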

Step 7: Stop Hadoop

To stop the Hadoop services, run:

stop-dfs.sh
stop-yarn.sh

Troubleshooting

  • Java version issues: Ensure that you're using Java 8 (or Java 11 with Hadoop 3.3+); newer Java versions are not supported by Hadoop 3.x and commonly cause startup failures.

  • Permissions: Ensure that the user running Hadoop owns the installation and data directories so the daemons can read and write to them.

  • Firewall: If running Hadoop on a cluster or multiple nodes, ensure the necessary ports are open.

This setup assumes a single-node Hadoop cluster. For a multi-node cluster, you would additionally need to list the worker hosts in the workers file (called slaves in Hadoop 2.x), point the workers at the master's NameNode and ResourceManager, and configure the other services accordingly.
