Single Node/Pseudo Distributed Hadoop Cluster on macOS

This article walks through setting up and configuring a single-node, or pseudo-distributed, Hadoop cluster on macOS. A single-node cluster is very useful for development because it removes the need for a full multi-node cluster when running quick tests.

At the end of this tutorial, you’ll have a single-node Hadoop cluster running all the essential Hadoop daemons: NameNode, DataNode, NodeManager, ResourceManager and SecondaryNameNode.

Prerequisites

The two prerequisites for setting up a single node Hadoop cluster are Java and SSH.

Java

Java must be installed and the $JAVA_HOME environment variable must be set.

Install Java

Install Java from the official website – https://java.com/en/download/

Verify Java is installed

Check the Java version in a terminal.

If Java is installed, the version will be printed, similar to the output below.

$ java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)

Set $JAVA_HOME

Add the $JAVA_HOME environment variable to the .bash_profile file.
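On macOS, the /usr/libexec/java_home helper reports the location of the default JDK, so one way to do this (assuming the default bash shell) is:

$ echo 'export JAVA_HOME=$(/usr/libexec/java_home)' >> ~/.bash_profile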

Source the .bash_profile file
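In the current terminal session:

$ source ~/.bash_profile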

Verify that $JAVA_HOME is set up properly
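The exact path depends on your JDK installation, but the output should look something like this:

$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home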

SSH

SSH (Remote Login) is disabled by default on macOS. SSH should be enabled and SSH keys should be set up so that the Hadoop scripts can manage the Hadoop daemons over SSH.

Enable SSH

Open System Preferences and go to Sharing

Select the Remote Login checkbox to enable SSH
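If you prefer the terminal, Remote Login can also be enabled with the systemsetup utility (requires administrator privileges):

$ sudo systemsetup -setremotelogin on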

Set up SSH Key

Generate SSH Key
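For a local pseudo-distributed cluster, a key pair without a passphrase is the simplest option. The command below assumes you do not already have a key at ~/.ssh/id_rsa; skip it if you want to reuse an existing key.

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa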

Add the newly created public key to authorized ssh keys
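Assuming the key was generated at ~/.ssh/id_rsa as above:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys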

Verify SSH

Verify that you can SSH to localhost without a passphrase:
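If the keys are set up correctly, this should drop you into a shell without asking for a password:

$ ssh localhost
$ exit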


Install and configure Hadoop

Download Hadoop Distribution

Download the latest Hadoop distribution from the official website – https://hadoop.apache.org/releases.html

hadoop-3.1.2 was the latest distribution at the time of writing.

Unpack and move

Unpack the tar file. Update the path in the command below if the tar file is in a different directory.
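Assuming the archive was downloaded to ~/Downloads:

$ cd ~/Downloads
$ tar -xzf hadoop-3.1.2.tar.gz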

Move the Hadoop distribution directory to a preferred location. We are using the ~/bin/ directory (/Users/ashwin/bin/) to store the Hadoop distribution; you can use any directory you prefer.
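For example, assuming the archive was unpacked in ~/Downloads as above:

$ mkdir -p ~/bin
$ mv ~/Downloads/hadoop-3.1.2 ~/bin/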

Set variables

hadoop-env.sh

Edit the ~/bin/hadoop-3.1.2/etc/hadoop/hadoop-env.sh file to define the following parameters. Set HADOOP_HOME to the Hadoop distribution location on your machine.
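A minimal example, assuming the Oracle JDK 8 location from the earlier step and the ~/bin install location used above:

# Adjust both paths to match your machine
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home
export HADOOP_HOME=/Users/ashwin/bin/hadoop-3.1.2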

.bash_profile

Add the following properties to the ~/.bash_profile file.
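For example, assuming the same install location, export HADOOP_HOME and put the bin and sbin directories on the $PATH:

export HADOOP_HOME=/Users/ashwin/bin/hadoop-3.1.2   # your install location
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin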

Source the .bash_profile file.
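As before:

$ source ~/.bash_profile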

Verify the variables are set

Verify $HADOOP_HOME is set.

The output should look similar to the following.

$ echo $HADOOP_HOME
/Users/ashwin/bin/hadoop-3.1.2

Verify Hadoop executable binaries are added to $PATH.

The output should look similar to the following.

$ hadoop version
Hadoop 3.1.2
Source code repository https://github.com/apache/hadoop.git -r 1019dde65bcf12e05ef48ac71e84550d589e5d9a
Compiled by sunilg on 2019-01-29T01:39Z
Compiled with protoc 2.5.0
From source with checksum 64b8bdd4ca6e77cce75a93eb09ab2a9

Configure site.xml files

Modify the following site.xml files with the properties shown below.

mapred-site.xml

$HADOOP_HOME/etc/hadoop/mapred-site.xml
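A minimal pseudo-distributed configuration runs MapReduce on YARN; on Hadoop 3.x the mapreduce.application.classpath property is also typically needed so MapReduce jobs can find their jars. This mirrors the standard Apache single-node example; adjust if your setup differs.

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>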

yarn-site.xml

$HADOOP_HOME/etc/hadoop/yarn-site.xml
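A minimal configuration that enables the MapReduce shuffle service, again following the standard single-node example (the env-whitelist entry lets containers inherit the relevant Hadoop environment variables):

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>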

hdfs-site.xml

$HADOOP_HOME/etc/hadoop/hdfs-site.xml
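With a single DataNode there is nothing to replicate to, so the replication factor should be 1:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>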

core-site.xml

$HADOOP_HOME/etc/hadoop/core-site.xml
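Point the default filesystem at HDFS on localhost. Port 9000 is the value used in the standard single-node setup; any free port works.

<configuration>
  <!-- single-node default; adjust host/port if needed -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>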


Start up the Hadoop cluster

Start the cluster

Format the HDFS filesystem.
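This only needs to be done once, before the first start:

$ hdfs namenode -format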

Start the Hadoop daemons.
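Start HDFS (NameNode, DataNode, SecondaryNameNode) and then YARN (ResourceManager, NodeManager) with the bundled scripts:

$ $HADOOP_HOME/sbin/start-dfs.sh
$ $HADOOP_HOME/sbin/start-yarn.sh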

Verify the cluster is up

Verify NameNode, DataNode, NodeManager, ResourceManager and SecondaryNameNode are running.

The output should look similar to the following.

$ jps
33703 SecondaryNameNode
34376 ResourceManager
34537 Jps
34473 NodeManager
33466 NameNode
33567 DataNode


Information about the cluster

Browse the following web pages to find information about the Hadoop cluster.

Hadoop Health

Browse the Hadoop health (NameNode) web page at http://localhost:9870.


Yarn Resource Manager

Browse the YARN Resource Manager UI at http://localhost:8088/cluster.


Voila! You have a single-node Hadoop cluster up and running on your Mac.

2 Comments

  • Hi, using the same procedure, hadoop -version says
    ERROR: -version is not COMMAND nor fully qualified CLASSNAME.

    • It’s supposed to be hadoop version. I have fixed the typo. Thank you.

