Single Node/Pseudo Distributed Hadoop Cluster on macOS
This article walks through setting up and configuring a single node, or pseudo-distributed, Hadoop cluster on macOS. A single node cluster is very useful for development, as it removes the need for an actual cluster when running quick tests.
At the end of this tutorial, you’ll have a single node Hadoop cluster with all the essential Hadoop daemons such as NameNode, DataNode, NodeManager, ResourceManager and SecondaryNameNode.
Prerequisites
The two prerequisites for setting up a single node Hadoop cluster are Java and SSH.
Java
Java must be installed and the $JAVA_HOME environment variable must be set.
Install Java
Install Java from the official website – https://java.com/en/download/
Verify Java is installed
Check the Java version in a terminal:
$ java -version
If Java is installed, the version will be printed, similar to the output below.
$ java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
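If you have more than one JDK installed, macOS ships a small helper that lists them and prints the home directory of the default one. A quick way to see what is available:

$ /usr/libexec/java_home -V   # list all installed JDKs
$ /usr/libexec/java_home      # print the default JDK home directory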
Set $JAVA_HOME
Add the $JAVA_HOME environment variable to the ~/.bash_profile file:
$ echo 'export JAVA_HOME="$(/usr/libexec/java_home)"' >> ~/.bash_profile
Source the .bash_profile file
$ source ~/.bash_profile
Verify that $JAVA_HOME is set up properly
$ echo $JAVA_HOME
SSH
SSH (Remote Login) is disabled by default on macOS. SSH must be enabled, and SSH keys set up, so that the Hadoop scripts can manage the Hadoop daemons.
Enable SSH
Open System Preferences and go to Sharing
Select the Remote Login checkbox to enable SSH
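Alternatively, Remote Login can be toggled from the terminal with the built-in systemsetup utility (requires administrator privileges):

$ sudo systemsetup -setremotelogin on   # enable SSH
$ sudo systemsetup -getremotelogin      # verify: prints "Remote Login: On"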

Set Up SSH Key
Generate SSH Key
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Add the newly created public key to the list of authorized SSH keys:
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
Verify SSH
Verify you can SSH to localhost without being prompted for a password or passphrase:
$ ssh localhost
Install and configure Hadoop
Download Hadoop Distribution
Download the latest Hadoop distribution from the official website – https://hadoop.apache.org/releases.html
hadoop-3.1.2 was the latest distribution at the time of writing.
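If you prefer the command line, releases such as 3.1.2 can also be fetched from the Apache release archive (URL correct at the time of writing; adjust the version as needed):

$ curl -O https://archive.apache.org/dist/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz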
Unpack and move
Unpack the tar file. Update the location in the command if the tar file is in a different directory.
$ tar xzvf ~/Downloads/hadoop-3.1.2.tar.gz
Move the Hadoop distribution directory to a preferred location. We are using the /Users/ashwin/bin/ directory to store the Hadoop distribution. You can use any directory of your preference.
$ mkdir -p /Users/ashwin/bin/
$ mv -f ~/Downloads/hadoop-3.1.2 /Users/ashwin/bin/
Set variables
hadoop-env.sh
Edit the ~/bin/hadoop-3.1.2/etc/hadoop/hadoop-env.sh file to define the following parameters. Set HADOOP_HOME to the Hadoop distribution location on your machine.
export JAVA_HOME="$(/usr/libexec/java_home)"
export HADOOP_HOME=/Users/ashwin/bin/hadoop-3.1.2
.bash_profile
Add the following properties to the ~/.bash_profile file.
export HADOOP_VERSION=3.1.2
export HADOOP_HOME=$HOME/bin/hadoop-$HADOOP_VERSION
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export YARN_HOME=$HADOOP_HOME
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Source the .bash_profile file.
$ source ~/.bash_profile
Verify the variables are set
Verify $HADOOP_HOME is set.
$ echo $HADOOP_HOME
The output should look similar to the following.
$ echo $HADOOP_HOME
/Users/ashwin/bin/hadoop-3.1.2
Verify Hadoop executable binaries are added to $PATH.
$ hadoop version
The output should look similar to the following.
$ hadoop version
Hadoop 3.1.2
Source code repository https://github.com/apache/hadoop.git -r 1019dde65bcf12e05ef48ac71e84550d589e5d9a
Compiled by sunilg on 2019-01-29T01:39Z
Compiled with protoc 2.5.0
From source with checksum 64b8bdd4ca6e77cce75a93eb09ab2a9
Configure site.xml files
Modify the following site.xml files with the properties shown below.
mapred-site.xml
$HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
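The HADOOP_MAPRED_HOME entries tell YARN-launched containers where to find the MapReduce framework; without them, Hadoop 3 jobs commonly fail with an error like "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster". The mapred.job.tracker property is a legacy Hadoop 1 setting that is ignored when the framework is yarn; it is harmless here and can be omitted.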
yarn-site.xml
$HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>98.5</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME,HDFS_HOME</value>
  </property>
</configuration>
hdfs-site.xml
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
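dfs.replication is set to 1 because this cluster has a single DataNode; with the default replication factor of 3, HDFS would flag every block as under-replicated.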
core-site.xml
$HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
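fs.defaultFS sets the default filesystem URI, so HDFS paths resolve against hdfs://localhost:9000. Once the cluster is running (next section), both of the following commands list the same directory:

$ hdfs dfs -ls hdfs://localhost:9000/   # explicit URI
$ hdfs dfs -ls /                        # resolved via fs.defaultFS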
Start up the Hadoop cluster
Start the cluster
Format the HDFS filesystem. Formatting initializes the NameNode's metadata directory; run it only once, when first setting up the cluster, as reformatting erases all existing HDFS data.
$ hdfs namenode -format
Start the Hadoop daemons.
$ start-all.sh
Verify the cluster is up
Verify NameNode, DataNode, NodeManager, ResourceManager and SecondaryNameNode are running.
$ jps
The output should look similar to the following.
33703 SecondaryNameNode
34376 ResourceManager
34537 Jps
34473 NodeManager
33466 NameNode
33567 DataNode
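As a further smoke test, you can exercise both HDFS and YARN by creating a home directory and running one of the example jobs bundled with the distribution (the pi arguments below, the number of maps and samples per map, are arbitrary):

$ hdfs dfs -mkdir -p /user/$(whoami)
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar pi 2 5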
Information about the cluster
Browse the following web pages to find information about the Hadoop cluster.
Hadoop Health
Browse Hadoop Health web page at http://localhost:9870.

YARN Resource Manager
Browse the YARN Resource Manager UI at http://localhost:8088/cluster.

Voila! You have a single node Hadoop cluster up and running on your Mac.
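When you are done, the daemons can be shut down with the matching stop scripts:

$ stop-yarn.sh
$ stop-dfs.sh

(or stop-all.sh, which, like start-all.sh, is deprecated but still works).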