Apache Hadoop: Single-node or Pseudo-distributed cluster on macOS

This article walks through setting up and configuring a single-node (pseudo-distributed) Hadoop cluster on macOS. A single-node cluster is useful for development because it lets you run quick tests without provisioning an actual multi-node cluster.
At the end of this tutorial, you’ll have a single-node Hadoop cluster running all the essential Hadoop daemons: NameNode, DataNode, NodeManager, ResourceManager, and SecondaryNameNode.
Prerequisites
The two prerequisites for setting up a single-node Hadoop cluster are Java and SSH.
Java
Java must be installed and the $JAVA_HOME environment variable must be set.
Install Java
Install Java from the official website – https://java.com/en/download/. Note that Hadoop 3.x requires Java 8.
Verify Java is installed
Check the Java version in the terminal:
$ java -version
If Java is installed, the version will be printed, similar to the output below.
$ java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
Set $JAVA_HOME
Append the $JAVA_HOME environment variable to the ~/.bash_profile file
$ echo 'export JAVA_HOME="$(/usr/libexec/java_home)"' >> ~/.bash_profile
Source the .bash_profile file
$ source ~/.bash_profile
Verify that $JAVA_HOME is set up properly
$ echo $JAVA_HOME
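The exact path depends on the JDK you have installed; on macOS it typically looks something like the following (the version directory here is just an example):
$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home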
SSH
SSH (Remote Login) is disabled by default on macOS. SSH must be enabled and SSH keys set up, because the Hadoop control scripts use SSH (to localhost, on a single-node cluster) to start and stop the daemons.
Enable SSH
Open System Preferences and go to Sharing
Select the Remote Login checkbox to enable SSH

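Alternatively, Remote Login can be toggled from the terminal with the built-in systemsetup utility (requires administrator privileges; shown here as an equivalent to the System Preferences steps):
$ sudo systemsetup -setremotelogin on
$ sudo systemsetup -getremotelogin
Remote Login: On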
Set up SSH key
Generate SSH Key
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Add the newly created public key to authorized SSH keys
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
Verify SSH
Verify you can SSH to localhost without entering a passphrase:
$ ssh localhost
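On the very first connection you may get a host-key confirmation prompt similar to the one below (fingerprint elided); accept it once, and subsequent logins should proceed without any prompt:
The authenticity of host 'localhost (::1)' can't be established.
...
Are you sure you want to continue connecting (yes/no)? yes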
Install and configure Hadoop
Download Hadoop Distribution
Download the latest Hadoop distribution from the official website – https://hadoop.apache.org/releases.html
hadoop-3.1.2 was the latest distribution at the time of writing.
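If you prefer the command line, the tarball can also be fetched with curl; the URL below points at Apache’s long-term archive and is one example source (verify the checksum from the release page if you use it):
$ curl -o ~/Downloads/hadoop-3.1.2.tar.gz https://archive.apache.org/dist/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz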
Unpack and move
Unpack the tar file. Update the location in the command if the tar file is in a different directory.
$ tar xzvf ~/Downloads/hadoop-3.1.2.tar.gz
Move the Hadoop distribution directory to a preferred location. We are using the ~/bin/ directory (i.e. /Users/ashwin/bin/ in our case) to store the Hadoop distribution. You can use any directory of your preference.
$ mkdir -p ~/bin
$ mv -f ~/Downloads/hadoop-3.1.2 ~/bin/
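As a quick sanity check, list the distribution directory; a Hadoop binary release normally contains at least the bin, etc, sbin, and share directories:
$ ls ~/bin/hadoop-3.1.2
LICENSE.txt NOTICE.txt README.txt bin etc include lib libexec sbin share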
Set variables
hadoop-env.sh
Edit the ~/bin/hadoop-3.1.2/etc/hadoop/hadoop-env.sh file to define the following parameters. Set HADOOP_HOME to the Hadoop distribution location on your machine.
export JAVA_HOME="$(/usr/libexec/java_home)"
export HADOOP_HOME=/Users/ashwin/bin/hadoop-3.1.2
.bash_profile
Add the following properties to the ~/.bash_profile file. Note that HADOOP_CONF_DIR must point at the etc/hadoop directory inside the distribution.
export HADOOP_VERSION=3.1.2
export HADOOP_HOME=$HOME/bin/hadoop-$HADOOP_VERSION
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Source the .bash_profile file.
$ source ~/.bash_profile
Verify the variables are set
Verify $HADOOP_HOME is set.
$ echo $HADOOP_HOME
The output should look similar to the following.
$ echo $HADOOP_HOME
/Users/ashwin/bin/hadoop-3.1.2
Verify Hadoop executable binaries are added to $PATH.
$ hadoop version
The output should look similar to the following.
$ hadoop version
Hadoop 3.1.2
Source code repository https://github.com/apache/hadoop.git -r 1019dde65bcf12e05ef48ac71e84550d589e5d9a
Compiled by sunilg on 2019-01-29T01:39Z
Compiled with protoc 2.5.0
From source with checksum 64b8bdd4ca6e77cce75a93eb09ab2a9
Configure site.xml files
Modify the following site.xml files with the properties shown below.
mapred-site.xml
$HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
</configuration>
yarn-site.xml
$HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>98.5</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>
hdfs-site.xml
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
core-site.xml
$HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
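As a quick check that Hadoop picks up these files, query a configured key with hdfs getconf; fs.defaultFS should echo back the value set above:
$ hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000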
Start up the Hadoop cluster
Start the cluster
Format the HDFS filesystem.
$ hdfs namenode -format
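Since this tutorial does not override hadoop.tmp.dir, the NameNode metadata is written under its stock default of /tmp/hadoop-<username>; you can confirm the directory was created after formatting:
$ ls /tmp/hadoop-$(whoami)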
Start the Hadoop daemons.
$ start-all.sh
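Note that start-all.sh is deprecated in Hadoop 3 and simply delegates to the per-subsystem scripts; you can equivalently start HDFS and YARN separately, which makes troubleshooting easier:
$ start-dfs.sh
$ start-yarn.sh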
Verify the cluster is up
Verify NameNode, DataNode, NodeManager, ResourceManager, and SecondaryNameNode are running.
$ jps
The output should look similar to the following.
33703 SecondaryNameNode
34376 ResourceManager
34537 Jps
34473 NodeManager
33466 NameNode
33567 DataNode
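As a final smoke test, run one of the example MapReduce jobs bundled with the distribution; the jar path below assumes the default layout of the 3.1.2 tarball:
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar pi 2 5
The job should finish by printing an estimated value of Pi, confirming that HDFS and YARN are working end to end.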
Information about the cluster
Browse the following web pages to find information about the Hadoop cluster.
Hadoop Health
Browse the NameNode web UI, which reports overall HDFS health, at http://localhost:9870.

Yarn Resource Manager
Browse the YARN ResourceManager UI at http://localhost:8088/cluster.

Voila! You have a single-node Hadoop cluster up and running on your Mac.