Basics of HDFS and installation steps on macOS

Dhruv Saksena
Aug 15, 2021

HDFS stands for Hadoop Distributed File System. In today’s world, where enormous amounts of data are churned out every day, we need to build systems that scale with that data and can achieve higher levels of computation.

Storing data on a single file system works well for small amounts of data, but when you are dealing with terabytes or petabytes, spreading it across a distributed file system on multiple machines gives you the advantage of parallel computation.

Whenever HDFS loads a file into its file system, it partitions the file into multiple blocks, each 128 MB in size by default. For example, a 600 MB file would be split into four 128 MB blocks plus one 88 MB block. These blocks are stored on multiple DataNodes, and the NameNode keeps track of which block is located where.
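Once you have finished the installation below, you can confirm the configured block size (in bytes) from the installation directory with hdfs getconf; with the default configuration it prints 134217728, i.e. 128 MB:

$ bin/hdfs getconf -confKey dfs.blocksize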

An HDFS cluster consists of a NameNode and DataNodes. DataNodes are where the actual data is stored, whereas the NameNode stores the metadata about the whereabouts of that data.

In this article, we will set up Hadoop 3.2.2 on our local system as a single-node cluster.

First, create a directory in your local filesystem to hold the Hadoop installation:

$ mkdir hadoop

Download the Hadoop archive from the Apache Hadoop releases page (https://hadoop.apache.org/releases.html) and extract it into the hadoop folder. I’ve used hadoop-3.2.2 here.

After extracting, the folder structure looks roughly like this (a sketch of the top-level directories; exact contents vary by release):
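hadoop/
└── hadoop-3.2.2/
    ├── bin/
    ├── etc/
    │   └── hadoop/
    ├── libexec/
    ├── sbin/
    └── share/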

Before you start working on the Hadoop installation, enable Remote Login for your computer in System Preferences (under Sharing), since Hadoop’s scripts connect to localhost over SSH. You also need Java installed on your local machine.

Now, go to the etc/hadoop folder inside the installation (hadoop-3.2.2/etc/hadoop), open core-site.xml, and add the following to the configuration section:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Now, open hdfs-site.xml and set the replication factor to 1, since a single-node cluster has only one DataNode to hold replicas:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Now, open hadoop-env.sh (in the same etc/hadoop folder) and set JAVA_HOME, adjusting the path to match the JDK installed on your machine:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home
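If you are unsure of the right path, macOS ships the /usr/libexec/java_home utility, which prints the home directory of an installed JDK (here asking for Java 8):

$ /usr/libexec/java_home -v 1.8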

Now, set up passphraseless SSH so the Hadoop scripts can log in to localhost without prompting:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
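You can test the setup by SSHing into localhost; it should log you in without asking for a passphrase:

$ ssh localhost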

From the root of your Hadoop installation, enter the following command to format the NameNode:

$ bin/hdfs namenode -format

Now, start all the daemons with the command below:

$ sbin/start-all.sh
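Note that start-all.sh still works but is deprecated in Hadoop 3.x; the equivalent is to start HDFS and YARN separately:

$ sbin/start-dfs.sh
$ sbin/start-yarn.sh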

Now, execute the jps command to verify the deployment.
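If everything started correctly, the output should list all the Hadoop daemons, roughly like the following (the process IDs will differ on your machine):

$ jps
11025 NameNode
11123 DataNode
11248 SecondaryNameNode
11476 ResourceManager
11580 NodeManager
11713 Jps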

To check the health of your cluster, open http://localhost:9870/dfshealth.html#tab-overview in your browser.

To create a directory in HDFS, use the following command:

$ bin/hdfs dfs -mkdir -p /user/WordCount
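You can confirm that the directory exists by listing /user:

$ bin/hdfs dfs -ls /user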

To put a file into that directory, use the following command (note that HDFS paths are case-sensitive, so the casing must match the directory created above):

$ bin/hdfs dfs -put ./ec.txt /user/WordCount/input
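To verify the upload from the command line, list the directory; you can also run fsck on the file to see how HDFS split it into blocks (for a small file like this, it will be a single block):

$ bin/hdfs dfs -ls /user/WordCount
$ bin/hdfs fsck /user/WordCount/input -files -blocks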

You can now see the same file in the web view as well, under Utilities → Browse the file system in the NameNode UI.
