Resolving Permission Issue in Multi-node Hadoop Cluster

When configuring and deploying a multi-node Hadoop cluster, or adding new DataNodes to an existing one, you may run into an SSH permission issue that prevents the Hadoop daemons from communicating with each other.

This short article explains how to resolve the permission issue between the DataNodes and the NameNode when establishing passphrase-less Secure Shell (SSH) access. All DataNodes talk to the NameNode using the DataNode Protocol. By design, the NameNode never initiates any RPCs (Remote Procedure Calls); it only responds to RPC requests issued by DataNodes or clients.
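The usual fix is to generate a passphrase-less key pair for the Hadoop user on the NameNode, authorize it on every DataNode, and make sure the .ssh directory and authorized_keys file carry the exact permissions sshd requires. The sketch below demonstrates this on a throwaway directory rather than a live cluster; the hadoop user and datanode1 host are placeholders, and on a real cluster you would use ssh-copy-id instead of the local cat shown here.

```shell
# Demonstrate the passphrase-less key setup on a temporary directory
# (stand-in for ~/.ssh on the NameNode; no real cluster is touched).
DEMO_SSH_DIR=$(mktemp -d)/.ssh
mkdir -p "$DEMO_SSH_DIR"

# Generate an RSA key pair with an empty passphrase (-P ""), as the
# Hadoop user on the NameNode would.
ssh-keygen -q -t rsa -P "" -f "$DEMO_SSH_DIR/id_rsa"

# Authorize the public key. On a real cluster, run this per DataNode
# instead:  ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode1
cat "$DEMO_SSH_DIR/id_rsa.pub" >> "$DEMO_SSH_DIR/authorized_keys"

# sshd refuses keys unless these modes are exact; overly permissive
# modes here are the usual root cause of the permission errors above.
chmod 700 "$DEMO_SSH_DIR"
chmod 600 "$DEMO_SSH_DIR/authorized_keys"

stat -c "%a" "$DEMO_SSH_DIR" "$DEMO_SSH_DIR/authorized_keys"
```

After applying the same steps to the real ~/.ssh directories, a plain "ssh datanode1" from the NameNode should log in without prompting for a password or passphrase, which is exactly what the Hadoop start-up scripts need.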

HDFS Architecture and Functioning

First of all, thank you for the overwhelming response to my previous article (Big Data and Hadoop: An Introduction), which gave a brief overview of Hadoop and its benefits. If you have not read it yet, please spend some time with it to get a glimpse into this rapidly growing technology. In this article, we will take a deep dive into HDFS (the Hadoop Distributed File System), the file system used by Hadoop.

HDFS is the storage layer of the Hadoop system. It is a block-structured file system: each file is divided into blocks of a predetermined size, and these blocks are stored across a cluster of one or more machines. HDFS works with two types of nodes: the NameNode (master) and DataNodes (slaves). So let's dive in.
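The block math is simple enough to sketch. The 300 MB file size below is purely illustrative, and the 128 MB block size is the default (dfs.blocksize) in Hadoop 2.x and later; a file always occupies whole blocks, with the last block holding only the leftover bytes.

```shell
# Hedged sketch: how many fixed-size HDFS blocks a file occupies.
FILE_MB=300      # hypothetical file size
BLOCK_MB=128     # default HDFS block size (dfs.blocksize) in Hadoop 2.x+

# Ceiling division: every block except possibly the last is full,
# and the last block stores only the remaining data.
BLOCKS=$(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))
LAST_MB=$(( FILE_MB - (BLOCKS - 1) * BLOCK_MB ))

echo "A ${FILE_MB} MB file -> ${BLOCKS} blocks (last block holds ${LAST_MB} MB)"
```

On a running cluster you can see the actual block layout of a file with "hdfs fsck /path/to/file -files -blocks".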