ARTH Task: 4.1, 4.2

How to Limit Storage Contribution as a Slave Node

Hitesh
4 min read · Sep 7, 2022

Task Description:

Individual/Team task:
🔷 In a Hadoop cluster, find how to contribute a limited/specific amount of storage as a slave node to the cluster.
✴️Hint: Linux partitions

To set up the Hadoop cluster, we first launch two EC2 instances on AWS: one as the NameNode and one as the DataNode, also known as the Master node and the Slave node.

Configuration of Namenode:-

For this, we transfer the Java JDK and the Hadoop software from Windows to the NameNode EC2 instance using WinSCP.
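For example, if the files copied over are the JDK 8 and Hadoop 1.2.1 RPMs (the exact file names below are only illustrative and depend on the versions you downloaded), they can be installed with rpm:

rpm -ivh jdk-8u171-linux-x64.rpm
rpm -ivh hadoop-1.2.1-1.x86_64.rpm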

To check whether Hadoop and Java are installed, we can run these commands.
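Both commands print the installed version if the setup succeeded:

java -version
hadoop version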

Now create a new directory for the NameNode, go inside the Hadoop configuration directory, and edit the configuration files:-

mkdir /nn
cd /etc/hadoop
vim hdfs-site.xml
vim core-site.xml
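A minimal NameNode configuration, assuming Hadoop 1.x, looks like the following sketch: /nn is the directory created above, and port 9001 matches the port we capture later with tcpdump (your port and bind address may differ).

hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>

core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>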

Format and start the Hadoop NameNode using these commands:-

==> hadoop namenode -format
==> hadoop-daemon.sh start namenode

To check the cluster report:

hadoop dfsadmin -report

We do the same kind of configuration on the DataNode as well.

Now create a new directory for the DataNode and go inside the Hadoop configuration directory:-

==> mkdir /dn
==> cd /etc/hadoop
==> ls
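On the DataNode, the same two files point dfs.data.dir at the /dn directory and fs.default.name at the NameNode. A sketch, again assuming Hadoop 1.x and port 9001 (replace the placeholder with your NameNode's IP):

hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn</value>
  </property>
</configuration>

core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<NameNode-Public-IP>:9001</value>
  </property>
</configuration>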

Start the Hadoop DataNode service and check the report using these commands:-

==> hadoop-daemon.sh start datanode
==> jps
==> hadoop dfsadmin -report

We can also do the same kind of configuration on the client side.

We can also open the Web UI of our cluster by browsing to the Master node's public IP on port 50070, i.e. http://<Master-Public-IP>:50070.

Now, to create a partition, we first create an EBS volume in AWS and attach it to our DataNode; I have selected a 20 GB EBS volume. After attaching it, we can confirm the new disk (here /dev/xvdf) with:

==> fdisk -l

For creating a partition we have three steps:-

1. Create a new partition.

2. Format the new partition.

3. Mount the partition.

fdisk /dev/xvdf
mkfs.ext4 /dev/xvdf1
mkdir /dm1
mount /dev/xvdf1 /dm1
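Inside fdisk (the first command above), the partition is created interactively. A sketch of the key presses for carving a 2 GiB partition out of the 20 GB volume (prompts vary slightly between fdisk versions):

n        (new partition)
p        (primary)
1        (partition number)
<Enter>  (accept the default first sector)
+2G      (partition size)
w        (write the partition table and exit)

Note that the DataNode only contributes the storage of the file system its data directory lives on, so before restarting the daemon make sure dfs.data.dir in hdfs-site.xml points at the mounted directory (here /dm1), or mount the partition on the existing /dn directory instead.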
==> hadoop-daemon.sh start datanode
==> jps
==> hadoop dfsadmin -report

We can see that the DataNode is now successfully contributing only 2 GB of storage to the NameNode.

Task 4.2 :- Team task:

🔷According to popular articles:-

Hadoop uses the concept of parallelism to upload the split data while fulfilling the Velocity problem.

👉🏻 Research with your teams and conclude this statement with proper proof.

✴️Hint: tcpdump

So let's jump to our practical.

While doing this task with our team, we found that Hadoop using the concept of parallelism to upload data is one of the biggest myths in the market.

In reality, Hadoop stores the data serially on its DataNodes, with only milliseconds of difference in time between them.

To verify this, we can use the tcpdump command in Linux to read the packets carrying our data.

What is tcpdump Command in linux ?

tcpdump is one of the most powerful and widely used command-line packet sniffer/analyzer tools. It captures or filters TCP/IP packets that are received or transmitted over the network on a specific interface, and it is available on most Linux/Unix-based operating systems.

tcpdump -i eth0 tcp port 9001 -n
tcpdump -i eth0 tcp port 9001 -n -X
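To have traffic worth capturing, we upload a file from the client while tcpdump is running on the nodes, then compare the packet timestamps seen on each node. A sketch (the file name is only an example, and 50010 is Hadoop 1.x's default DataNode data-transfer port, so adjust it if your cluster overrides the default):

On the client:      hadoop fs -put testfile.txt /
On the NameNode:    tcpdump -i eth0 tcp port 9001 -n
On each DataNode:   tcpdump -i eth0 tcp port 50010 -n -X

Comparing the timestamps captured on each DataNode is how we checked whether the blocks arrive in parallel or one after the other.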
