Providing a Specific Amount of Storage as a Data Node in a Hadoop Cluster

Ashish Mangal
Oct 14, 2020 · 4 min read


Hadoop Tutorial

While working hands-on with a Hadoop cluster, I noticed that a data node initially contributes its entire storage to the name node. If we only need a small amount of storage, this wastes capacity, and storage costs money.

So it is better to monitor storage regularly and add more as we need it, rather than buying a large amount up front. While researching this problem, one solution I found is to create partitions on the hard drive according to the storage we need and contribute only the required partition to the name node. Before showing how to achieve this, let's look at the problem in practice:

Initially, my data node (slave node) has about 50 GiB of space, and when I started the services, the detailed report looked like this:
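
For reference, a report like the one in the screenshot can be generated with HDFS's admin report command; the exact binary depends on your Hadoop version, so treat this as a sketch:

hadoop dfsadmin -report    # Hadoop 1.x style command
hdfs dfsadmin -report      # equivalent on Hadoop 2.x and newer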

It shows that the data node is contributing about 50 GB to the name node, i.e. its full storage. But suppose I only have a 5 GB file that I want to upload to this cluster. Dedicating 50 GB of storage for that is not economical; a little more than 5 GB would be sufficient.

Now the question is how to solve this problem, since every gigabyte of infrastructure costs money, and at big data scale those costs become huge. To make it cost effective, we should contribute only the storage we actually need, which in this case is a little more than 5 GB.

To achieve this, we have to divide the hard disk into pieces, i.e. partition it, and then contribute storage to the name node one partition at a time.

To solve the above problem, we first need to know how to partition a disk in Red Hat Enterprise Linux 8 running inside Oracle VirtualBox (I demonstrate the steps on RHEL 8, but you can do this on any OS; the concept remains the same). For this, you can go through this YouTube video:

https://www.youtube.com/watch?v=awaNgkMYLTE&t=1197s

This video gives you a good grounding in partitioning.

First, let's see the details of the hard disk and its partitions. For this, we use the command:

fdisk -l

In my case, I have already created a 10 GB partition, /dev/sdb1.
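
For reference, that partition was created with fdisk; the interactive sequence looks roughly like this (the device name /dev/sdb is specific to my VM, so treat it as an assumption):

fdisk /dev/sdb    # open the second disk in fdisk's interactive mode
# inside fdisk, the sequence of answers is roughly:
#   n        -> create a new partition
#   p        -> make it a primary partition
#   1        -> partition number
#   <Enter>  -> accept the default first sector
#   +10G     -> make the partition 10 GiB in size
#   w        -> write the partition table and exit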

Now create a directory using the command mkdir {directory} and mount the partition on it using the command mount /dev/sdb1 {directory}.
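
One step worth calling out: a freshly created partition has no filesystem yet, so it must be formatted before it can be mounted. Putting the sequence together, it looks roughly like this (ext4 is my choice of filesystem, and /dn6 is the directory used later in this article):

mkfs.ext4 /dev/sdb1       # format the new 10 GB partition with an ext4 filesystem
mkdir /dn6                # create the directory that will act as the mount point
mount /dev/sdb1 /dn6      # mount the partition on /dn6
df -h /dn6                # verify the mount point and its size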

After mounting, everything stored in /dn6 goes onto the 10 GB partition. Now configure the Hadoop cluster so that the data node stores client files in the /dn6 directory.

(Screenshots: name node hdfs-site.xml, name node core-site.xml, data node hdfs-site.xml, data node core-site.xml, and starting the data node services.)
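
Since the screenshots are hard to reproduce here, the data node side of that configuration looks roughly like the sketch below. The name node address 192.168.1.10:9001 is a placeholder for my setup, and the property names dfs.data.dir and fs.default.name are the Hadoop 1.x names (newer versions use dfs.datanode.data.dir and fs.defaultFS); the name node's files are analogous, with dfs.name.dir pointing to its metadata directory.

Data node hdfs-site.xml (sketch):

<configuration>
  <property>
    <!-- store HDFS blocks in /dn6, which sits on the 10 GB partition -->
    <name>dfs.data.dir</name>
    <value>/dn6</value>
  </property>
</configuration>

Data node core-site.xml (sketch):

<configuration>
  <property>
    <!-- address of the name node (placeholder IP and port) -->
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.10:9001</value>
  </property>
</configuration>

The data node service can then be started with hadoop-daemon.sh start datanode (or hdfs --daemon start datanode on Hadoop 3.x).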

After starting the name node services, let's check the report and see what it shows:
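
The commands for this step were roughly the following (again assuming the Hadoop 1.x script names):

hadoop-daemon.sh start namenode    # start the name node service
hadoop dfsadmin -report            # the configured capacity should now be about 10 GB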

Look what we have achieved! The data node now contributes just 10 GB, and we keep the other 40 GB in reserve until we actually need it, which is far more cost effective.

So this is a small piece of research I did in the field of big data: providing a specific amount of storage as a data node in a Hadoop cluster. I hope you learned something useful from this article. I would like to thank everyone who helped me with this research, and I am open to your suggestions for improving this article.

Thank You 😊😊
