Showing posts from August, 2018

Fundamentals of MapReduce (New to MapReduce?)

So people have been asking me to give some details on the MapReduce concept. This is a very interesting topic to write about. If you have read my previous post, you will have seen my introduction to Big Data and Hadoop. Now I am going to talk about MapReduce, the heart of Hadoop. Some of you might be new to this, but do not worry: I will describe it in a way you will quickly understand. Java developers may find it easier, but if you have no Java experience, you can still learn some basic Java and master MapReduce.

MapReduce is a programming framework for parallel processing of large data sets in a distributed environment. I am talking about massive scalability across hundreds or thousands of servers in a Hadoop cluster. Just imagine that for a second. In the diagram above, we have the “Input, Map task, Reduce task ...
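To make the Input, Map task, Reduce task flow a little more concrete, here is a minimal sketch in plain Java, assuming the classic word-count problem as the example. It runs in a single process and does not use Hadoop's API at all; the class `WordCountSketch` and its methods `map`, `shuffle`, `reduce` and `run` are hypothetical names chosen for illustration. The `shuffle` step groups the map output by key before reducing; in a real Hadoop job the framework performs this shuffle and sort for you between the map and reduce tasks. The `record` syntax needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A single-process sketch of the MapReduce flow: input -> map -> shuffle -> reduce.
// Hypothetical illustration only: it does NOT use Hadoop classes or run on a cluster.
public class WordCountSketch {

    // A simple key/value pair emitted by the map phase.
    record Pair(String key, int value) {}

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Pair> map(String line) {
        List<Pair> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) { // skip the empty token produced by leading whitespace
                out.add(new Pair(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group every value by its key (Hadoop does this between map and reduce).
    static Map<String, List<Integer>> shuffle(List<Pair> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>(); // TreeMap keeps keys sorted
        for (Pair p : pairs) {
            groups.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
        }
        return groups;
    }

    // Reduce phase: combine all values for one key into a single result (here, a sum).
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in sequence over the input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Pair> mapped = new ArrayList<>();
        for (String line : lines) {
            // In a cluster, different input splits would be handled by different mappers.
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("Deer Bear River", "Car Car River", "Deer Car Bear");
        System.out.println(run(input));
    }
}
```

Running `main` prints `{bear=2, car=3, deer=2, river=2}`. A `TreeMap` is used only so the output order is deterministic; the point of the sketch is that `map` and `reduce` each look at a small piece of the data independently, which is what lets a framework like Hadoop spread them across many servers.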