Lab 2 – Hadoop MapReduce

In the lecture, we saw an example MapReduce process that retrieves the maximum temperature per year from a large data set of weather recordings. Now, let's look at counting word occurrences in a huge corpus of text documents.

Part 1: MR Processes and WordCount Example (25min)

First, we will discuss the Map, Group & Sort, and Reduce phases and see what happens in each (I will draw this on the board). Then we turn to another example, the “hello world” program of MapReduce: word count (see the lab slides).
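
To make the three phases concrete before we get to Hadoop itself, here is a minimal sketch of word count in plain Java. It does not use the Hadoop API and runs in a single process; the class name `WordCountSketch` and its method names are made up for illustration. It also assumes a simplified tokenizer (lower-casing and splitting on non-word characters), which is not necessarily what the lab code does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process illustration of the MapReduce phases.
// This is NOT Hadoop code; it only mirrors the data flow.
class WordCountSketch {

    // Map phase: emit one (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        // Assumption: simplified tokenization on non-word characters.
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Group & Sort phase: collect all values that share a key.
    // A TreeMap keeps the keys in sorted order.
    static TreeMap<String, List<Integer>> groupAndSort(
            List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the list of counts for a single word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Chains the three phases over all input lines.
    static TreeMap<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : groupAndSort(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be", "that is the question");
        // prints {be=2, is=1, not=1, or=1, question=1, that=1, the=1, to=2}
        System.out.println(wordCount(lines));
    }
}
```

In a real Hadoop job, the map and reduce steps run in parallel on different nodes and the framework performs the grouping and sorting between them; the sketch only shows what data each phase receives and produces.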

Part 2: Lab Quiz (20min)

The quiz covers the basic MapReduce processes we just discussed. For all other questions, skim through the lab slides to find the answers. The lab slides cover the following topics:

  • word count example
  • how to compile and run MapReduce jobs in Hadoop
  • example Mappers and Reducers

Part 3: Running a MapReduce Job in Hadoop (35min)

Let’s count how often each word occurs in Shakespeare’s works! We will

  • look at the Java code of the word count Mapper and Reducer,
  • compile it, and
  • run the job using the command line.

Then, we will look at the YARN Resource Manager UI and the Hue Job Browser. Both come in handy when you want to monitor the status of your cluster and of the jobs you have submitted.

Download the step-by-step lab instructions HERE.