
Character Count Example using MapReduce in Hadoop: A Practical Tutorial

Learn the fundamentals of MapReduce with a practical character count example. This tutorial provides a step-by-step guide to implementing a MapReduce program in Hadoop for character counting, covering key concepts and setup instructions.



Character Count Example using MapReduce in Hadoop

This tutorial demonstrates a basic character count example using MapReduce in Hadoop. The example illustrates the two fundamental phases of MapReduce: mapping input data to key-value pairs and then reducing those pairs into aggregate results. Before you start, ensure that Java and Hadoop are properly set up on your system and that you are familiar with basic Java programming and MapReduce concepts.
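
As a quick illustration (using a tiny made-up line rather than the sample file used below), the map phase emits one (character, 1) pair for every character in a line, including spaces, and the reduce phase sums the pairs for each distinct character:

    Input line:    ab a
    Map output:    (a,1) (b,1) ( ,1) (a,1)
    Reduce output: (a,2) (b,1) ( ,1)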

Prerequisites

  • Java installed and configured correctly.
  • Hadoop installed and running.

Instructions for installing Java and Hadoop are available at https://www.javatpoint.com/hadoop-installation.
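
A quick way to confirm both prerequisites from a terminal (the versions reported will vary with your setup):

    java -version
    hadoop version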

Steps to Execute the MapReduce Character Count Example

  1. Create Input File: Create a text file (e.g., `info.txt`) containing the text you want to analyze. Sample info.txt:

    This is a sample text file for a MapReduce character count example.

  2. Create HDFS Directory: Create a directory in HDFS to store the input file: hdfs dfs -mkdir /charcount
  3. Upload Input File to HDFS: Upload the input file: hdfs dfs -put /path/to/info.txt /charcount
  4. Write MapReduce Program: Create three Java files: WC_Mapper.java, WC_Reducer.java, and WC_Runner.java (code provided below).
  5. Compile and Package: Compile the Java code and package it into a JAR (Java ARchive) file (e.g., `charcountdemo.jar`); a sample compile-and-package sequence is sketched after this list.
  6. Run the JAR: Execute the JAR with the Hadoop command below. The output directory (/charcount_output) must not already exist in HDFS, or the job will fail.

    hadoop jar /path/to/charcountdemo.jar com.javatpoint.WC_Runner /charcount/info.txt /charcount_output

  7. View Output: The results are written to /charcount_output/part-r-00000 as tab-separated character/count pairs. View them with:

    hdfs dfs -cat /charcount_output/part-r-00000
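
For the Compile and Package step, one possible sequence is shown below. It assumes the three .java files are in the current working directory and uses a scratch directory named classes (a name chosen here purely for illustration); the hadoop classpath command prints the Hadoop libraries needed on the compile classpath.

    mkdir -p classes
    javac -classpath "$(hadoop classpath)" -d classes WC_Mapper.java WC_Reducer.java WC_Runner.java
    jar -cvf charcountdemo.jar -C classes .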

MapReduce Code (Java)

WC_Mapper.java

package com.javatpoint;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WC_Mapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    // Emit a (character, 1) pair for every character in the line, including spaces.
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        String line = value.toString();
        String[] tokens = line.split("");
        for (String token : tokens) {
            output.collect(new Text(token), new IntWritable(1));
        }
    }
}
            
WC_Reducer.java

package com.javatpoint;

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WC_Reducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    // Sum the counts collected for each distinct character.
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
            
WC_Runner.java

package com.javatpoint;

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class WC_Runner {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(WC_Runner.class);
        conf.setJobName("charcount");
        conf.setMapperClass(WC_Mapper.class);
        conf.setReducerClass(WC_Reducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        // Input file and output directory are taken from the command line.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
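
An optional tweak that is not part of the original example: because summing character counts is associative, WC_Reducer can also be registered as a combiner in WC_Runner. A combiner pre-aggregates counts on the map side and cuts down the amount of intermediate data shuffled to the reducers.

// Optional: reuse the reducer as a combiner to pre-aggregate counts on the map side.
conf.setCombinerClass(WC_Reducer.class);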