Character Count Example using MapReduce in Hadoop: A Practical Tutorial
Learn the fundamentals of MapReduce with a practical character count example. This tutorial provides a step-by-step guide to implementing a MapReduce program in Hadoop for character counting, covering key concepts and setup instructions.
This tutorial demonstrates a basic character count example using MapReduce in Hadoop. The example illustrates the two fundamental phases of MapReduce: mapping input data to key-value pairs, then reducing those pairs to aggregate results. Before you start, ensure Java and Hadoop are properly set up on your system, and that you are familiar with basic Java programming and MapReduce concepts.
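For example, given the single input line "aba", the map phase emits the pairs (a, 1), (b, 1), (a, 1); the shuffle phase groups them by key into a → [1, 1] and b → [1]; and the reduce phase sums each group to produce (a, 2) and (b, 1).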
Prerequisites
- Java installed and configured correctly.
- Hadoop installed and running.
Instructions for installing Java and Hadoop are available at https://www.javatpoint.com/hadoop-installation.
Steps to Execute MapReduce Char Count Example
- Create Input File: Create a text file (e.g., `info.txt`) containing the text you want to analyze.
- Create HDFS Directory: Create a directory in HDFS to store the input file:
hdfs dfs -mkdir /charcount
- Upload Input File to HDFS: Upload the input file:
hdfs dfs -put /path/to/info.txt /charcount
- Write MapReduce Program: Create three Java files: `WC_Mapper.java`, `WC_Reducer.java`, and `WC_Runner.java` (code provided below).
- Compile and Package: Compile the Java files and package the classes into a JAR (Java ARchive) file (e.g., `charcountdemo.jar`); example commands are shown after this list.
- Run the JAR: Execute the JAR with the `hadoop jar` command shown under Running the MapReduce Job below.
- View Output: The results are written to `/charcount_output/part-r-00000`; view the file with the command shown under Viewing Output below.
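A minimal sketch of the compile-and-package step, assuming the three `.java` files sit in the current directory and the `hadoop` command is on your PATH (the exact classpath setup varies with your Hadoop version and installation):

mkdir classes
javac -classpath "$(hadoop classpath)" -d classes WC_Mapper.java WC_Reducer.java WC_Runner.java
jar cf charcountdemo.jar -C classes .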
Sample info.txt
This is a sample text file for a MapReduce character count example.
Running the MapReduce Job
hadoop jar /path/to/charcountdemo.jar com.javatpoint.WC_Runner /charcount/info.txt /charcount_output
Note that the output directory (`/charcount_output`) must not already exist; Hadoop refuses to run a job whose output directory is already present.
Viewing Output
hdfs dfs -cat /charcount_output/part-r-00000
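Assuming the sample info.txt shown earlier, the output file contains one line per distinct character, with the character and its count separated by a tab. Keys are sorted in byte order, so the space character (which occurs 11 times here) comes first, followed by punctuation, uppercase letters, then lowercase letters. The beginning of the output looks like this:

     11
.    1
M    1
R    1
T    1
a    7
...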
MapReduce Code (Java)
WC_Mapper.java
package com.javatpoint;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WC_Mapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        // Emit every character of the line as a key with a count of 1.
        String line = value.toString();
        for (int i = 0; i < line.length(); i++) {
            output.collect(new Text(String.valueOf(line.charAt(i))), one);
        }
    }
}
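Because the mapper emits one key-value pair for every character in the line, spaces and punctuation are counted along with letters. If you only want letters, filter inside map() (for example, with Character.isLetter) before calling output.collect.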
WC_Reducer.java
package com.javatpoint;

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WC_Reducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        // Sum all of the 1s emitted for this character.
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
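Since addition is associative and commutative, WC_Reducer can also serve as a combiner (via conf.setCombinerClass(WC_Reducer.class) in the driver) to pre-aggregate counts on each mapper node and cut shuffle traffic.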
WC_Runner.java
package com.javatpoint;

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WC_Runner {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(WC_Runner.class);
        // Wire up the mapper and reducer and declare the output key/value types.
        conf.setMapperClass(WC_Mapper.class);
        conf.setReducerClass(WC_Reducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));   // input path (args[0])
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // output directory (args[1])
        JobClient.runJob(conf);
    }
}
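This driver uses the classic org.apache.hadoop.mapred API (MapReduceBase, JobConf, JobClient) to match the mapper and reducer above; newer code is typically written against the org.apache.hadoop.mapreduce API instead, but the classic API still runs on current Hadoop releases. When invoked through the hadoop jar command shown earlier, args[0] receives the HDFS input path (/charcount/info.txt) and args[1] the output directory (/charcount_output).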