MapReduce APIs in Hadoop: Mapper, Reducer, and Job Classes
MapReduce is a programming model for processing large datasets in parallel. Writing effective MapReduce programs in Hadoop requires three core API (Application Programming Interface) classes: Mapper, Reducer, and Job. This guide gives an overview of each class and its main methods. Note that the signatures shown here (a Context parameter, the setup and cleanup hooks, and the Job class) belong to the current API in the org.apache.hadoop.mapreduce package. The older API in org.apache.hadoop.mapred uses JobConf, OutputCollector, and Reporter instead. YARN is Hadoop's resource manager, which schedules MapReduce jobs; it is not a replacement for this API.
MapReduce Mapper Class
The Mapper class transforms input key-value pairs into intermediate key-value pairs. Each map task reads the records of one InputSplit and emits zero or more intermediate pairs per record. The framework then groups these pairs by key and passes them to the reducer for aggregation.
Method | Description |
---|---|
void cleanup(Context context) | Called once at the end of the map task. |
void map(KEYIN key, VALUEIN value, Context context) | Called once for each key-value pair in the input split. |
void run(Context context) | Can be overridden to control the mapper's execution. |
void setup(Context context) | Called once at the beginning of the map task. |
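
The lifecycle behind these four methods can be sketched in plain Java. The following is a minimal model, not Hadoop code: MapperLifecycleModel, its Context, and WordMapper are hypothetical stand-ins, and the model Context holds its input lines in memory and uses the record index as the key, where Hadoop's text input would supply a byte offset. The run method follows the same pattern as Hadoop's default Mapper.run: setup once, map once per record, cleanup once.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Plain-Java model of the mapper lifecycle. These are NOT the Hadoop classes;
// every class name here is a hypothetical stand-in.
class MapperLifecycleModel {

    // Stand-in for the mapper's Context: serves input records, collects output pairs.
    static class Context {
        private final List<String> lines;
        private int index = -1;
        final List<Map.Entry<String, Integer>> output = new ArrayList<>();

        Context(List<String> lines) { this.lines = lines; }

        boolean nextKeyValue() { return ++index < lines.size(); }
        Long getCurrentKey() { return (long) index; } // record index (assumption of this model)
        String getCurrentValue() { return lines.get(index); }
        void write(String key, Integer value) { output.add(new SimpleEntry<>(key, value)); }
    }

    // Word-splitting mapper that also records which lifecycle methods were called.
    static class WordMapper {
        final List<String> calls = new ArrayList<>();

        void setup(Context context) { calls.add("setup"); }

        void map(Long key, String value, Context context) {
            calls.add("map");
            for (String word : value.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    context.write(word, 1); // emit the intermediate pair (word, 1)
                }
            }
        }

        void cleanup(Context context) { calls.add("cleanup"); }

        // setup once, map once per record, cleanup once (even if map throws).
        void run(Context context) {
            setup(context);
            try {
                while (context.nextKeyValue()) {
                    map(context.getCurrentKey(), context.getCurrentValue(), context);
                }
            } finally {
                cleanup(context);
            }
        }
    }

    public static void main(String[] args) {
        Context context = new Context(Arrays.asList("a b", "b"));
        WordMapper mapper = new WordMapper();
        mapper.run(context);
        System.out.println(mapper.calls);   // [setup, map, map, cleanup]
        System.out.println(context.output); // [a=1, b=1, b=1]
    }
}
```

Overriding run is rarely needed; most mappers override only map, plus setup or cleanup when they hold per-task resources.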
MapReduce Reducer Class
The Reducer class aggregates intermediate values. For each key, it receives the key together with an iterable of all values emitted for that key, combines those values, and writes output key-value pairs.
Method | Description |
---|---|
void cleanup(Context context) | Called once at the end of the reduce task. |
void reduce(KEYIN key, Iterable<VALUEIN> values, Context context) | Called once for each unique key. |
void run(Context context) | Can be overridden to customize the reducer's execution. |
void setup(Context context) | Called once at the beginning of the reduce task. |
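
The "once for each unique key" contract can be illustrated with a small plain-Java model. This is a sketch, not Hadoop code: ReducerModel, its Context, SumReducer, and groupByKey are hypothetical names, and the grouping step stands in for the shuffle and sort that the framework performs between the map and reduce phases. As a simplification, this model's run receives the grouped input directly, whereas Hadoop's Reducer.run pulls each key and its values from the Context.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Plain-Java model of the reduce side. These are NOT the Hadoop classes;
// every name here is a hypothetical stand-in.
class ReducerModel {

    // Stand-in for the reducer's Context: collects the final output pairs.
    static class Context {
        final Map<String, Integer> output = new LinkedHashMap<>();
        void write(String key, Integer value) { output.put(key, value); }
    }

    // Sums all values that share a key.
    static class SumReducer {
        int reduceCalls = 0;

        void setup(Context context) { }

        void reduce(String key, Iterable<Integer> values, Context context) {
            reduceCalls++;
            int sum = 0;
            for (int value : values) {
                sum += value;
            }
            context.write(key, sum);
        }

        void cleanup(Context context) { }

        // setup once, reduce once per key (in sorted key order), cleanup once.
        void run(SortedMap<String, List<Integer>> grouped, Context context) {
            setup(context);
            try {
                for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
                    reduce(group.getKey(), group.getValue(), context);
                }
            } finally {
                cleanup(context);
            }
        }
    }

    // Models the shuffle: collect every value under its key, with keys sorted.
    static SortedMap<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("b", 1));
        pairs.add(new SimpleEntry<>("a", 1));
        pairs.add(new SimpleEntry<>("b", 1));

        Context context = new Context();
        SumReducer reducer = new SumReducer();
        reducer.run(groupByKey(pairs), context);
        System.out.println(context.output);      // {a=1, b=2}
        System.out.println(reducer.reduceCalls); // 2
    }
}
```

Three intermediate pairs produce only two reduce calls because the two pairs with key "b" arrive as a single group.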
MapReduce Job Class
The Job class configures and submits a MapReduce job. It also lets you control the job's execution and query its status.
Method | Description |
---|---|
Counters getCounters() | Gets the counters for the job. |
long getFinishTime() | Gets the job's finish time. |
static Job getInstance() | Creates a new job instance (static factory method). |
String getJobFile() | Gets the path of the submitted job configuration file. |
String getJobName() | Gets the job's name. |
void setJarByClass(Class<?> c) | Sets the job's JAR by locating the JAR file that contains the given class. |
void setJobName(String name) | Sets the job's name. |
void setMapperClass(Class<? extends Mapper> cls) | Sets the mapper class. |
void setReducerClass(Class<? extends Reducer> cls) | Sets the reducer class. |
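
How a job ties a mapper and a reducer together can also be sketched in plain Java. The model below is not Hadoop's Job class: LocalJobModel, MapFn, ReduceFn, setMapper, setReducer, run, and the counter names are all hypothetical, and everything runs in one process on in-memory lines rather than being submitted to a cluster. It borrows only the shape of the API above: a static getInstance factory, a job name, a registered mapper and reducer, and counters that can be read afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Plain-Java model of the data flow a configured job drives.
// NOT Hadoop's Job class; every name here is a hypothetical stand-in.
class LocalJobModel {

    interface MapFn { void map(String line, BiConsumer<String, Integer> emit); }
    interface ReduceFn { int reduce(String key, Iterable<Integer> values); }

    private String jobName = "";
    private MapFn mapper;
    private ReduceFn reducer;
    final Map<String, Long> counters = new TreeMap<>();

    static LocalJobModel getInstance() { return new LocalJobModel(); }

    void setJobName(String name) { this.jobName = name; }
    String getJobName() { return jobName; }
    void setMapper(MapFn mapper) { this.mapper = mapper; }
    void setReducer(ReduceFn reducer) { this.reducer = reducer; }

    // Map every record, group intermediate pairs by key, then reduce each group.
    Map<String, Integer> run(List<String> input) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : input) {
            counters.merge("mapInputRecords", 1L, Long::sum);
            mapper.map(line, (key, value) ->
                    grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(value));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            counters.merge("reduceInputGroups", 1L, Long::sum);
            result.put(group.getKey(), reducer.reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        LocalJobModel job = LocalJobModel.getInstance();
        job.setJobName("word count");
        job.setMapper((line, emit) -> {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    emit.accept(word, 1);
                }
            }
        });
        job.setReducer((key, values) -> {
            int sum = 0;
            for (int value : values) {
                sum += value;
            }
            return sum;
        });
        System.out.println(job.run(Arrays.asList("a b", "b"))); // {a=1, b=2}
        System.out.println(job.counters); // {mapInputRecords=2, reduceInputGroups=2}
    }
}
```

In real Hadoop code the mapper and reducer are registered as classes through setMapperClass and setReducerClass rather than as lambdas, and the framework distributes the map and reduce tasks across the cluster.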