TutorialsArena

Apache Pig Run Modes: Local and MapReduce

Learn about the two main execution modes of Apache Pig: Local Mode for single JVM execution and MapReduce Mode for distributed processing on Hadoop clusters. Understand when to use each mode for developing and deploying your Pig scripts.



Understanding Local and MapReduce Modes

Apache Pig offers two primary execution modes: Local Mode and MapReduce Mode (also known as Hadoop Mode). Each mode is suited for different scenarios in the development and deployment of Pig scripts.

1. Local Mode

Local mode executes Pig scripts within a single Java Virtual Machine (JVM). This mode is ideal for:

  • Development and testing of Pig scripts
  • Experimentation with Pig Latin commands
  • Prototyping applications

In local mode, input and output data reside on the local file system. To run Pig in local mode, use this command:

Command to Run Pig in Local Mode

$ pig -x local
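
As a minimal sketch of a local-mode run, the snippet below creates a small tab-separated input file and a Pig script in a scratch directory on the local file system, then runs the script with `pig -x local` if Pig is on the PATH. The file names (`people.tsv`, `older.pig`) and the sample records are assumptions made up for illustration.

```shell
# Work in a scratch directory; all file names below are illustrative assumptions.
workdir=$(mktemp -d)

# Sample input on the local file system (tab-separated: name, age).
printf 'alice\t30\nbob\t25\ncarol\t41\n' > "$workdir/people.tsv"

# A small Pig Latin script that keeps records with age above 26.
cat > "$workdir/older.pig" <<'EOF'
people = LOAD 'people.tsv' USING PigStorage('\t') AS (name:chararray, age:int);
older  = FILTER people BY age > 26;
DUMP older;
EOF

# Run in local mode only if the pig launcher is installed.
if command -v pig >/dev/null 2>&1; then
  (cd "$workdir" && pig -x local older.pig)
else
  echo "pig not found; to run manually: cd $workdir && pig -x local older.pig"
fi
```

Because the script loads `people.tsv` by a relative path, it is run from inside the scratch directory so that the path resolves against the local file system.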

2. MapReduce Mode (Hadoop Mode)

MapReduce mode is the default. Pig compiles Pig Latin scripts into MapReduce jobs and runs them across a Hadoop cluster, so this mode requires access to a Hadoop installation and HDFS.

  • Suitable for processing large datasets that exceed the capacity of a single machine.
  • Works with both pseudo-distributed (single-node) and fully distributed Hadoop installations.
  • Input and output data are typically stored in HDFS.

To run Pig in MapReduce mode, use either of these commands:

Commands to Run Pig in MapReduce Mode

$ pig

$ pig -x mapreduce
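
A typical MapReduce-mode run stages the input in HDFS before launching the script. The following is a rough sketch under stated assumptions: the HDFS directory `/user/demo/input`, the input file `people.tsv`, and the script `older.pig` (assumed to load its data from that HDFS directory) are hypothetical names. The `run` helper is also an illustration, not part of Pig or Hadoop; it prints each command and executes it only when both `hdfs` and `pig` are installed.

```shell
# Print each command, and execute it only when Hadoop and Pig are available.
# "run" is a hypothetical helper defined here, not a Pig or Hadoop tool.
run() {
  echo "+ $*"
  if command -v hdfs >/dev/null 2>&1 && command -v pig >/dev/null 2>&1; then
    "$@"
  fi
}

# Hypothetical HDFS directory and file names, for illustration only.
run hdfs dfs -mkdir -p /user/demo/input          # create the input directory in HDFS
run hdfs dfs -put people.tsv /user/demo/input/   # copy the local file into HDFS
run pig -x mapreduce older.pig                   # submit the script to the cluster
```

On a machine without Hadoop the snippet only prints the three commands, which makes it safe to try before pointing it at a real cluster.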

Ways to Execute Pig Programs

Pig programs can be run in several ways, regardless of the chosen mode (Local or MapReduce):

  1. Interactive Mode (Grunt Shell): Execute Pig commands interactively within the Grunt shell. This is useful for testing individual commands or small scripts.
  2. Batch Mode: Run a Pig script (a file of Pig Latin statements, conventionally with a .pig extension) by passing the file to the pig command. This is the preferred method for larger, more complex scripts.
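  3. Embedded Mode: Run Pig Latin statements from inside a host-language program, for example from Java through the PigServer class or from a Python (Jython) script. This is separate from User-Defined Functions (UDFs), which are custom functions written in languages like Java or Python and called from within a Pig script.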
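
To make the difference between the first two options concrete, here is a small sketch that shows the same two statements used interactively (as comments, since a Grunt session is typed by hand) and in batch mode. The relation name `people` and the input file `people.tsv` are hypothetical; the snippet only writes the script and prints the batch command rather than running Pig.

```shell
# Interactive mode: launching pig without a script opens the Grunt shell,
# where statements are typed one at a time at the prompt, for example:
#   grunt> people = LOAD 'people.tsv' AS (name:chararray, age:int);
#   grunt> DUMP people;
#   grunt> quit

# Batch mode: the same statements saved in a file and passed to the launcher.
script=$(mktemp)
cat > "$script" <<'EOF'
people = LOAD 'people.tsv' AS (name:chararray, age:int);
DUMP people;
EOF

# The command that would run the script in local mode.
batch_cmd="pig -x local $script"
echo "$batch_cmd"
```

Swapping `-x local` for `-x mapreduce` in the batch command would submit the same script to a Hadoop cluster instead.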