Understanding Apache Hive Architecture: From Query to Execution
Apache Hive Architecture
Understanding the Hive Query Processing Flow
This section describes the architecture of Apache Hive and how a Hive query is processed from submission to execution.
Hive Client
Hive offers versatile client interfaces allowing you to interact with it using various programming languages like Java, Python, and C++. Supported clients include:
- Thrift Server: A cross-language service that handles requests from clients using the Thrift protocol.
- JDBC Driver: Enables Java applications to connect to and interact with Hive. The driver class is org.apache.hadoop.hive.jdbc.HiveDriver.
- ODBC Driver: Allows applications supporting the ODBC protocol to connect to Hive.
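To make the JDBC path concrete, here is a minimal sketch of a Java client. The host, port, database, and table names are placeholders; it assumes a HiveServer2 endpoint, whose driver class is org.apache.hive.jdbc.HiveDriver and whose URL scheme is jdbc:hive2:// (the class listed above pairs with the original Hive server's jdbc:hive:// scheme).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 driver class; the jdbc:hive2:// scheme below matches it.
        // (The legacy org.apache.hadoop.hive.jdbc.HiveDriver pairs with jdbc:hive://.)
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Host, port, database, user, and table are placeholders for illustration.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hive-server.example.com:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM sample_table LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```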
Hive Services
Hive provides several key services:
- Hive CLI (Command Line Interface): A shell for executing Hive queries and commands.
- Hive Web UI (User Interface): A web-based alternative to the Hive CLI, offering a graphical interface for query execution.
- Hive Metastore: A central repository that stores metadata about tables, partitions, columns, data types, serializers/deserializers, and the locations of data files in HDFS (see the metastore sketch after this list).
- Hive Server (Apache Thrift Server): Receives requests from various clients and forwards them to the Hive Driver.
- Hive Driver: Receives queries from clients (CLI, Web UI, Thrift, JDBC/ODBC) and passes them to the compiler.
- Hive Compiler: Parses queries, performs semantic analysis and type checking, and compiles HiveQL statements into an execution plan of MapReduce jobs.
- Hive Optimizer and Execution Engine: The optimizer transforms the compiled plan into a Directed Acyclic Graph (DAG) of MapReduce and HDFS tasks; the execution engine then runs these tasks in the correct order based on their dependencies (see the EXPLAIN sketch at the end of this section).
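To illustrate the kind of metadata the metastore holds, the sketch below reads one table's definition through the metastore's Thrift client API. The metastore URI, database, and table names are placeholders, and the exact client methods can vary between Hive releases.

```java
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.Table;

public class MetastoreLookup {
    public static void main(String[] args) throws Exception {
        // Point the client at the metastore's Thrift endpoint (placeholder URI).
        HiveConf conf = new HiveConf();
        conf.set("hive.metastore.uris", "thrift://metastore.example.com:9083");

        HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
        try {
            // Fetch the metadata the metastore keeps for one table.
            Table table = client.getTable("default", "sample_table");

            System.out.println("HDFS location : " + table.getSd().getLocation());
            System.out.println("SerDe         : "
                    + table.getSd().getSerdeInfo().getSerializationLib());
            for (FieldSchema col : table.getSd().getCols()) {
                System.out.println("Column        : " + col.getName() + " " + col.getType());
            }
            for (FieldSchema part : table.getPartitionKeys()) {
                System.out.println("Partition key : " + part.getName() + " " + part.getType());
            }
        } finally {
            client.close();
        }
    }
}
```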
This architecture shows how Hive processes queries, from client submission through compilation and execution using the underlying Hadoop infrastructure.
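One way to see what the compiler and optimizer produce is to prefix a query with EXPLAIN, which returns the stage plan (the DAG of tasks) instead of running the query. The sketch below assumes an existing JDBC connection, such as the one opened in the earlier example; the table and column names are placeholders.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class ExplainPlan {
    // Prints the stage plan Hive's compiler and optimizer produce for a query.
    static void printPlan(Connection conn) throws Exception {
        String query = "EXPLAIN SELECT dept, COUNT(*) FROM employees GROUP BY dept";
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(query)) {
            // Each row is one line of the plan: stage dependencies first,
            // then the operator tree inside each stage.
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```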