Using GROUP BY and HAVING Clauses in HiveQL
Learn how to use the `GROUP BY` and `HAVING` clauses in HiveQL to group and filter data efficiently. This guide provides clear explanations and examples for summarizing and filtering data based on aggregated values.
GROUP BY and HAVING Clauses in HiveQL
HiveQL (Hive Query Language) provides the GROUP BY
and HAVING
clauses, which function similarly to their SQL counterparts. These clauses are very useful for summarizing and filtering data based on groups. Understanding how to use them effectively is essential for working with Hive.
GROUP BY Clause
The GROUP BY
clause groups rows with matching values in one or more columns. It's typically used with aggregate functions (SUM()
, AVG()
, COUNT()
, MIN()
, MAX()
) to compute aggregate values for each group.
Example: Let's use a sample `emp` table (Id, Name, Salary, Department). Assume this table has been created and populated with data.
GROUP BY Example
SELECT Department, SUM(Salary) AS TotalSalary
FROM emp
GROUP BY Department;
HAVING Clause
The HAVING
clause filters groups created by GROUP BY
. It applies a condition to the results of the aggregate functions. It's crucial to remember that you cannot use aggregate functions in a `WHERE` clause; use `HAVING` instead.
HAVING Example
SELECT Department, SUM(Salary) AS TotalSalary
FROM emp
GROUP BY Department
HAVING SUM(Salary) > 35000;