Using GROUP BY and HAVING Clauses in HiveQL

Learn how to use the `GROUP BY` and `HAVING` clauses in HiveQL to group and filter data efficiently. This guide provides clear explanations and examples for summarizing and filtering data based on aggregated values.

GROUP BY and HAVING Clauses in HiveQL

HiveQL (Hive Query Language) provides the GROUP BY and HAVING clauses, which function similarly to their SQL counterparts. These clauses are very useful for summarizing and filtering data based on groups. Understanding how to use them effectively is essential for working with Hive.

GROUP BY Clause

The GROUP BY clause groups rows with matching values in one or more columns. It's typically used with aggregate functions (SUM(), AVG(), COUNT(), MIN(), MAX()) to compute aggregate values for each group.

Example: Let's use a sample `emp` table (Id, Name, Salary, Department). Assume this table has been created and populated with data.

GROUP BY Example


SELECT Department, SUM(Salary) AS TotalSalary
FROM emp
GROUP BY Department;

HAVING Clause

The HAVING clause filters groups created by GROUP BY. It applies a condition to the results of the aggregate functions. It's crucial to remember that you cannot use aggregate functions in a `WHERE` clause; use `HAVING` instead.

HAVING Example


SELECT Department, SUM(Salary) AS TotalSalary
FROM emp
GROUP BY Department
HAVING SUM(Salary) > 35000;

Follow On