HiveQL Built-in Functions: A Comprehensive Guide with Examples
Explore the power of HiveQL's built-in functions with this comprehensive guide. Learn how to use mathematical, aggregate, and string manipulation functions to perform data transformations and analysis within Hive.
HiveQL Built-in Functions
Using HiveQL Functions
HiveQL provides a rich set of built-in functions for performing various operations on data. This section demonstrates mathematical, aggregate, and string manipulation functions using a sample table.
Setting up the Example Table
- Select Database: Choose the database where you'll create the table:
- Create Table: Create a table named
employee_data
: - Load Data: Load data into the table. Replace
'/home/codegyani/hive/emp_details'
with your data file path: - View Data: Verify data loading:
HiveQL Command (Selecting Database)
hive> use hql;
HiveQL Command (Creating Table)
hive> CREATE TABLE employee_data (Id INT, Name STRING, Salary FLOAT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
HiveQL Command (Loading Data)
hive> LOAD DATA LOCAL INPATH '/home/codegyani/hive/emp_details' INTO TABLE employee_data;
HiveQL Command (Viewing Data)
hive> SELECT * FROM employee_data;
Mathematical Functions
Hive offers various mathematical functions. Here are some examples:
Return Type | Function | Description |
---|---|---|
BIGINT | round(num) |
Rounds a DOUBLE to the nearest BIGINT. |
BIGINT | floor(num) |
Returns the largest BIGINT less than or equal to num. |
BIGINT | ceil(num), ceiling(num) |
Returns the smallest BIGINT greater than or equal to num. |
DOUBLE | exp(num) |
Returns the exponential of num. |
DOUBLE | ln(num) |
Returns the natural logarithm of num. |
DOUBLE | log10(num) |
Returns the base-10 logarithm of num. |
DOUBLE | sqrt(num) |
Returns the square root of num. |
DOUBLE | abs(num) |
Returns the absolute value of num. |
DOUBLE | sin(d), asin(d), cos(d), acos(d), tan(d), atan(d) |
Trigonometric functions (in radians). |
Example: Calculate the square root of salaries:
HiveQL Command (Mathematical Function Example)
hive> SELECT Id, Name, sqrt(Salary) FROM employee_data;
Aggregate Functions
Aggregate functions compute a single value from multiple rows:
Return Type | Function | Description |
---|---|---|
BIGINT | count(*) |
Counts the number of rows. |
DOUBLE | sum(col) |
Sums the values in a column. |
DOUBLE | sum(DISTINCT col) |
Sums distinct values in a column. |
DOUBLE | avg(col) |
Calculates the average of values in a column. |
DOUBLE | avg(DISTINCT col) |
Calculates the average of distinct values in a column. |
DOUBLE | min(col) |
Finds the minimum value in a column. |
DOUBLE | max(col) |
Finds the maximum value in a column. |
Examples:
HiveQL Commands (Aggregate Function Examples)
hive> SELECT max(Salary) FROM employee_data;
hive> SELECT min(Salary) FROM employee_data;
Other Built-in Functions
Hive includes many other useful functions:
Return Type | Function | Description |
---|---|---|
INT | length(str) |
Returns the length of a string. |
STRING | reverse(str) |
Reverses a string. |
STRING | concat(str1, str2, ...) |
Concatenates strings. |
STRING | substr(str, start_index) , substr(str, start, length) |
Extracts a substring. |
STRING | upper(str) |
Converts a string to uppercase. |
STRING | lower(str) |
Converts a string to lowercase. |
STRING | trim(str) , ltrim(str) , rtrim(str) |
Removes whitespace from a string (all, left, or right). |
Examples:
HiveQL Commands (Other Function Examples)
hive> SELECT Id, upper(Name) FROM employee_data;
hive> SELECT Id, lower(Name) FROM employee_data;