TutorialsArena

HiveQL Built-in Functions: A Comprehensive Guide with Examples

Explore the power of HiveQL's built-in functions with this comprehensive guide. Learn how to use mathematical, aggregate, and string manipulation functions to perform data transformations and analysis within Hive.



HiveQL Built-in Functions

Using HiveQL Functions

HiveQL provides a rich set of built-in functions for performing various operations on data. This section demonstrates mathematical, aggregate, and string manipulation functions using a sample table.

Setting up the Example Table

  1. Select Database: Choose the database where you'll create the table:
  2. HiveQL Command (Selecting Database)
    
    hive> use hql;
    
  3. Create Table: Create a table named employee_data:
  4. HiveQL Command (Creating Table)
    
    hive> CREATE TABLE employee_data (Id INT, Name STRING, Salary FLOAT)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ',';
    
  5. Load Data: Load data into the table. Replace '/home/codegyani/hive/emp_details' with your data file path:
  6. HiveQL Command (Loading Data)
    
    hive> LOAD DATA LOCAL INPATH '/home/codegyani/hive/emp_details' INTO TABLE employee_data;
    
  7. View Data: Verify data loading:
  8. HiveQL Command (Viewing Data)
    
    hive> SELECT * FROM employee_data;
    

Mathematical Functions

Hive offers various mathematical functions. Here are some examples:

Return Type Function Description
BIGINT round(num) Rounds a DOUBLE to the nearest BIGINT.
BIGINT floor(num) Returns the largest BIGINT less than or equal to num.
BIGINT ceil(num), ceiling(num) Returns the smallest BIGINT greater than or equal to num.
DOUBLE exp(num) Returns the exponential of num.
DOUBLE ln(num) Returns the natural logarithm of num.
DOUBLE log10(num) Returns the base-10 logarithm of num.
DOUBLE sqrt(num) Returns the square root of num.
DOUBLE abs(num) Returns the absolute value of num.
DOUBLE sin(d), asin(d), cos(d), acos(d), tan(d), atan(d) Trigonometric functions (in radians).

Example: Calculate the square root of salaries:

HiveQL Command (Mathematical Function Example)

hive> SELECT Id, Name, sqrt(Salary) FROM employee_data;

Aggregate Functions

Aggregate functions compute a single value from multiple rows:

Return Type Function Description
BIGINT count(*) Counts the number of rows.
DOUBLE sum(col) Sums the values in a column.
DOUBLE sum(DISTINCT col) Sums distinct values in a column.
DOUBLE avg(col) Calculates the average of values in a column.
DOUBLE avg(DISTINCT col) Calculates the average of distinct values in a column.
DOUBLE min(col) Finds the minimum value in a column.
DOUBLE max(col) Finds the maximum value in a column.

Examples:

HiveQL Commands (Aggregate Function Examples)

hive> SELECT max(Salary) FROM employee_data;
hive> SELECT min(Salary) FROM employee_data;

Other Built-in Functions

Hive includes many other useful functions:

Return Type Function Description
INT length(str) Returns the length of a string.
STRING reverse(str) Reverses a string.
STRING concat(str1, str2, ...) Concatenates strings.
STRING substr(str, start_index), substr(str, start, length) Extracts a substring.
STRING upper(str) Converts a string to uppercase.
STRING lower(str) Converts a string to lowercase.
STRING trim(str), ltrim(str), rtrim(str) Removes whitespace from a string (all, left, or right).

Examples:

HiveQL Commands (Other Function Examples)

hive> SELECT Id, upper(Name) FROM employee_data;
hive> SELECT Id, lower(Name) FROM employee_data;