TutorialsArena

Loading Data into Hive Tables: A Practical Guide

Learn how to load data into your Hive tables using the `LOAD DATA` command. This guide covers loading data from various file formats and locations, providing practical examples and best practices for efficient data ingestion.



Loading Data into Hive Tables

After creating a table in Hive, you need to load data into it. Hive allows loading data from various file formats and locations. This guide demonstrates loading data using the LOAD DATA command. This command is used to load data from files into Hive tables.

Loading Data into a Hive Table

The basic syntax for loading data is:

LOAD DATA Syntax

LOAD DATA LOCAL INPATH 'filePath' INTO TABLE databaseName.tableName;
            

Where:

  • filePath: The path to the data file (local file system).
  • databaseName.tableName: The fully-qualified name of your Hive table.

Example: Loading Employee Data

Let's assume you have an `emp_details` file with employee data (Id, Name, Salary, Department). To load this data into the `demo.employee` table (assuming this table is already created and its schema matches the data):

Loading Data into Employee Table

LOAD DATA LOCAL INPATH '/path/to/emp_details.csv' INTO TABLE demo.employee;
            

To verify the data was loaded:

Checking Loaded Data

SELECT * FROM demo.employee;
            

Handling Mismatched Data

If your data file has mismatched data types or extra columns compared to your table schema, Hive will not throw an error. Instead, it inserts `NULL` values for mismatched fields. For example, if a column in your data file is of a different data type than what is specified for that column in the table’s schema, Hive will load `NULL` into that column for the rows containing that mismatched data.

Example: Loading a file (`emp_details2.csv`) with mismatched data:

Loading Mismatched Data

LOAD DATA LOCAL INPATH '/path/to/emp_details2.csv' INTO TABLE demo.employee;
SELECT * FROM demo.employee;