Loading Data into Hive Tables: A Practical Guide
Learn how to load data into your Hive tables using the `LOAD DATA` command. This guide covers loading data from various file formats and locations, providing practical examples and best practices for efficient data ingestion.
Loading Data into Hive Tables
After creating a table in Hive, you need to load data into it. Hive allows loading data from various file formats and locations. This guide demonstrates loading data using the LOAD DATA
command. This command is used to load data from files into Hive tables.
Loading Data into a Hive Table
The basic syntax for loading data is:
LOAD DATA Syntax
LOAD DATA LOCAL INPATH 'filePath' INTO TABLE databaseName.tableName;
Where:
filePath
: The path to the data file (local file system).databaseName.tableName
: The fully-qualified name of your Hive table.
Example: Loading Employee Data
Let's assume you have an `emp_details` file with employee data (Id, Name, Salary, Department). To load this data into the `demo.employee` table (assuming this table is already created and its schema matches the data):
Loading Data into Employee Table
LOAD DATA LOCAL INPATH '/path/to/emp_details.csv' INTO TABLE demo.employee;
To verify the data was loaded:
Checking Loaded Data
SELECT * FROM demo.employee;
Handling Mismatched Data
If your data file has mismatched data types or extra columns compared to your table schema, Hive will not throw an error. Instead, it inserts `NULL` values for mismatched fields. For example, if a column in your data file is of a different data type than what is specified for that column in the table’s schema, Hive will load `NULL` into that column for the rows containing that mismatched data.
Example: Loading a file (`emp_details2.csv`) with mismatched data:
Loading Mismatched Data
LOAD DATA LOCAL INPATH '/path/to/emp_details2.csv' INTO TABLE demo.employee;
SELECT * FROM demo.employee;