Sqoop WHERE Clause: Filtering Data During Import
Use Sqoop's WHERE clause to efficiently filter data during the import process from relational databases to Hadoop. This tutorial provides practical examples.
Using Sqoop's WHERE Clause for Data Filtering
Filtering Data During Sqoop Import
Sqoop's where
clause allows you to filter data during the import process, selecting only the rows that meet specific criteria. This significantly reduces the amount of data transferred and processed, improving efficiency.
Example: Filtering Data During Import
Here's an example showing how to import only rows from the cityByCountry
table where the state is 'Alaska'. This assumes the 'state' column is the 6th column in the table. Remember to replace the connection details, username, table name, and target directory with your specific settings.
Sqoop Import Command with WHERE Clause
sqoop import \
--connect "jdbc:mysql://localhost/training" \
--username training -P \
--table cityByCountry \
--target-dir /user/where_clause \
--where "state = 'Alaska'" \
-m 1
This command will only import rows matching the where
clause condition.
Listing Databases and Tables with Sqoop
Sqoop also provides helpful tools for listing databases and tables within a connected RDBMS. These are useful for exploring the available data sources before importing.
To list databases on a MySQL server:
Sqoop Command (Listing Databases)
$ sqoop list-databases --connect "jdbc:mysql://localhost" --username cloudera --password cloudera
For more information about the list-databases
command, use $ sqoop help list-databases
.
To list tables within a specific database:
Sqoop Command (Listing Tables)
$ sqoop list-tables --connect "jdbc:mysql://localhost/training" --username cloudera -P