TutorialsArena

Sqoop WHERE Clause: Filtering Data During Import

Use Sqoop's WHERE clause to efficiently filter data during the import process from relational databases to Hadoop. This tutorial provides practical examples.



Using Sqoop's WHERE Clause for Data Filtering

Filtering Data During Sqoop Import

Sqoop's where clause allows you to filter data during the import process, selecting only the rows that meet specific criteria. This significantly reduces the amount of data transferred and processed, improving efficiency.

Example: Filtering Data During Import

Here's an example showing how to import only rows from the cityByCountry table where the state is 'Alaska'. This assumes the 'state' column is the 6th column in the table. Remember to replace the connection details, username, table name, and target directory with your specific settings.

Sqoop Import Command with WHERE Clause

sqoop import \
--connect "jdbc:mysql://localhost/training" \
--username training -P \
--table cityByCountry \
--target-dir /user/where_clause \
--where "state = 'Alaska'" \
-m 1

This command will only import rows matching the where clause condition.

Listing Databases and Tables with Sqoop

Sqoop also provides helpful tools for listing databases and tables within a connected RDBMS. These are useful for exploring the available data sources before importing.

To list databases on a MySQL server:

Sqoop Command (Listing Databases)

$ sqoop list-databases --connect "jdbc:mysql://localhost" --username cloudera --password cloudera

For more information about the list-databases command, use $ sqoop help list-databases.

To list tables within a specific database:

Sqoop Command (Listing Tables)

$ sqoop list-tables --connect "jdbc:mysql://localhost/training" --username cloudera -P