Sqoop Import: Importing Data from RDBMS to HDFS

Efficiently import data from your relational database (RDBMS) to Hadoop's HDFS using Sqoop. This tutorial provides step-by-step instructions and examples for importing data with Sqoop.



Using Sqoop for Data Import

Importing Data from RDBMS to HDFS

Sqoop is a powerful tool for efficiently transferring data between relational databases (like MySQL) and Hadoop's distributed file system (HDFS). This section demonstrates a basic Sqoop import.

Viewing Data in the RDBMS

Before importing, it's helpful to review the data in your RDBMS table. Use your database client (e.g., the MySQL command-line client) to check the table's contents. For example, to view the first 10 rows of a table named table_name:

MySQL Command (Viewing Data)

mysql> SELECT * FROM table_name LIMIT 10;
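As a quick sanity check before running the import in the next section, you could run the same kind of query against the countries table used below (this assumes the table already exists in your training database):

MySQL Command (Checking the countries Table)

mysql> DESCRIBE countries;
mysql> SELECT COUNT(*) FROM countries;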

Importing Data into HDFS

To import the countries table from a MySQL database into HDFS, use the following Sqoop command. Replace the connection string, username, table name, and target directory with values for your environment; the -P flag prompts for the database password at runtime:

Sqoop Import Command

$ sqoop import \
--connect "jdbc:mysql://localhost/training" \
--username cloudera -P \
--table countries \
--target-dir /user/country_imported

This command imports the countries table into the HDFS directory /user/country_imported. By default, Sqoop runs the import with 4 parallel mappers; you can change this with -m <number_of_mappers> (for example, -m 1 to use a single mapper).
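For reference, a single-mapper version of the same import might look like the following. Using -m 1 is also the usual workaround when the table has no primary key, because Sqoop needs either a primary key or a --split-by column to divide the work among multiple mappers:

Sqoop Import Command (Single Mapper)

$ sqoop import \
--connect "jdbc:mysql://localhost/training" \
--username cloudera -P \
--table countries \
--target-dir /user/country_imported \
-m 1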

You can enter the entire command on a single line in your terminal, or split it across lines with a trailing backslash (\) as shown above.
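After the job finishes, you can confirm the import by listing the target directory and viewing part of the output. The file name below (part-m-00000) is the typical output of a map-only Sqoop import; the exact files in your directory will depend on the number of mappers used:

HDFS Commands (Verifying the Import)

$ hdfs dfs -ls /user/country_imported
$ hdfs dfs -cat /user/country_imported/part-m-00000 | head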