Sqoop Import: Importing Data from RDBMS to HDFS
Efficiently import data from your relational database (RDBMS) to Hadoop's HDFS using Sqoop. This tutorial provides step-by-step instructions and examples for importing data with Sqoop.
Using Sqoop for Data Import
Sqoop is a powerful tool for efficiently transferring data between relational databases (like MySQL) and Hadoop's distributed file system (HDFS). This section demonstrates a basic Sqoop import.
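Before running an import, it can help to confirm that Sqoop can reach the database at all. One quick connectivity check uses Sqoop's list-tables tool (the connection string and username below are illustrative; -P prompts for the password):

```shell
# List the tables Sqoop can see in the training database;
# a successful listing confirms the JDBC connection works
sqoop list-tables \
  --connect "jdbc:mysql://localhost/training" \
  --username cloudera -P
```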
Viewing Data in the RDBMS
Before importing, it's helpful to review the data in your RDBMS table. Use your database client (e.g., the MySQL command-line client) to check the table's contents. For example, to view the first 10 rows of a table named table_name:
MySQL Command (Viewing Data)
mysql> SELECT * FROM table_name LIMIT 10;
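If you are not already in the MySQL client, it can be started from the shell first (the credentials and database name here match the illustrative values used elsewhere in this tutorial):

```shell
# Open the MySQL command-line client as user cloudera;
# -p prompts for the password, 'training' is the database to use
mysql -u cloudera -p training
```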
Importing Data into HDFS
To import the countries table from a MySQL database into HDFS, use the following Sqoop command. Replace the connection string, username, table name, and target directory with values for your environment (the -P flag prompts for the password at runtime, so it is never stored in the command itself):
Sqoop Import Command
$ sqoop import \
--connect "jdbc:mysql://localhost/training" \
--username cloudera -P \
--table countries \
--target-dir /user/country_imported
This command imports the entire countries table into the /user/country_imported directory in HDFS. By default, Sqoop runs the import with 4 parallel mappers; you can adjust this with -m <number_of_mappers> (for example, -m 1 uses a single mapper).
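As a sketch, the same import restricted to a single mapper would look like this (connection values as in the example above):

```shell
# Single-mapper import: -m 1 disables parallelism and
# produces one output file instead of four
sqoop import \
  --connect "jdbc:mysql://localhost/training" \
  --username cloudera -P \
  --table countries \
  --target-dir /user/country_imported \
  -m 1
```

A single mapper is useful for small tables, or for tables without a primary key or other suitable split column; larger tables usually benefit from the parallel default.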
The trailing backslashes continue the command across multiple lines; if you omit them, enter the entire command on a single line in your terminal.
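Once the import finishes, the results can be inspected with standard HDFS shell commands (the target directory is the one used above; the part-m-* file names are what Sqoop's map-only import typically produces):

```shell
# List the files Sqoop wrote to the target directory
hdfs dfs -ls /user/country_imported

# Print the imported records (one comma-separated line per row by default)
hdfs dfs -cat /user/country_imported/part-m-*
```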