Import

The standard import syntax is shown below. The import command has many possible arguments, but for this simple example we use just two: --connect and --table.

In this example the command is import; the arguments that follow tell Sqoop where to connect and which table to load.

sqoop import --connect \
   "jdbc:db2://localhost:54332;database=bpm;username=devuser;password=1234" \
   --table Customers

--table is simply the name of the table or view to be loaded. Sqoop works with views the same way it works with tables.

--table [tablename]
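
Because views are handled no differently, the same syntax can point at a view. The view name below is only illustrative:

sqoop import --connect \
   "jdbc:db2://localhost:54332;database=bpm;username=devuser;password=1234" \
   --table CustomerOrdersView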

When the job is submitted, Sqoop connects to the source database via JDBC to retrieve the columns and their datatypes. The SQL datatypes are mapped to Java datatypes, and there may be some datatype mapping differences. A MapReduce job is then started to retrieve the data and write it to HDFS.
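
If the default mapping for a column is not what you want, it can be overridden per column with --map-column-java, and the HDFS output location can be set explicitly with --target-dir. The sketch below is illustrative only; the CREDIT_LIMIT column and the /data/landing/customers path are assumptions, not part of the example table:

sqoop import --connect \
   "jdbc:db2://localhost:54332;database=bpm;username=devuser;password=1234" \
   --table Customers \
   --map-column-java CREDIT_LIMIT=String \
   --target-dir /data/landing/customers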

For larger tables, it is possible to increase performance by splitting the job across multiple map tasks running on different nodes. Sqoop uses the table's primary key to decide how to split the rows. If no primary key exists, Sqoop cannot split the import automatically; specify a splitting column with the --split-by argument (or fall back to a single mapper). Make sure to choose a column whose values are uniformly distributed so the workload is balanced across mappers.
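
As a sketch, the import above could be spread across four mappers, splitting on a customer key column. Both the column name CUSTOMER_ID and the mapper count are assumptions for illustration:

sqoop import --connect \
   "jdbc:db2://localhost:54332;database=bpm;username=devuser;password=1234" \
   --table Customers \
   --split-by CUSTOMER_ID \
   --num-mappers 4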

When the process is complete, the session log displays a summary message:

16/05/10 06:17:14 INFO mapreduce.ImportJobBase: Transferred 201.083 KB in 67.9396 seconds (2.58 KB/sec)
16/05/10 06:17:14 INFO mapreduce.ImportJobBase: Retrieved 1893 records.
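
Assuming no --target-dir or --warehouse-dir was given, the imported files land in a directory named after the table under the user's HDFS home directory, so the result can be inspected with standard HDFS commands (the relative path below relies on that default):

hdfs dfs -ls Customers
hdfs dfs -cat Customers/part-m-00000 | head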

Sqoop import has many other possible arguments; please refer to the Sqoop User Guide for the complete list.
