Apache Sqoop is a tool used for transferring data between Apache Hadoop clusters and relational databases.
Sqoop was originally developed by Cloudera. The name 'Sqoop' is a short form for 'SQL-to-Hadoop'.
Sqoop can import full or partial tables from a SQL database into HDFS in a variety of formats. Sqoop can also export data from HDFS to a SQL database.
Apache Sqoop uses Hadoop MapReduce to import data from a SQL database into HDFS.
Sqoop orchestrates the following three tasks to import data from a SQL database to HDFS.
1. Examines the table details and identifies the number of records that will be imported.
2. Creates and submits MapReduce jobs to the Hadoop cluster.
3. Fetches records from the SQL database and imports this data into HDFS.
Apache Sqoop is a command-line utility that has various commands to import data, export data, list tables, and so on. These commands are called tools in Sqoop. Following is the basic command-line syntax for using Apache Sqoop.
$ sqoop tool-name [tool-options]
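For example, the built-in 'help' and 'version' tools can be used to explore the available tools before running any imports or exports.
$ sqoop help            # lists all available Sqoop tools
$ sqoop help import     # shows the options supported by the 'import' tool
$ sqoop version         # prints the installed Sqoop version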
Sqoop follows the process below to import data from a relational database into HDFS.
1. Sqoop examines the relational database table from which the data has to be imported, and if possible determines the primary key.
2. Sqoop runs a boundary query on the table to determine the range (minimum and maximum values) of the split column, which is usually the primary key.
3. Sqoop divides this range by the number of mappers and configures the mapper tasks so that the load is distributed roughly equally among them (see the example below).
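The number of mappers and the split column can also be controlled explicitly. Below is a minimal sketch, assuming a 'customers' table with a 'customer_id' primary key (both placeholders), using the '--split-by' and '--num-mappers' options.
$ sqoop import --table customers \
--connect jdbc:mysql://myhostname/interviewgrid \
--username myusername --password mypassword \
--split-by customer_id \
--num-mappers 4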
You connect to a relational database with Sqoop using the '--connect' option. For example, the code below uses the 'list-tables' tool to connect to a MySQL database and list its tables.
$ sqoop list-tables \
--connect jdbc:mysql://myhostname/interviewgrid \
--username myusername --password mypassword
You can import the data from all tables of a relational database into a specific HDFS directory using the 'import-all-tables' tool with the '--warehouse-dir' option.
$ sqoop import-all-tables \
--connect jdbc:mysql://myhostname/interviewgrid \
--username myusername --password mypassword \
--warehouse-dir /interviewgrid
You can import the data from a single table using the 'import' tool with the '--table' option. You can use the '--warehouse-dir' option to import the data under the '/customerdata' HDFS directory.
$ sqoop import --table customers \
--connect jdbc:mysql://myhostname/interviewgrid \
--username myusername --password mypassword \
--warehouse-dir /customerdata
You can import data from specific columns of a relational database table into HDFS using the option '--columns'.
The code snippet below imports the data from the columns 'first_name' and 'last_name' into HDFS.
$ sqoop import --table customers \
--connect jdbc:mysql://myhostname/interviewgrid \
--username myusername --password mypassword \
--columns 'first_name,last_name' \
--warehouse-dir /interviewgrid
By default, Sqoop imports data as comma-delimited text files. You can instruct Sqoop to use a different delimiter by using the '--fields-terminated-by' option.
The code snippet below imports the data as tab-delimited text files.
$ sqoop import --table customers \
--connect jdbc:mysql://myhostname/interviewgrid \
--username myusername --password mypassword \
--columns 'first_name,last_name' \
--warehouse-dir /interviewgrid \
--fields-terminated-by '\t'
You can compress the data imported by Sqoop by using the '--compression-codec' option.
$ sqoop import --table customers \
--connect jdbc:mysql://myhostname/interviewgrid \
--username myusername --password mypassword \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec
In addition to text files, Sqoop supports importing data as Parquet files and Avro files.
You can tell Sqoop to import data as Parquet files by using the '--as-parquetfile' option, and as Avro files by using the '--as-avrodatafile' option, as shown in the examples below.
$ sqoop import --table customers \
--connect jdbc:mysql://myhostname/interviewgrid \
--username myusername --password mypassword \
--as-parquetfile
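Similarly, the snippet below is a minimal sketch of an Avro import using the '--as-avrodatafile' option (same placeholder connection details as above).
$ sqoop import --table customers \
--connect jdbc:mysql://myhostname/interviewgrid \
--username myusername --password mypassword \
--as-avrodatafile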
Similar to the 'import' tool, Sqoop provides the 'export' tool, which pushes data from HDFS to a relational database.
$ sqoop export \
--connect jdbc:mysql://myhostname/interviewgrid \
--username myusername --password mypassword \
--export-dir /interviewgrid/output \
--update-mode allowinsert \
--table customers
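When the target table already contains rows, '--update-mode allowinsert' is typically paired with the '--update-key' option so that existing rows are updated and new rows are inserted (an upsert). Below is a minimal sketch, assuming 'customer_id' is the table's key column (a placeholder).
$ sqoop export \
--connect jdbc:mysql://myhostname/interviewgrid \
--username myusername --password mypassword \
--export-dir /interviewgrid/output \
--table customers \
--update-key customer_id \
--update-mode allowinsert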