Sqoop Import from MySQL to Hive & HBase

In last post we discussed Sqoop concepts and demos on how to move data from RDBMS to HDFS and from HDFS back to RDBMS. Today we will extend the data movement from RDBMS to NoSQL database – HBase and Hadoop Data warehouse Hive.


Sqoop import command can be used to move data into both. We will need our source data in MySQL for both demos, so let's do that first.


Preparing Source data into MySQL

If you followed the last post you must be readily having the ‘databare_emp’ or ‘databare_employees’ tables in dataguy database. In case you don’t have, you may use the following scripts to create a table in MySQL which we will move using Sqoop to Hive & Hbase.

Create database dataguy;

CREATE TABLE `dataguy`.`databare_emp ` (
  `empid` INT NOT NULL,
  `fname` VARCHAR(45) NOT NULL,
  `lname` VARCHAR(45) NULL,
  `dob` DATE NOT NULL,
  `gender` CHAR NOT NULL,
  `hire_date` DATE NOT NULL,
  PRIMARY KEY (`empid`));

INSERT INTO `dataguy`.`databare_emp` (`empid`,`fname`,`lname`,`dob`,`gender`,`hire_date`)VALUES (1001,'Nitin','Khandelwal','1986-08-01','M','2017-10-20');
INSERT INTO `dataguy`.`databare_emp` (`empid`,`fname`,`lname`,`dob`,`gender`,`hire_date`)VALUES (1002,'Naveen','Khandelwal','1988-01-12','M','2018-09-19');
INSERT INTO `dataguy`.`databare_emp` (`empid`,`fname`,`lname`,`dob`,`gender`,`hire_date`)VALUES (1003,'Ravi','Aggarwal','1997-12-21','M','2020-06-03');
INSERT INTO `dataguy`.`databare_emp` (`empid`,`fname`,`lname`,`dob`,`gender`,`hire_date`)VALUES (1004,'Rupika','Kaur','1992-05-07','F','2019-05-14');
INSERT INTO `dataguy`.`databare_emp` (`empid`,`fname`,`lname`,`dob`,`gender`,`hire_date`)VALUES (1005,'Anurag','Pandiya','1977-03-21','M','2000-08-11');

Select * from `dataguy`.`databare_emp` ;

Source records:


Now, when we have our source ready, let's start our demos.


Sqoop import data from MySQL to Hive

We will now use following Sqoop command to perform this import to Hive data warehouse table hive_emp within hiveDB database.

sqoop import --connect jdbc:mysql://dbserver.databare.com/dataguy --table databare_emp --username dataguyAD --password mypassword -m 1 --hive-import --create-hive-table --hive-table hiveDB.hive_emp --fields-terminated-by '\t'--delete-target-dir

Here we have used more parameters in addition to the ones we used in Sqoop import command, each of these parameters means:

--hive-import         : for importing data into hive rather than HDFS
--create-hive-table   : create a new table into hive
--hive-table          : hive table name (databasename.tablename format)
--delete-target-dir   : as you can depict from name itself, this arguments makes sqoop delete the target directory first if it already exists

We will now run this to start the import. You can see the results as under:



Once the operation is complete we can check the Hive table for our data. Run following commands to start hive and check hive table data.

Hive
use hiveDB
select * from hive_emp;

Sqoop import data from MySQL to HBase

We will now use following Sqoop command to perform the import to HBase table.


HBase is a NOSQL database thus we don’t have regular database and table with rows and columns but it follows a column-oriented data storage approach. We have column families which uniquely identify the records using row id.


In our case we will have a table hbase_emp having a column family employee and row id as empid which will identify a record for employee data.

sqoop import --connect jdbc:mysql://dbserver.databare.com/dataguy --table databare_emp --username dataguyAD --password mypassword -m 1 --hbase-create-table --hbase-table hbase_emp --column-family employee --hbase-row-key empid

Similar to hive import we have arguments for hbase import as well, each of these parameters means:

--hbase-create -table : create a new table into hbase
--hbase-table : hbase table name
--column-family : column family name
--hbase-row-key : hbase row key identifier

We will now run this to start the import. You can see the results as under:

Once the operation is complete we can check the Hive table for our data. Run following commands to start hive and check hive table data.

Hbase shell
scan 'hbase_emp'

You can see how different a NOSQL is from the usual SQL table with rows and columns. NOSQL world is definitely exciting with no boundaries at all.


Hope you enjoyed today’s topic. Happy learning friends!