In this article, we will do following
2.
Load consumer complaints dataset to a table in
MySQL database
3.
Import MySQL database table into Hive using
Sqoop
Download consumer
complaints dataset
1.
Download consumer complaints dataset from https://www.data.gov/consumer/ to
unix box using “wget https://data.consumerfinance.gov/api/views/s6ew-h6mp/rows.csv?accessType=DOWNLOAD”
2.
Then rename the file to Consumer_Complaints.csv using
“mv
"rows.csv?accessType=DOWNLOAD" Consumer_Complaints.csv”
3.
The downloaded file is comma separated and ready
for loading to MySQL table.
Load consumer
complaints dataset to a table in MySQL database
1.
Connect to mysql from terminal in linux server
using “mysql --user="username" –password”
2.
Type “use databasename” to use the database in
which we are creating a table
3.
Create a table in MySQL using a following create
table structure
create table if not exists consumer_complaints
( data_received
varchar( 10),
product varchar( 100),
sub_product varchar(
50),
issue varchar( 100),
sub_issue varchar( 100),
narrative varchar(5000),
company_Public_response varchar(
150),
company varchar( 100),
state varchar( 2),
zipcode varchar( 12),
tags varchar( 50),
consent_provided varchar( 20),
submitted_channel
varchar( 25),
date_sentto_company
varchar( 10),
company_consumer_response
varchar( 150),
timely_response varchar(10),
consumer_disputed
varchar( 10),
complaint_id varchar(
50)
);
4.
load data local infile 'Consumer_Complaints.csv'
into table consumer_complaints fields terminated by "," optionally
enclosed by "\"" ignore 1 lines;
Import MySQL database
table into Hive using Sqoop
1.
Check connection to target database working or
not using list tables
a.
List tables in the target database using “sqoop
list-tables …..”
b.
Check target hive database and table exist or
not
c.
As show
in the above picture, I do not have a table to import therefore I will create a
table using Sqoop during import
d.
sqoop import --connect
"jdbc:mysql://192.168.56.102/rajendra"
--username="cloudera" -password="cloudera" --table
"consumer_complaints" --split-by "complaint_id"
--create-hive-table --hive-database "rajendra" –hive-import
e.
Check table created in Hive
f.
Check Hive table metadata, it will be same as
MySQL table metadata