Search This Blog

Monday 22 January 2018

BigData - Sqoop Import exercise

In this article, we will do following
1.       Download consumer complaints dataset from https://www.data.gov/consumer/
2.       Load consumer complaints dataset to a table in MySQL database
3.       Import MySQL database table into Hive using Sqoop

Download consumer complaints dataset
1.       Download consumer complaints dataset from https://www.data.gov/consumer/ to unix box using “wget https://data.consumerfinance.gov/api/views/s6ew-h6mp/rows.csv?accessType=DOWNLOAD”
2.       Then rename the file to Consumer_Complaints.csv  using
“mv "rows.csv?accessType=DOWNLOAD" Consumer_Complaints.csv”

3.       The downloaded file is comma separated and ready for loading to MySQL table.



Load consumer complaints dataset to a table in MySQL database
1.       Connect to mysql from terminal in linux server using “mysql --user="username" –password”

2.       Type “use databasename” to use the database in which we are creating a table


3.       Create a table in MySQL using a following create table structure
create table if not exists consumer_complaints
( data_received     varchar(  10),
  product           varchar( 100),
  sub_product       varchar(  50),
  issue             varchar( 100),
  sub_issue         varchar( 100),
  narrative         varchar(5000),
  company_Public_response  varchar( 150),
  company           varchar( 100),
  state             varchar(   2),
  zipcode           varchar(  12),
  tags              varchar(  50),
  consent_provided  varchar(  20),
  submitted_channel varchar(  25),
  date_sentto_company varchar(  10),
  company_consumer_response  varchar(  150),
  timely_response   varchar(10),
  consumer_disputed varchar( 10),
  complaint_id      varchar(  50)
 );


4.       load data local infile 'Consumer_Complaints.csv' into table consumer_complaints fields terminated by "," optionally enclosed by "\"" ignore 1 lines;




Import MySQL database table into Hive using Sqoop
1.       Check connection to target database working or not using list tables
a.       List tables in the target database using “sqoop list-tables …..”

b.       Check target hive database and table exist or not

c.        As show in the above picture, I do not have a table to import therefore I will create a table using Sqoop during import
d.       sqoop import --connect "jdbc:mysql://192.168.56.102/rajendra" --username="cloudera" -password="cloudera" --table "consumer_complaints" --split-by "complaint_id" --create-hive-table --hive-database "rajendra" –hive-import

e.       Check table created in Hive

f.        Check Hive table metadata, it will be same as MySQL table metadata



1 comment: