A Tutorial on Installing and Configuring Sqoop for MySQL in a Hadoop Cluster
Sqoop is a tool for moving data between Hadoop and relational databases: it can import data from a relational database (e.g. MySQL, Oracle, Postgres) into HDFS, and can also export data from HDFS back into a relational database.
One of Sqoop's highlights is that it uses Hadoop MapReduce jobs to carry out these transfers, importing data from the relational database into HDFS in parallel.
I. Installing Sqoop
1. Download and extract the Sqoop tarball
The packages used here are sqoop-1.2.0-CDH3B4.tar.gz, hadoop-0.20.2-CDH3B4.tar.gz, and the MySQL JDBC driver mysql-connector-java-5.1.10-bin.jar.
[root@node1 ~]# ll
drwxr-xr-x 15 root root   4096 Feb 22  2011 hadoop-0.20.2-CDH3B4
-rw-r--r--  1 root root 724225 Sep 15 06:46 mysql-connector-java-5.1.10-bin.jar
drwxr-xr-x 11 root root   4096 Feb 22  2011 sqoop-1.2.0-CDH3B4
2. Move sqoop-1.2.0-CDH3B4 to /home/hadoop, copy the MySQL JDBC driver and hadoop-core-0.20.2-CDH3B4.jar (from hadoop-0.20.2-CDH3B4) into sqoop-1.2.0-CDH3B4/lib, and finally fix the ownership.
[root@node1 ~]# cp mysql-connector-java-5.1.10-bin.jar sqoop-1.2.0-CDH3B4/lib
[root@node1 ~]# cp hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar sqoop-1.2.0-CDH3B4/lib
[root@node1 ~]# chown -R hadoop:hadoop sqoop-1.2.0-CDH3B4
[root@node1 ~]# mv sqoop-1.2.0-CDH3B4 /home/hadoop
[root@node1 ~]# ll /home/hadoop
total 35748
-rw-rw-r--  1 hadoop hadoop      343 Sep 15 05:13 derby.log
drwxr-xr-x 13 hadoop hadoop     4096 Sep 14 16:16 hadoop-0.20.2
drwxr-xr-x  9 hadoop hadoop     4096 Sep 14 20:21 hive-0.10.0
-rw-r--r--  1 hadoop hadoop 36524032 Sep 14 20:20 hive-0.10.0.tar.gz
drwxr-xr-x  8 hadoop hadoop     4096 Sep 25  2012 jdk1.7
drwxr-xr-x 12 hadoop hadoop     4096 Sep 15 00:25 mahout-distribution-0.7
drwxrwxr-x  5 hadoop hadoop     4096 Sep 15 05:13 metastore_db
-rw-rw-r--  1 hadoop hadoop      406 Sep 14 16:02 scp.sh
drwxr-xr-x 11 hadoop hadoop     4096 Feb 22  2011 sqoop-1.2.0-CDH3B4
drwxrwxr-x  3 hadoop hadoop     4096 Sep 14 16:17 temp
drwxrwxr-x  3 hadoop hadoop     4096 Sep 14 15:59 user
3. Edit configure-sqoop and comment out the checks for HBase and ZooKeeper
[root@node1 bin]# pwd
/home/hadoop/sqoop-1.2.0-CDH3B4/bin
[root@node1 bin]# vi configure-sqoop
#!/bin/bash
#
# Licensed to Cloudera, Inc. under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
. . .
# Check: If we can't find our dependencies, give up here.
if [ ! -d "${HADOOP_HOME}" ]; then
  echo "Error: $HADOOP_HOME does not exist!"
  echo 'Please set $HADOOP_HOME to the root of your Hadoop installation.'
  exit 1
fi
#if [ ! -d "${HBASE_HOME}" ]; then
#  echo "Error: $HBASE_HOME does not exist!"
#  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#  exit 1
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Error: $ZOOKEEPER_HOME does not exist!"
#  echo 'Please set $ZOOKEEPER_HOME to the root of your ZooKeeper installation.'
#  exit 1
#fi
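Commenting the two blocks out by hand in vi works fine; the same edit can also be scripted with sed. Below is a minimal sketch run against a throwaway copy (the file name and layout are a stand-in for one dependency check in bin/configure-sqoop; verify against your actual script before running sed -i on it):

```shell
# Scratch copy that mimics the HBase dependency check.
cat > /tmp/configure-sqoop.demo <<'EOF'
if [ ! -d "${HBASE_HOME}" ]; then
  echo "Error: $HBASE_HOME does not exist!"
  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
  exit 1
fi
EOF

# Prefix every line from the HBASE_HOME test down to its closing 'fi' with '#'.
sed -i '/HBASE_HOME/,/^fi/ s/^/#/' /tmp/configure-sqoop.demo
cat /tmp/configure-sqoop.demo
```

The address range `/HBASE_HOME/,/^fi/` selects the whole if-block, so all five lines come out commented.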
4. Edit /etc/profile and .bash_profile: add HADOOP_HOME and adjust PATH
[hadoop@node1 ~]$ vi .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs
HADOOP_HOME=/home/hadoop/hadoop-0.20.2
PATH=$HADOOP_HOME/bin:$PATH:$HOME/bin
export HIVE_HOME=/home/hadoop/hive-0.10.0
export MAHOUT_HOME=/home/hadoop/mahout-distribution-0.7
export PATH HADOOP_HOME
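The ordering in the PATH line matters: $HADOOP_HOME/bin is prepended so the cluster's hadoop binary takes precedence over anything else on the PATH. A quick check of that ordering, using the same paths as the profile above (note this modifies PATH for the current shell):

```shell
# Same construction as in .bash_profile above.
HADOOP_HOME=/home/hadoop/hadoop-0.20.2
PATH=$HADOOP_HOME/bin:$PATH:$HOME/bin
export PATH HADOOP_HOME

# The first PATH entry should now be Hadoop's bin directory.
echo "$PATH" | cut -d: -f1
```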
II. Testing Sqoop
1. List the databases in MySQL:
[hadoop@node1 bin]$ ./sqoop list-databases --connect jdbc:mysql://192.168.1.152:3306/ --username sqoop --password sqoop
13/09/15 07:17:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/09/15 07:17:17 INFO manager.MySQLManager: Executing SQL statement: SHOW DATABASES
information_schema
mysql
performance_schema
sqoop
test
2. Import a MySQL table into Hive:
[hadoop@node1 bin]$ ./sqoop import --connect jdbc:mysql://192.168.1.152:3306/sqoop --username sqoop --password sqoop --table test --hive-import -m 1
13/09/15 08:15:01 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/09/15 08:15:01 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
13/09/15 08:15:01 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
13/09/15 08:15:01 INFO tool.CodeGenTool: Beginning code generation
13/09/15 08:15:01 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:02 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:02 INFO orm.CompilationManager: HADOOP_HOME is /home/hadoop/hadoop-0.20.2/bin/..
13/09/15 08:15:02 INFO orm.CompilationManager: Found hadoop core jar at: /home/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
13/09/15 08:15:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/a71936fd2bb45ea6757df22751a320e3/test.jar
13/09/15 08:15:03 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/09/15 08:15:03 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/09/15 08:15:03 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/09/15 08:15:03 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/09/15 08:15:03 INFO mapreduce.ImportJobBase: Beginning import of test
13/09/15 08:15:04 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:05 INFO mapred.JobClient: Running job: job_201309150505_0009
13/09/15 08:15:06 INFO mapred.JobClient:  map 0% reduce 0%
13/09/15 08:15:34 INFO mapred.JobClient:  map 100% reduce 0%
13/09/15 08:15:36 INFO mapred.JobClient: Job complete: job_201309150505_0009
13/09/15 08:15:36 INFO mapred.JobClient: Counters: 5
13/09/15 08:15:36 INFO mapred.JobClient:   Job Counters
13/09/15 08:15:36 INFO mapred.JobClient:     Launched map tasks=1
13/09/15 08:15:36 INFO mapred.JobClient:   FileSystemCounters
13/09/15 08:15:36 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=583323
13/09/15 08:15:36 INFO mapred.JobClient:   Map-Reduce Framework
13/09/15 08:15:36 INFO mapred.JobClient:     Map input records=65536
13/09/15 08:15:36 INFO mapred.JobClient:     Spilled Records=0
13/09/15 08:15:36 INFO mapred.JobClient:     Map output records=65536
13/09/15 08:15:36 INFO mapreduce.ImportJobBase: Transferred 569.6514 KB in 32.0312 seconds (17.7842 KB/sec)
13/09/15 08:15:36 INFO mapreduce.ImportJobBase: Retrieved 65536 records.
13/09/15 08:15:36 INFO hive.HiveImport: Removing temporary files from import process: test/_logs
13/09/15 08:15:36 INFO hive.HiveImport: Loading uploaded data into Hive
13/09/15 08:15:36 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:36 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:41 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/hadoop/hive-0.10.0/lib/hive-common-0.10.0.jar!/hive-log4j.properties
13/09/15 08:15:41 INFO hive.HiveImport: Hive history file=/tmp/hadoop/hive_job_log_hadoop_201309150815_1877092059.txt
13/09/15 08:16:10 INFO hive.HiveImport: OK
13/09/15 08:16:10 INFO hive.HiveImport: Time taken: 28.791 seconds
13/09/15 08:16:11 INFO hive.HiveImport: Loading data to table default.test
13/09/15 08:16:12 INFO hive.HiveImport: Table default.test stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 583323, raw_data_size: 0]
13/09/15 08:16:12 INFO hive.HiveImport: OK
13/09/15 08:16:12 INFO hive.HiveImport: Time taken: 1.704 seconds
13/09/15 08:16:12 INFO hive.HiveImport: Hive import complete.
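The "Using Hive-specific delimiters" lines in the log mean the imported files use Hive's default field separator, the non-printing \001 (Ctrl-A) character, rather than commas or tabs. A small illustration of what such a row looks like (the sample data below is made up):

```shell
# Two fake rows in Hive's default format: fields separated by the \001 byte.
printf '1\001alice\n2\001bob\n' > /tmp/hive_rows.demo

# Split on \001 to pull the second column back out.
awk -F'\001' '{print $2}' /tmp/hive_rows.demo
```

This is also why a Hive-imported file can look like one long unseparated string when viewed with cat: the delimiters are there, just invisible.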
III. Sqoop Commands
Sqoop offers roughly 13 commands, plus several groups of generic arguments that all of these commands accept.
The generic arguments are grouped as: Common arguments, Incremental import arguments, Output line formatting arguments, Input parsing arguments, Hive arguments, HBase arguments, and Generic Hadoop command-line arguments; each command also takes arguments of its own. A few commonly used invocations follow:
1. Common arguments
These are the generic options, mostly the ones describing the relational-database connection.
1) List all databases on a MySQL server
sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password 123456
2) Connect to MySQL and list the tables in the test database
sqoop list-tables --connect jdbc:mysql://localhost:3306/test --username root --password 123456
In this command, test is the name of the test database in MySQL, and username/password are the MySQL credentials.
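The invocations in this section all repeat the same --connect/--username/--password flags. A tiny wrapper can factor them out; this is only a sketch (the function name, host, and credentials are placeholders), and it echoes the command instead of running it, so the result is easy to inspect. It also uses -P, which makes Sqoop prompt for the password rather than exposing it on the command line, as the warning in the logs above suggests:

```shell
# sqoop_mysql DB SUBCOMMAND [extra args...] -- hypothetical helper that
# appends the shared connection flags to any Sqoop subcommand.
sqoop_mysql() {
  db="$1"; shift
  echo sqoop "$@" --connect "jdbc:mysql://localhost:3306/${db}" --username root -P
}

# Print the list-tables invocation for the test database:
sqoop_mysql test list-tables
```

Dropping the leading echo would execute the command instead of printing it.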
3) Copy a relational table's structure into Hive. Only the table structure is copied; none of the rows are.
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/test --table sqoop_test --username root --password 123456 --hive-table test
Here --table sqoop_test is the table inside the MySQL database test, and --hive-table test is the name of the table to create in Hive.
4) Import a table from a relational database into Hive
sqoop import --connect jdbc:mysql://localhost:3306/zxtest --username root --password 123456 --table sqoop_test --hive-import --hive-table s_test -m 1
5) Export data from a Hive table into MySQL. Before running the export, the target table hive_test must already exist in MySQL.
sqoop export --connect jdbc:mysql://localhost:3306/zxtest --username root --password root --table hive_test --export-dir /user/hive/warehouse/new_test_partition/dt=2012-03-05
6) Import a database table's data into files on HDFS
./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression --username=hadoop --password=123456 --table HADOOP_USER_INFO -m 1 --target-dir /user/test
7) Incrementally import table data from the database into HDFS
./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression --username=hadoop --password=123456 --table HADOOP_USER_INFO -m 1 --target-dir /user/test --check-column id --incremental append --last-value 3
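What --incremental append does can be pictured with plain text: given --check-column id and --last-value 3, the next run only pulls rows whose id is greater than 3 and appends them to the target directory. A rough simulation with awk (the sample rows below are made up):

```shell
# Fake contents of HADOOP_USER_INFO as id,name pairs.
cat > /tmp/hadoop_user_info.demo <<'EOF'
1,alice
2,bob
3,carol
4,dave
5,eve
EOF

# With --last-value 3, only rows with id > 3 would be appended on this run.
awk -F, -v last=3 '$1 > last' /tmp/hadoop_user_info.demo
```

After such a run, the new high-water mark to pass as --last-value on the following run would be 5.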