sqoop數(shù)據(jù)導入導出應用案例

1 sqoop導入數(shù)據(jù)

將RDBMS中的一個表數(shù)據(jù)導入到hdfs。表中的每一行被視為hdfs的記錄宦言。所有記錄都存儲為文本文件的文本數(shù)據(jù)（或者Avro换薄、sequence文件等二進制數(shù)據(jù)）。

1.1 語法

下面的命令用于將數(shù)據(jù)導入到hdfs上跨算。

$sqoop import (generic-args) (import-args)

1.2 測試數(shù)據(jù)

在MySQL有一個userdb的數(shù)據(jù)庫，其中有一張usertable表椭懊，該表結構如下：

id	name	age
2	tom	15
3	toms	25
4	tomslee	17
5	bob	16

1.3 導入表中數(shù)據(jù)到HDFS

下面的命令用于從MySQL數(shù)據(jù)庫服務器中的usertable表導入數(shù)據(jù)到hdfs诸蚕。

sqoop import \
--connect jdbc:mysql://mysqlhost:3306/userdb \
--username root \
--password root \
--table usertable \
--m 1

上面的命令中--connect指的是連接地址，這里面是mysql服務器的地址；--table usertable是MySQL數(shù)據(jù)庫的數(shù)據(jù)表背犯；--m 1是指定MapReduce的數(shù)量坏瘩。

如果執(zhí)行成功，會顯示如下輸出：

14/12/22 15:24:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/12/22 15:24:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/cebe706d23ebb1fd99c1f063ad51ebd7/emp.jar
-----------------------------------------------------
O mapreduce.Job: map 0% reduce 0%
14/12/22 15:28:08 INFO mapreduce.Job: map 100% reduce 0%
14/12/22 15:28:16 INFO mapreduce.Job: Job job_1419242001831_0001 completed successfully
-----------------------------------------------------
-----------------------------------------------------
14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Transferred 145 bytes in 177.5849 seconds (0.8165 bytes/sec)
14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Retrieved 5 records.

上述的過程由于沒有指定hdfs的保存位置漠魏，所以系統(tǒng)會分配一個默認的地址桑腮，該地址根據(jù)當前的用戶名和表名生成的。

為了驗證在hdfs導入的數(shù)據(jù)蛉幸，使用下面的命令可以查看：

hadoop fs -cat /user/hadoop/userdb/part-m-00000

默認情況下hdfs上面的數(shù)據(jù)字段之間用逗號（,）分割破讨。

sqoop1

1.4 導入到hive表中

sqoop import \
--connect jdbc:mysql://mysqlhost:3306/userdb \
--username root \
--password root \
--table usertable \
--hive-import \
--m 1;

調用上述命令之前不用先建立hive數(shù)據(jù)表，由于沒有指定hive的數(shù)據(jù)庫奕纫，所以系統(tǒng)會在hive的default數(shù)據(jù)庫下面建立一張usertable數(shù)據(jù)表提陶。

數(shù)據(jù)傳輸過程:

1 從MySQL到hdfs上（通過MapReduce）
2 從hdfs遷移到hive中

hive> select * from usertable;
OK
2       tom     15
3       toms    25
4       tomslee 17
5       bob     16
Time taken: 0.191 seconds, Fetched: 4 row(s)
hive> dfs -cat /user/hive/warehouse/usertable/part-m-00000;
2tom15
3toms25
4tomslee17
5bob16
hive>

通過查看hdfs上的數(shù)據(jù)可知hive中的字段之間默認是用'\001'進行分割的，所以字段之間看起來緊挨著匹层。

1.5 導入到hdfs指定的目錄上

在導入表數(shù)據(jù)到hdfs使用sqoop工具隙笆，我們可以指定目標目錄。

以下是指定目標目錄選項的sqoop導入命令到的語法升筏。

--target-dir<new directory in HDFS>

下面的命令是用來導入MySQL數(shù)據(jù)庫的user數(shù)據(jù)表到hdfs的/user/test目錄撑柔。

sqoop import \
--connect jdbc:mysql://mysqlhost:3306/userdb \
--username root \
--password root \
--target-dir /uer/test \
--table usertable \
--m 1;

注意指定的hdfs的目錄不能存在，因為sqoop會將這個目錄作為MapReduce的輸出目錄您访。
導入到hdfs上的輸出數(shù)據(jù)格式如下：

2,tom,15
3,toms,25
4,tomslee,17
5,bob,16

1.6 導入表數(shù)據(jù)子集

我們可以導入表的"where"子句的一個子集通過sqoop工具铅忿。它執(zhí)行在各自的數(shù)據(jù)庫服務器相應的sql查詢中，并將結果儲存在hdfs的目標目錄上灵汪。

where子句的語法如下：

--where <condition>

下面的命令用來導入usertable表的數(shù)據(jù)子集檀训。子集查詢用戶的姓名和年齡。

sqoop import \
--connect jdbc:mysql://mysqlhost:3306/userdb \
--username root \
--password root \
--table usertable \
--where "name='tom' and age=15" \
--target-dir /user/test \
--m 1;

注意指定的hdfs的目錄不能存在享言，因為sqoop會將這個目錄作為MapReduce的輸出目錄峻凫。

導入到hdfs上的輸出數(shù)據(jù)格式如下：

2,tom,15

1.7 增量導入數(shù)據(jù)

增量導入是僅導入新添加的表中的行的技術。

它需要添加'incremental','check-column','last-value'選項來執(zhí)行增量導入览露。

下面的語法用于sqoop導入命令增量的選項荧琼。

--incremental <mode>
--check_column <column name>
--last-value <last check column value>

下面命令用于在user表執(zhí)行增量導入。

sqoop import \
--connect jdbc:mysql://mysqlhost:3306/userdb \
--username root \
--password root \
--table usertable \
--incremental append \
--check-column id \
--last-value 2 \
--target-dir /user/test \
--m 1;

注意: 這里面指定的hdfs路徑不但可以存在而且在該目錄下還可以有文件存在差牛。

2 sqoop數(shù)據(jù)導出

將數(shù)據(jù)從hdfs導出到RDBMS數(shù)據(jù)庫命锄。

導出前达罗，目標表必須存在于目標數(shù)據(jù)庫中谴餐。

默認操作是從將文件中的數(shù)據(jù)使用insert語句插入到mysql數(shù)據(jù)表中。
更新模式下，是生成update語句更新表數(shù)據(jù)夹孔。

2.1 語法

以下是export命令的語法。

sqoop export (generic-args) (export-args)

2.2 案例一

將hdfs中的數(shù)據(jù)導出到MySQL的usertable表中。

sqoop export \
--connect jdbc:mysql://mysqlhost:3306/userdb \
--username root \
--password root \
--table usertable \
-export-dir /user/hive/warehouse/usertable \
--input-fields-terminated-by '\001'
--m 1;

上述命令中的input-fields-terminated-by '\001'指的是輸入的字段之間的分隔符搭伤，在hive中的默認分隔符為'\001'只怎。

驗證：在MySQL中輸入select * from usertable;

3 sqoop數(shù)據(jù)導入導出命令詳解

3.1 sqoop import導入數(shù)據(jù)命令參數(shù)詳解

輸入sqoop import --help命令可以查看所有導入命令的參數(shù)詳解：


常用命令：
   --connect <jdbc-uri>                         JDBC連接字符串                                         
   --connection-manager <class-name>            連接管理者                                             
   --driver <class-name>                        驅動類
   --hadoop-home <dir>                          指定$HADOOP_HOME路徑
   -P                                           從命令行輸入密碼（這樣可以保證數(shù)據(jù)庫密碼的安全性）
   --password <password>                        密碼
   --username <username>                        用戶名
   --verbose                                    打印信息


Import control arguments:
   --append                        添加到hdfs中已經存在的dataset上
                                   直接使用該參數(shù)就可向一個已經存在的目錄追加內容了
   --as-avrodatafile               導入數(shù)據(jù)作為avrodata
   --as-sequencefile               導入數(shù)據(jù)作為SequenceFiles
   --as-textfile                   默認導入數(shù)據(jù)為文本
   --boundary-query <statement>    Set boundary query for retrieving max
                                   and min value of the primary key
   --columns <col,col,col...>      選擇導入的列
   --compression-codec <codec>     壓縮方式，默認是gzip
   --direct                        使用直接導入快速路徑
   --direct-split-size <n>         在快速模式下每n字節(jié)使用一個split
-e,--query <statement>             通過查詢語句導入
   --fetch-size <n>                一次讀入的數(shù)量                            
   --inline-lob-limit <n>          Set the maximum size for an inline LOB
-m,--num-mappers <n>               通過實行多少個map怜俐，默認是4個身堡，某些數(shù)據(jù)庫8 or 16性能不錯
   --split-by <column-name>        創(chuàng)建split的列，默認是主鍵
   --table <table-name>            導入的數(shù)據(jù)表
   --target-dir <dir>              HDFS 目標路徑
   --warehouse-dir <dir>           HDFS parent for table destination
   --where <where clause>          WHERE clause to use during import
-z,--compress                      Enable compression


增量導入?yún)?shù):
   --check-column <column>        Source column to check for incremental
                                  change
   --incremental <import-type>    Define an incremental import of type
                                  'append' or 'lastmodified'
   --last-value <value>           Last imported value in the incremental
                                  check column

輸出行格式參數(shù):
   --enclosed-by <char>               設置字段結束符號
   --escaped-by <char>                用哪個字符來轉義
   --fields-terminated-by <char>      輸出字段之間的分隔符
   --lines-terminated-by <char>       輸出行分隔符
   --mysql-delimiters                 使用mysql的默認分隔符: , lines: \n escaped-by: \ optionally-enclosed-by: '

   --optionally-enclosed-by <char>    Sets a field enclosing character

輸入?yún)?shù)解析:
   --input-enclosed-by <char>               Sets a required field encloser
   --input-escaped-by <char>                Sets the input escape
                                            character
   --input-fields-terminated-by <char>      輸入字段之間的分隔符
   --input-lines-terminated-by <char>       輸入行分隔符
                                            char
   --input-optionally-enclosed-by <char>    Sets a field enclosing
                                            character

Hive arguments:
   --create-hive-table                         創(chuàng)建hive表,如果目標表存在則失敗

   --hive-delims-replacement <arg>             導入到hive時用自定義的字符替換掉 \n, \r, and \001

   --hive-drop-import-delims                   導入到hive時刪除 \n, \r, and \001
   --hive-home <dir>                           重寫$HIVE_HOME
   --hive-import                               Import tables into Hive
                                               (Uses Hive's default
                                               delimiters if none are
                                               set.)
   --hive-overwrite                            Overwrite existing data in
                                               the Hive table
   --hive-partition-key <partition-key>        hive分區(qū)的key
   --hive-partition-value <partition-value>    hive分區(qū)的值
                                              
   --hive-table <table-name>                   Sets the table name to use
                                               when importing to hive
   --map-column-hive <arg>                     類型匹配拍鲤，sql類型對應到hive類型

3.2 sqoop export導出數(shù)據(jù)命令參數(shù)詳解

輸入sqoop export --help命令可以查看所有導入命令的參數(shù)詳解：

export主要參數(shù)
--direct     快速導入
--export-dir <dir>                            HDFS到處數(shù)據(jù)的目錄
-m,--num-mappers <n>                          都少個map線程
--table <table-name>                          導出哪個表
--call <stored-proc-name>                     存儲過程
--update-key <col-name>                       通過哪個字段來判斷更新
--update-mode <mode>                          插入模式贴谎，默認是只更新，可以設置為allowinsert.
--input-null-string <null-string>             字符類型null處理
--input-null-non-string <null-string>         非字符類型null處理
--staging-table <staging-table-name>          臨時表
--clear-staging-table                         清空臨時表
--batch                                       批量模式

4 參考博文

http://www.cnblogs.com/cenyuhai/p/3306056.html

http://www.aboutyun.com/thread-9983-1-1.html

http://blog.csdn.net/wangyang1354/article/details/52936400

最后編輯于：2017.12.10 02:51:35

?著作權歸作者所有,轉載或內容合作請聯(lián)系作者

人面猴
序言：七十年代末季稳，一起剝皮案震驚了整個濱河市擅这，隨后出現(xiàn)的幾起案子，更是在濱河造成了極大的恐慌景鼠，老刑警劉巖仲翎，帶你破解...
沈念sama閱讀 217,084評論 6贊 503
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件，死亡現(xiàn)場離奇詭異铛漓，居然都是意外死亡溯香，警方通過查閱死者的電腦和手機，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 92,623評論 3贊 392
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進店門浓恶，熙熙樓的掌柜王于貴愁眉苦臉地迎上來玫坛，“玉大人，你說我怎么就攤上這事包晰“和海” “怎么了？”我有些...
開封第一講書人閱讀 163,450評論 0贊 353
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵杜窄，是天一觀的道長肠骆。經常有香客問我，道長塞耕，這世上最難降的妖魔是什么蚀腿？我笑而不...
開封第一講書人閱讀 58,322評論 1贊 293
?港島之戀（遺憾婚禮）
正文為了忘掉前任，我火速辦了婚禮扫外，結果婚禮上莉钙，老公的妹妹穿的比我還像新娘。我一直安慰自己筛谚，他們只是感情好磁玉，可當我...
茶點故事閱讀 67,370評論 6贊 390
惡毒庶女頂嫁案：這布局不是一般人想出來的
文/花漫我一把揭開白布。她就那樣靜靜地躺著驾讲，像睡著了一般蚊伞。火紅的嫁衣襯著肌膚如雪席赂。梳的紋絲不亂的頭發(fā)上，一...
開封第一講書人閱讀 51,274評論 1贊 300
城市分裂傳說
那天时迫，我揣著相機與錄音颅停，去河邊找鬼。笑死掠拳，一個胖子當著我的面吹牛癞揉，可吹牛的內容都是我干的。我是一名探鬼主播溺欧，決...
沈念sama閱讀 40,126評論 3贊 418
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼喊熟，長吁一口氣：“原來是場噩夢啊……” “哼！你這毒婦竟也來了姐刁？” 一聲冷哼從身側響起逊移，我...
開封第一講書人閱讀 38,980評論 0贊 275
萬榮殺人案實錄
序言：老撾萬榮一對情侶失蹤，失蹤者是張志新（化名）和其女友劉穎龙填，沒想到半個月后胳泉，有當?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體，經...
沈念sama閱讀 45,414評論 1贊 313
?護林員之死
正文獨居荒郊野嶺守林人離奇死亡岩遗，尸身上長有42處帶血的膿包…… 初始之章·張勛以下內容為張勛視角年9月15日...
茶點故事閱讀 37,599評論 3贊 334
?白月光啟示錄
正文我和宋清朗相戀三年扇商，在試婚紗的時候發(fā)現(xiàn)自己被綠了。大學時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片宿礁。...
茶點故事閱讀 39,773評論 1贊 348
活死人
序言：一個原本活蹦亂跳的男人離奇死亡案铺，死狀恐怖，靈堂內的尸體忽然破棺而出梆靖，到底是詐尸還是另有隱情控汉，我是刑警寧澤，帶...
沈念sama閱讀 35,470評論 5贊 344
?日本核電站爆炸內幕
正文年R本政府宣布返吻，位于F島的核電站姑子，受9級特大地震影響，放射性物質發(fā)生泄漏街佑。R本人自食惡果不足惜，卻給世界環(huán)境...
茶點故事閱讀 41,080評論 3贊 327
男人毒藥：我在死后第九天來索命
文/蒙蒙一沐旨、第九天我趴在偏房一處隱蔽的房頂上張望榨婆。院中可真熱鬧，春花似錦良风、人聲如沸闷供。這莊子的主人今日做“春日...
開封第一講書人閱讀 31,713評論 0贊 22
一樁弒父案吊档，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽。三九已至鬼贱，卻和暖如春香璃，著一層夾襖步出監(jiān)牢的瞬間这难，已是汗流浹背葡秒。一陣腳步聲響...
開封第一講書人閱讀 32,852評論 1贊 269
情欲美人皮
我被黑心中介騙來泰國打工眯牧，沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留蹋岩，地道東北人学少。一個月前我還...
沈念sama閱讀 47,865評論 2贊 370
代替公主和親
正文我出身青樓，卻偏偏與公主長得像扣囊，于是被迫代替她去往敵國和親绒疗。傳聞我的和親對象是個殘疾皇子侵歇，可洞房花燭夜當晚...
茶點故事閱讀 44,689評論 2贊 354

sqoop數(shù)據(jù)導入導出應用案例

sqoop數(shù)據(jù)導入導出應用案例

1 sqoop導入數(shù)據(jù)

1.1 語法

1.2 測試數(shù)據(jù)

1.3 導入表中數(shù)據(jù)到HDFS

1.4 導入到hive表中

1.5 導入到hdfs指定的目錄上

1.6 導入表數(shù)據(jù)子集

1.7 增量導入數(shù)據(jù)

2 sqoop數(shù)據(jù)導出

2.1 語法

2.2 案例一

3 sqoop數(shù)據(jù)導入導出命令詳解

3.1 sqoop import導入數(shù)據(jù)命令參數(shù)詳解

3.2 sqoop export導出數(shù)據(jù)命令參數(shù)詳解

4 參考博文

推薦閱讀更多精彩內容