在項(xiàng)目中需要將一個(gè)80w+的csv數(shù)據(jù)(200+m)導(dǎo)入到oracle庫(kù),一開(kāi)始使用的是Navicat for Oracle的導(dǎo)入工具。跑了五六分鐘之后絕望了苍糠,因?yàn)椴排芰?%木张,按這樣的速度跑半天都跑不完。
回想了一下老本行mysql商佑,覺(jué)得Oracle應(yīng)該也會(huì)有一個(gè)批量插入的高效率語(yǔ)句的寫(xiě)法。于是翻看了oracle的官方文檔精通 Oracle+Python厢塘,第 1 部分:查詢(xún)最佳應(yīng)踐
一次多行
大型的插入操作不需求多次的單獨(dú)插入茶没,這是因?yàn)?Python 通過(guò) cx_Oracle.Cursor.executemany 方法完全支持一次插入多行。
限制執(zhí)行操作的數(shù)量極大地改善了程序性能晚碾,因此在編寫(xiě)存在大量插入操作的應(yīng)用程序時(shí)應(yīng)首先考慮這一功能抓半。
呲!這就是我想要的格嘁。
迅速仿造官方的demo寫(xiě)了一個(gè)簡(jiǎn)易的導(dǎo)入腳本笛求,跑了一下,138.7s也就是將近2分多鐘就跑完了這200多m糕簿,差不多88w行的數(shù)據(jù)探入。附上性能測(cè)試
總行數(shù): 885640
2017-05-17 14:11:24
===begin===
2017-05-17 14:11:24
prepare end
2017-05-17 14:11:26 => 1 - 10000 ( 9999 ) finish
2017-05-17 14:11:27 => 10001 - 20000 ( 9999 ) finish
2017-05-17 14:11:28 => 20001 - 30000 ( 9999 ) finish
2017-05-17 14:11:30 => 30001 - 40000 ( 9999 ) finish
2017-05-17 14:11:31 => 40001 - 50000 ( 9999 ) finish
2017-05-17 14:11:33 => 50001 - 60000 ( 9999 ) finish
2017-05-17 14:11:34 => 60001 - 70000 ( 9999 ) finish
2017-05-17 14:11:35 => 70001 - 80000 ( 9999 ) finish
2017-05-17 14:11:37 => 80001 - 90000 ( 9999 ) finish
2017-05-17 14:11:38 => 90001 - 100000 ( 9999 ) finish
2017-05-17 14:11:39 => 100001 - 110000 ( 9999 ) finish
2017-05-17 14:11:42 => 110001 - 120000 ( 9999 ) finish
2017-05-17 14:11:44 => 120001 - 130000 ( 9999 ) finish
2017-05-17 14:11:45 => 130001 - 140000 ( 9999 ) finish
2017-05-17 14:11:46 => 140001 - 150000 ( 9999 ) finish
2017-05-17 14:11:47 => 150001 - 160000 ( 9999 ) finish
2017-05-17 14:11:49 => 160001 - 170000 ( 9999 ) finish
2017-05-17 14:11:50 => 170001 - 180000 ( 9999 ) finish
2017-05-17 14:11:51 => 180001 - 190000 ( 9999 ) finish
2017-05-17 14:11:53 => 190001 - 200000 ( 9999 ) finish
2017-05-17 14:11:56 => 200001 - 210000 ( 9999 ) finish
2017-05-17 14:11:58 => 210001 - 220000 ( 9999 ) finish
2017-05-17 14:12:00 => 220001 - 230000 ( 9999 ) finish
2017-05-17 14:12:02 => 230001 - 240000 ( 9999 ) finish
2017-05-17 14:12:04 => 240001 - 250000 ( 9999 ) finish
2017-05-17 14:12:06 => 250001 - 260000 ( 9999 ) finish
2017-05-17 14:12:10 => 260001 - 270000 ( 9999 ) finish
2017-05-17 14:12:14 => 270001 - 280000 ( 9999 ) finish
2017-05-17 14:12:16 => 280001 - 290000 ( 9999 ) finish
2017-05-17 14:12:17 => 290001 - 300000 ( 9999 ) finish
2017-05-17 14:12:19 => 300001 - 310000 ( 9999 ) finish
2017-05-17 14:12:25 => 310001 - 320000 ( 9999 ) finish
2017-05-17 14:12:26 => 320001 - 330000 ( 9999 ) finish
2017-05-17 14:12:28 => 330001 - 340000 ( 9999 ) finish
2017-05-17 14:12:29 => 340001 - 350000 ( 9999 ) finish
2017-05-17 14:12:30 => 350001 - 360000 ( 9999 ) finish
2017-05-17 14:12:51 => 360001 - 370000 ( 9999 ) finish
2017-05-17 14:12:53 => 370001 - 380000 ( 9999 ) finish
2017-05-17 14:12:54 => 380001 - 390000 ( 9999 ) finish
2017-05-17 14:12:56 => 390001 - 400000 ( 9999 ) finish
2017-05-17 14:12:58 => 400001 - 410000 ( 9999 ) finish
2017-05-17 14:13:36 => 410001 - 420000 ( 9999 ) finish
2017-05-17 14:13:37 => 420001 - 430000 ( 9999 ) finish
2017-05-17 14:13:39 => 430001 - 440000 ( 9999 ) finish
2017-05-17 14:13:40 => 440001 - 450000 ( 9999 ) finish
2017-05-17 14:13:42 => 450001 - 460000 ( 9999 ) finish
2017-05-17 14:13:43 => 460001 - 470000 ( 9999 ) finish
2017-05-17 14:13:45 => 470001 - 480000 ( 9999 ) finish
2017-05-17 14:13:47 => 480001 - 490000 ( 9999 ) finish
2017-05-17 14:13:48 => 490001 - 500000 ( 9999 ) finish
2017-05-17 14:13:50 => 500001 - 510000 ( 9999 ) finish
2017-05-17 14:13:51 => 510001 - 520000 ( 9999 ) finish
2017-05-17 14:13:52 => 520001 - 530000 ( 9999 ) finish
2017-05-17 14:13:54 => 530001 - 540000 ( 9999 ) finish
2017-05-17 14:13:56 => 540001 - 550000 ( 9999 ) finish
2017-05-17 14:13:57 => 550001 - 560000 ( 9999 ) finish
2017-05-17 14:13:59 => 560001 - 570000 ( 9999 ) finish
2017-05-17 14:14:01 => 570001 - 580000 ( 9999 ) finish
2017-05-17 14:14:03 => 580001 - 590000 ( 9999 ) finish
2017-05-17 14:14:05 => 590001 - 600000 ( 9999 ) finish
2017-05-17 14:14:06 => 600001 - 610000 ( 9999 ) finish
2017-05-17 14:14:07 => 610001 - 620000 ( 9999 ) finish
2017-05-17 14:14:09 => 620001 - 630000 ( 9999 ) finish
2017-05-17 14:14:10 => 630001 - 640000 ( 9999 ) finish
2017-05-17 14:14:11 => 640001 - 650000 ( 9999 ) finish
2017-05-17 14:14:13 => 650001 - 660000 ( 9999 ) finish
2017-05-17 14:14:14 => 660001 - 670000 ( 9999 ) finish
2017-05-17 14:14:16 => 670001 - 680000 ( 9999 ) finish
2017-05-17 14:14:17 => 680001 - 690000 ( 9999 ) finish
2017-05-17 14:14:19 => 690001 - 700000 ( 9999 ) finish
2017-05-17 14:14:20 => 700001 - 710000 ( 9999 ) finish
2017-05-17 14:14:23 => 710001 - 720000 ( 9999 ) finish
2017-05-17 14:14:28 => 720001 - 730000 ( 9999 ) finish
2017-05-17 14:14:34 => 730001 - 740000 ( 9999 ) finish
2017-05-17 14:14:37 => 740001 - 750000 ( 9999 ) finish
2017-05-17 14:14:40 => 750001 - 760000 ( 9999 ) finish
2017-05-17 14:14:43 => 760001 - 770000 ( 9999 ) finish
2017-05-17 14:14:46 => 770001 - 780000 ( 9999 ) finish
2017-05-17 14:14:49 => 780001 - 790000 ( 9999 ) finish
2017-05-17 14:14:51 => 790001 - 800000 ( 9999 ) finish
2017-05-17 14:14:55 => 800001 - 810000 ( 9999 ) finish
2017-05-17 14:15:15 => 810001 - 820000 ( 9999 ) finish
2017-05-17 14:15:17 => 820001 - 830000 ( 9999 ) finish
2017-05-17 14:15:19 => 830001 - 840000 ( 9999 ) finish
2017-05-17 14:15:20 => 840001 - 850000 ( 9999 ) finish
2017-05-17 14:15:22 => 850001 - 860000 ( 9999 ) finish
2017-05-17 14:15:24 => 860001 - 870000 ( 9999 ) finish
2017-05-17 14:15:27 => 870001 - 880000 ( 9999 ) finish
2017-05-17 14:15:28 => 880001 - 890000 ( 5639 ) finish
2017-05-17 14:15:28
execute end
請(qǐng)按任意鍵繼續(xù). . .
[Finished in 138.7s]
從這里可以看出,1w行的數(shù)據(jù)1s就可以搞定了懂诗,這效率杠杠的蜂嗽。文末獻(xiàn)上粗糙的python導(dǎo)入腳本。
困難回顧
在實(shí)際操作時(shí)响禽,可能會(huì)遇到以下問(wèn)題
1.ORA-12899: value too large for column
大意就是值越界的徒爹,原來(lái)的數(shù)據(jù)庫(kù)是沒(méi)有這個(gè)問(wèn)題的,于是猜測(cè)是編碼集的原因芋类,由于我用的是utf8隆嗅,與庫(kù)的編碼集不一致,于是乎一個(gè)漢字被轉(zhuǎn)成\uxxxx之類(lèi)的侯繁。
解決方法:在腳本加入os.environ['NLS_LANG'] = 'SIMPLIFIED CHINESE_CHINA.UTF8'
胖喳,讓數(shù)據(jù)庫(kù)自行轉(zhuǎn)換編碼。
2. ORA-03135: connection lost contact
意思就是說(shuō)與數(shù)據(jù)庫(kù)的連接斷開(kāi)的贮竟,原因是我的執(zhí)行時(shí)間太長(zhǎng)丽焊,超過(guò)了Oracle設(shè)置允許的最大時(shí)間
解決方法:在Oracle安裝目錄下找到/network/admin/sqlnet.ora较剃,如博主的目錄/app/oracle/product/11.2.0/dbhome_1/network/admin/sqlnet.ora。在該文件追加
sqlnet.expire_time = 60
時(shí)間單位:分鐘
取值范圍:大于0
默認(rèn)取值:無(wú)
以下是python demo
# -*- coding: utf8 -*-
import cx_Oracle
import csv
import time
import os
# 編碼轉(zhuǎn)換
os.environ['NLS_LANG'] = 'SIMPLIFIED CHINESE_CHINA.UTF8'
M = []
csvfile = file('C:\Users\Administrator\Desktop\Table.csv', 'rb')
reader = csv.reader(csvfile)
for line in reader:
try:
M.append((line[0],line[1],line[2],line[3]))
except AttributeError:
pass
csvfile.close()
#創(chuàng)建數(shù)據(jù)庫(kù)連接
conn = cx_Oracle.connect('oracle', '123456', '192.168.102.126:1521/ORCL')
#獲取操作游標(biāo)
cursor = conn.cursor()
print len(M)
print time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))
print '===begin==='
cursor.prepare("INSERT INTO MY_TABLE (ID, COMPANY, DEPARTMENT, NAME) VALUES (:1,:2,:3,:4)")
print time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))
print 'prepare end'
for i in range(1, 31):
begin = (i - 1) * 30000
end = i * 30000
cursor.executemany(None, M[begin:end])
print time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time())), '=>', begin, '-', end, '(', len(M[begin:end]), ')','finish'
print time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))
print 'execute end'
conn.commit()
#885640
print time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))
print 'end'
r = cursor.execute("SELECT COUNT(*) FROM MY_TABLE")
print cursor.fetchone()
#關(guān)閉連接技健,釋放資源
cursor.close()
conn.close()
聽(tīng)說(shuō)使用數(shù)據(jù)泵的導(dǎo)入導(dǎo)出效率更高写穴,有機(jī)會(huì)嘗試一下。