Python must be version 2.
To fix garbled characters in cmd: run CHCP 65001
To fix garbled Chinese text in the database data: append ?characterEncoding=utf8 to the jdbcUrl entry in the JSON file
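As a rough illustration of the jdbcUrl note above, the following sketch appends the characterEncoding parameter to a JDBC URL only when it is missing. The helper name with_utf8 and the host 127.0.0.1 are assumptions made for the example; they are not part of DataX.

```python
def with_utf8(jdbc_url, encoding="utf8"):
    """Return jdbc_url with a characterEncoding parameter, leaving it untouched if one is already set."""
    if "characterEncoding=" in jdbc_url:
        return jdbc_url
    # Use '?' for the first query parameter, '&' for any later one.
    separator = "&" if "?" in jdbc_url else "?"
    return jdbc_url + separator + "characterEncoding=" + encoding


# Hypothetical host; substitute the real MySQL address.
print(with_utf8("jdbc:mysql://127.0.0.1:3306/weixin"))
```

The result is the string to paste into the jdbcUrl entry of the writer's connection block.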
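Introduction to DataX
Installing DataX
DataX download link
After downloading, simply extract the archive to a directory of your choice.
Viewing the configuration template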
python datax.py -r {YOUR_READER} -w {YOUR_WRITER}
For example, for MySQL:
C:\DataX\bin>python datax.py -r mysqlreader -w mysqlwriter
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
Please refer to the mysqlreader document:
https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md
Please refer to the mysqlwriter document:
https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md
Please save the following configuration as a json file and use
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.
{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": [],
"connection": [
{
"jdbcUrl": [],
"table": []
}
],
"password": "",
"username": "",
"where": ""
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"column": [],
"connection": [
{
"jdbcUrl": "",
"table": []
}
],
"password": "",
"preSql": [],
"session": [],
"username": "",
"writeMode": ""
}
}
}
],
"setting": {
"speed": {
"channel": ""
}
}
}
}
Configuring the mongodb2mysql.json file
{
"job": {
"setting": {
"speed": {
"channel": 1
}
},
"content": [{
"reader": {
"name": "mongodbreader",
"parameter": {
"address": ["*.*.*.*:27017"],
"userName": "root",
"userPassword": "123456",
"dbName": "weixin",
"collectionName": "fileids_wxpy",
"column": [{
"index":0,
"name": "_id",
"type": "string"
}, {
"index":1,
"name": "crawler_time",
"type": "string"
}, {
"index":2,
"name": "file_url",
"type": "string"
}, {
"index":3,
"name": "flag",
"type": "string"
}, {
"index":4,
"name": "logo_url",
"type": "string"
}, {
"index":5,
"name": "source",
"type": "string"
}, {
"index":6,
"name": "update_date",
"type": "string"
}, {
"index":7,
"name": "update_time",
"type": "long"
}, {
"index":8,
"name": "wx_id",
"type": "string"
}, {
"index":9,
"name": "wx_name",
"type": "string"
}]
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"column": [
"id",
"crawler_time",
"file_url",
"flag",
"logo_url",
"source",
"update_date",
"update_time",
"wx_id",
"wx_name"
],
"connection": [
{
"jdbcUrl": "jdbc:mysql://*.*.*.*:3306/weixin?characterEncoding=utf8",
"table": ["fileids_wxpy"]
}
],
"password": "123456",
"username": "root"
}
}
}]
}
}
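The job above lists ten reader columns and ten writer columns in the same order. Because the two lists are paired by position, a quick count check before launching the job may catch a mismatch early. The sketch below is a standalone helper, not part of DataX; the function name column_counts and the tiny inline job are assumptions made for the example.

```python
import json


def column_counts(job):
    """Return a list of (reader_count, writer_count) pairs, one per content entry of a DataX job dict."""
    counts = []
    for entry in job["job"]["content"]:
        reader_cols = entry["reader"]["parameter"]["column"]
        writer_cols = entry["writer"]["parameter"]["column"]
        counts.append((len(reader_cols), len(writer_cols)))
    return counts


# A trimmed-down job in the same shape as mongodb2mysql.json (illustrative only).
sample_job = json.loads("""
{
  "job": {
    "content": [{
      "reader": {"parameter": {"column": [
        {"name": "_id", "type": "string"},
        {"name": "wx_id", "type": "string"}
      ]}},
      "writer": {"parameter": {"column": ["id", "wx_id"]}}
    }]
  }
}
""")

for reader_count, writer_count in column_counts(sample_job):
    status = "ok" if reader_count == writer_count else "MISMATCH"
    print("reader={} writer={} {}".format(reader_count, writer_count, status))
```

To check the real file, load it with json.load(open("mongodb2mysql.json")) and pass the result to column_counts.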
Creating the MySQL database and table
create DATABASE weixin;
use weixin;
DROP TABLE IF EXISTS `fileids_wxpy`;
CREATE TABLE `fileids_wxpy` (
`id` bigint(20) unsigned NOT NULL,
`crawler_time` int(10) unsigned NOT NULL,
`file_url` varchar(255) NOT NULL,
`flag` varchar(255) NOT NULL,
`logo_url` varchar(255) NOT NULL,
`source` varchar(255) NOT NULL,
`update_date` int(10) unsigned NOT NULL,
`update_time` int(10) unsigned NOT NULL,
`wx_id` varchar(255) NOT NULL,
`wx_name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Running the job
C:\DataX\bin>python datax.py mongodb2mysql.json
Error
Caused by: com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=UNKNOWN, servers=[{address=*.*.*.*:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSecurityException: Exception authenticating MongoCredential{mechanism=null, userName='root', source='weixin', password=<hidden>, mechanismProperties={}}}, caused by {com.mongodb.MongoCommandException: Command failed with error 18: 'Authentication failed.' on server*.*.*.*:27017. The full response is { "ok" : 0.0, "errmsg" : "Authentication failed.", "code" : 18, "codeName" : "AuthenticationFailed" }}}]
at com.mongodb.connection.BaseCluster.createTimeoutException(BaseCluster.java:369)
at com.mongodb.connection.BaseCluster.selectServer(BaseCluster.java:101)
at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.<init>(ClusterBinding.java:75)
at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.<init>(ClusterBinding.java:71)
at com.mongodb.binding.ClusterBinding.getReadConnectionSource(ClusterBinding.java:63)
at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:201)
at com.mongodb.operation.CountOperation.execute(CountOperation.java:206)
at com.mongodb.operation.CountOperation.execute(CountOperation.java:53)
at com.mongodb.Mongo.execute(Mongo.java:772)
at com.mongodb.Mongo$2.execute(Mongo.java:759)
at com.mongodb.MongoCollectionImpl.count(MongoCollectionImpl.java:185)
at com.mongodb.MongoCollectionImpl.count(MongoCollectionImpl.java:165)
at com.alibaba.datax.plugin.reader.mongodbreader.util.CollectionSplitUtil.doSplitInterval(CollectionSplitUtil.java:55)
at com.alibaba.datax.plugin.reader.mongodbreader.util.CollectionSplitUtil.doSplit(CollectionSplitUtil.java:37)
at com.alibaba.datax.plugin.reader.mongodbreader.MongoDBReader$Job.split(MongoDBReader.java:37)
at com.alibaba.datax.core.job.JobContainer.doReaderSplit(JobContainer.java:732)
at com.alibaba.datax.core.job.JobContainer.split(JobContainer.java:393)
at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:117)
... 3 more
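Cause
In MongoDB, each database is independent of the others and has its own permissions. The trace above shows the root credentials being checked against weixin (source='weixin'), which fails. The correct approach is to use the root account to create a sub-account in the database you are going to operate on, and then connect to MongoDB with that sub-account.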
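To make the per-database authentication point concrete, the sketch below builds a standard MongoDB connection string in which the authSource option names the database that holds the user. This is only an illustration of where the authentication database appears; DataX's mongodbreader is configured through the JSON fields shown earlier, not through this URI. The helper name mongo_uri and the host 127.0.0.1:27017 are assumptions.

```python
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2


def mongo_uri(user, password, host, database, auth_db=None):
    """Build a MongoDB connection string; auth_db defaults to the target database."""
    auth_db = auth_db or database
    return "mongodb://{}:{}@{}/{}?authSource={}".format(
        quote_plus(user), quote_plus(password), host, database, quote_plus(auth_db)
    )


# A user created inside weixin authenticates against weixin.
print(mongo_uri("DataXTest", "123456", "127.0.0.1:27017", "weixin"))
# A user that lives in admin would have to name admin as its authentication database.
print(mongo_uri("root", "123456", "127.0.0.1:27017", "weixin", auth_db="admin"))
```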
Solution
>use admin
switched to db admin
>db.auth("root","******")
1
>show dbs
admin
local
weixin
>use weixin
switched to db weixin
>db.createUser(
{
user:"DataXTest",
pwd:"123456",
roles:[{role:"dbOwner",db:"weixin"}]
}
)
Successfully added user: {
"user" : "DataXTest",
"roles" : [
{
"role" : "dbOwner",
"db" : "weixin"
}
]
}
Replace the MongoDB account root in the JSON configuration file with DataXTest, then run the job again
C:\DataX\bin>python datax.py mongodb2mysql.json
2019-06-13 14:39:40.218 [job-0] INFO JobContainer - PerfTrace not enable!
2019-06-13 14:39:40.219 [job-0] INFO StandAloneJobContainerCommunicator - Total 50115 records, 17716504 bytes | Speed 36.04KB/s, 104 records/s | Error 12 records, 3513 bytes | All Task WaitWriterTime 259.684s | All Task WaitReaderTime 207.041s | Percentage 100.00%
2019-06-13 14:39:40.221 [job-0] INFO JobContainer -
Job start time : 2019-06-13 14:31:33
Job end time : 2019-06-13 14:39:40
Total elapsed time : 487s
Average throughput : 36.04KB/s
Record write speed : 104rec/s
Total records read : 50115
Total read/write failures : 12
Note: the 12 failed records here are caused by id values longer than 19 digits
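One way to spot such records before running the job is to scan the ids for values with more than 19 digits, which is the digit count of 9223372036854775807, the largest signed 64-bit integer. The sketch below is a minimal standalone check; the function name oversized_ids and the sample values are assumptions made for the example.

```python
MAX_DIGITS = 19  # number of digits in 9223372036854775807


def oversized_ids(ids, max_digits=MAX_DIGITS):
    """Return the ids whose decimal form has more than max_digits digits."""
    return [value for value in ids if len(str(value).lstrip("-")) > max_digits]


# Illustrative values only: the second one has 20 digits.
sample_ids = ["1560407980218", "12345678901234567890"]
print(oversized_ids(sample_ids))
```

Any id returned by this check would need a wider target column, such as a string type, or separate handling before the transfer.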