1.數(shù)據(jù)采集流程
2. 環(huán)境準(zhǔn)備
2.1 MySQL主從數(shù)據(jù)庫環(huán)境準(zhǔn)備
組件 | hdp01(主機(jī)) | hdp02(主機(jī)) |
---|---|---|
MySQL數(shù)據(jù)庫(主) | √ | |
MySQL數(shù)據(jù)庫(從) | √ | |
Canal Server | √ |
說明:
- DBus-0.6.1使用Canal-v1.1.4,支持MySQL5.6和5.7
- 被同步的MySQL bin-log需要是row模式
- 考慮到Kafka的message大小不宜太大娃豹,目前設(shè)置的是最大10MB式曲,因此不支持同步MySQL MEDIUUMTEXT/MediumBlob和LongTEXT/LongBlob類型的數(shù)據(jù),如果表中有這樣類型的數(shù)據(jù)會(huì)直接被替換為空
MySQL主從配置:這里不再說明MySQL如何安裝,只說明主從如何配置
主庫配置:
# 主庫配置
[client]
default-character-set=utf8mb4
[mysql]
socket=/var/lib/mysql/mysql.sock
default-character-set=utf8mb4
[mysqld]
socket=/var/lib/mysql/mysql.sock
symbolic-links=0
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
datadir=/data/mysql
character_set_server=utf8mb4
character-set-client-handshake=FALSE
collation-server=utf8mb4_unicode_ci
max_connections=800
max_connect_errors=1000
############################### 以下為主從配置以及binlog配置,新增這些配置 ###############################
# 用于標(biāo)識(shí)唯一的數(shù)據(jù)庫,不能和別的服務(wù)器重復(fù)隶债,建議使用ip的最后一段,默認(rèn)值0代表不允許任何從庫同步數(shù)據(jù)跑筝,不可以使用
server-id=105
# 用于指定binlog日志文件名前綴
log-bin=mysql-bin
binlog-format=Row
# 這些是表示同步的時(shí)候忽略的數(shù)據(jù)庫
binlog-ignore-db=information_schema
binlog-ignore-db=ambari
binlog-ignore-db=dbusmgr
binlog-ignore-db=hive
binlog-ignore-db=mysql
binlog-ignore-db=performance_schema
binlog-ignore-db=sys
重啟主庫服務(wù):
[admin@hdp01 ~]$ sudo systemctl restart mysqld
# 登錄MySQL死讹,創(chuàng)建一個(gè)用戶,從庫使用此用戶連接主庫進(jìn)行bin-log同步
mysql> set global validate_password_policy=0;
mysql> set global validate_password_mixed_case_count=0;
mysql> set global validate_password_number_count=3;
mysql> set global validate_password_special_char_count=0;
mysql> set global validate_password_length=3;
mysql> CREATE USER 'repl'@'%' IDENTIFIED BY '123456';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' identified by '123456';
mysql> flush privileges;
# 查看主庫狀態(tài)
mysql> show master status;
+------------------+----------+--------------+---------------------------------------------------------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+---------------------------------------------------------------------+-------------------+
| mysql-bin.000001 | 834 | | information_schema,ambari,dbusmgr,hive,mysql,performance_schema,sys | |
+------------------+----------+--------------+---------------------------------------------------------------------+-------------------+
1 row in set (0.00 sec)
從庫配置:
# 從庫配置
[client]
default-character-set=utf8mb4
[mysql]
socket=/var/lib/mysql/mysql.sock
default-character-set=utf8mb4
[mysqld]
socket=/var/lib/mysql/mysql.sock
symbolic-links=0
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
datadir=/data/mysql
character_set_server=utf8mb4
character-set-client-handshake=FALSE
collation-server=utf8mb4_unicode_ci
max_connections=800
max_connect_errors=1000
############################### 以下為主從配置以及binlog配置曲梗,新增這些配置 ###############################
# 用于標(biāo)識(shí)唯一的數(shù)據(jù)庫赞警,一定不能和主庫的server-id一樣
server-id=106
# 用于指定binlog日志文件名前綴
log-bin=mysql-bin
# 這個(gè)必須加上,因?yàn)閺膸焐系腗ySQL可以是slave也可以是master虏两,加上該選項(xiàng)才會(huì)生成級(jí)聯(lián)binlog愧旦,Canal才可以從從庫采集數(shù)據(jù)
log_slave_updates
binlog-format=Row
# 這些是表示同步的時(shí)候忽略的數(shù)據(jù)庫
binlog-ignore-db=information_schema
binlog-ignore-db=ambari
binlog-ignore-db=dbusmgr
binlog-ignore-db=hive
binlog-ignore-db=mysql
binlog-ignore-db=performance_schema
binlog-ignore-db=sys
重啟從庫服務(wù):
[admin@hdp02 ~]$ sudo systemctl restart mysqld
# 登錄MySQL進(jìn)行如下操作
# 注意這里的MASTER_LOG_FILE='mysql-bin.000001',MASTER_LOG_POS=834是在主庫通過show master status查詢到的信息
mysql> CHANGE MASTER TO MASTER_HOST='hdp01', MASTER_USER='repl', MASTER_PASSWORD='123456', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=834;
mysql> START SLAVE;
# 查看從庫狀態(tài)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: hdp01
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 834
Relay_Log_File: hdp02-relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
# Slave_IO_Running和Slave_SQL_Running應(yīng)該都為Yes
2.2 數(shù)據(jù)庫源端配置
在業(yè)務(wù)庫主庫中執(zhí)行以下操作:
# 1.創(chuàng)建dbus庫和dbus用戶及相應(yīng)權(quán)限
mysql> set global validate_password_policy=0;
mysql> set global validate_password_mixed_case_count=0;
mysql> set global validate_password_number_count=3;
mysql> set global validate_password_special_char_count=0;
mysql> set global validate_password_length=3;
mysql> create database dbus;
mysql> CREATE USER 'dbus'@'%' IDENTIFIED BY '123456';
mysql> GRANT ALL ON dbus.* TO dbus@'%' IDENTIFIED BY '123456';
mysql> flush privileges;
# 2.創(chuàng)建dbus庫中需要包含的1張表,創(chuàng)建細(xì)節(jié)如下
mysql> use dbus;
mysql> DROP TABLE IF EXISTS `db_heartbeat_monitor`;
mysql> CREATE TABLE `db_heartbeat_monitor` (
-> `ID` bigint(19) NOT NULL AUTO_INCREMENT COMMENT '',
-> `DS_NAME` varchar(64) NOT NULL COMMENT '',
-> `SCHEMA_NAME` varchar(64) NOT NULL COMMENT '',
-> `TABLE_NAME` varchar(64) NOT NULL COMMENT '',
-> `PACKET` varchar(256) NOT NULL COMMENT '',
-> `CREATE_TIME` datetime NOT NULL COMMENT '',
-> `UPDATE_TIME` datetime NOT NULL COMMENT '',
-> PRIMARY KEY (`ID`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
mysql> SET FOREIGN_KEY_CHECKS=0;
2.3 準(zhǔn)備測試用業(yè)務(wù)數(shù)據(jù)庫
在主庫執(zhí)行以下操作:
# 創(chuàng)建測試庫
mysql> set global validate_password_policy=0;
mysql> set global validate_password_mixed_case_count=0;
mysql> set global validate_password_number_count=3;
mysql> set global validate_password_special_char_count=0;
mysql> set global validate_password_length=3;
mysql> create database test;
# 創(chuàng)建測試用戶
mysql> CREATE USER 'test'@'%' IDENTIFIED BY '123456';
mysql> GRANT ALL ON test.* TO test@'%' IDENTIFIED BY '123456';
mysql> flush privileges;
# 創(chuàng)建測試表
mysql> use test;
mysql> create table t1(a int, b varchar(50));
Query OK, 0 rows affected (0.01 sec)
2.4 開啟dbus用戶拉備庫權(quán)限
在主庫執(zhí)行以下操作:
mysql> GRANT select on test.t1 TO dbus;
mysql> flush privileges;
2.5 手動(dòng)Canal部署
創(chuàng)建Canal專用用戶定罢,在從庫執(zhí)行以下操作忘瓦,因?yàn)閏anal用戶要去從庫拉取數(shù)據(jù):
mysql> set global validate_password_policy=0;
mysql> set global validate_password_mixed_case_count=0;
mysql> set global validate_password_number_count=3;
mysql> set global validate_password_special_char_count=0;
mysql> set global validate_password_length=3;
mysql> CREATE USER 'canal'@'%' IDENTIFIED BY '123456';
mysql> GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
mysql> FLUSH PRIVILEGES;
下載canal安裝包:https://github.com/BriData/DBus/releases,官網(wǎng)給了百度網(wǎng)盤下載鏈接引颈,文件為v0.6.1/deployer-0.6.1/zip/dbus-canal-auto-0.6.1.zip和v0.6.1/deployer-0.6.1/zip/canal.zip,注意不要使用canal官方的安裝包
# 將dbus-canal-auto-0.6.1.zip上傳到dbus安裝目錄下境蜕,就是/opt/apps/dbus目錄下蝙场,然后解壓
[admin@hdp02 dbus]$ pwd
/opt/apps/dbus
[admin@hdp02 dbus]$ ll
total 12
drwxr-xr-x 2 admin admin 155 2020-11-05 22:54 bin
drwxr-xr-x 9 admin admin 195 2020-11-10 23:23 conf
drwxr-xr-x 4 admin admin 82 2020-11-03 21:15 dbus-canal-auto-0.6.1
drwxr-xr-x 2 admin admin 310 2020-11-05 22:04 extlib
drwxr-xr-x 2 admin admin 213 2020-11-05 22:54 lib
drwxr-xr-x 2 admin admin 97 2020-11-05 22:54 logs
-rw-r--r-- 1 admin admin 8194 2020-11-05 22:04 README.md
drwxr-xr-x 2 admin admin 134 2020-11-12 20:44 zip
# 將canal.zip上傳到上一步解壓的dbus-canal-auto-0.6.1目錄下并解壓
[admin@hdp02 dbus-canal-auto-0.6.1]$ ll
total 16
-rwxr-xr-x 1 admin admin 651 2019-12-18 21:09 addLine.sh
drwxrwxr-x 6 admin admin 52 2020-02-24 14:50 canal
drwxr-xr-x 2 admin admin 35 2020-11-12 21:06 conf
-rwxr-xr-x 1 admin admin 654 2019-12-18 21:09 delLine.sh
-rwxr-xr-x 1 admin admin 1103 2019-12-18 21:09 deploy.sh
drwxr-xr-x 2 admin admin 4096 2020-08-21 17:55 lib
安裝canal:
# 修改dbus-canal-auto-0.6.1/conf/canal-auto.properties文件
[admin@hdp02 dbus-canal-auto-0.6.1]$ vim conf/canal-auto.properties
# 數(shù)據(jù)源的名稱,后續(xù)在頁面填寫的時(shí)候粱年,需要填這個(gè)名稱
dsname=dbus_mysql_test
# zk address
zk.path=hdp02:2181,hdp03:2181,hdp04:2181
# mysql address:從庫的地址
canal.address=hdp02:3306
# mysql canal user
canal.user=canal
# mysql canal password
canal.pwd=123456
# means mysql server-id:需要與mysql的主從庫的server-id不同
canal.slaveId=1050
# kafka address:kafka集群地址
bootstrap.servers=hdp02:9092,hdp03:9092,hdp04:9092
# 執(zhí)行deploy.sh
[admin@hdp02 dbus-canal-auto-0.6.1]$ sh deploy.sh
......
********************************* CANAL DEPLOY SCCESS! **************************************
report文件: canal_deploy_dbus_mysql_test_20201112211012.txt
# 執(zhí)行完成后售滤,生成了兩個(gè)日志
lrwxrwxrwx 1 admin admin 79 2020-11-12 21:10 dbus_mysql_test_canal.log -> /opt/apps/dbus/dbus-canal-auto-0.6.1/canal-dbus_mysql_test/logs/canal/canal.log
lrwxrwxrwx 1 admin admin 99 2020-11-12 21:10 dbus_mysql_test.log -> /opt/apps/dbus/dbus-canal-auto-0.6.1/canal-dbus_mysql_test/logs/dbus_mysql_test/dbus_mysql_test.log
# 檢查一下這兩個(gè)日志,如果其中沒有報(bào)錯(cuò)台诗,就是部署成功了完箩!
# 同時(shí)生成了一個(gè)目錄canal-dbus_mysql_test,后續(xù)canal的啟停腳本在此目錄的bin下
在DBus頁面刪除自動(dòng)部署canal的配置信息:
3. 在DBus平臺(tái)中采集MySQL數(shù)據(jù)
使用admin用戶登錄DBus平臺(tái):
MySQL URL如下:
jdbc:mysql://hdp01:3306/dbus?useUnicode=true&characterEncoding=utf8&autoReconnect=true&failOverReadOnly=false&noAccessToProcedureBodies=true&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false
在Storm UI中確認(rèn)一下兩個(gè)任務(wù)是夠提交成功:
驗(yàn)證:
MySQL中插入數(shù)據(jù):
mysql> use test;
mysql> INSERT INTO t1(a, b) VALUES (101, "Tom");
mysql> INSERT INTO t1(a, b) VALUES (102, "Kate");
mysql> INSERT INTO t1(a, b) VALUES (103, "Jerry");
Kafka的dbus_mysql_test
拉队、dbus_mysql_test.dbus
和dbus_mysql_test.dbus.result
這3個(gè)Topic中應(yīng)該有數(shù)據(jù)進(jìn)入弊知。
至此,使用DBus平臺(tái)收集MySQL bin-log日志就成功了粱快!