References:
https://blog.csdn.net/weixin_43857576/article/details/121843701
https://cloud.tencent.com/developer/article/1812592
http://it.ckcest.cn/article-4007002-1.html
https://hudi.apache.org/docs/0.9.0/flink-quick-start-guide
1. Environment
Before integrating, your server must already have JDK, Hadoop, Scala, Flink, and Maven installed.
Use JDK 1.8 or above,
and preferably Hadoop 3.0 or above.
The Scala and Flink versions are constrained by the Hudi version; I use Hudi 0.9.0 here, which pairs with flink-1.12.2 and scala-2.11.12.
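A quick way to sanity-check the prerequisites before starting (a minimal sketch; it assumes $FLINK_HOME points at your Flink install):
# verify each prerequisite is present and on the expected version
java -version        # expect 1.8+
hadoop version       # expect 3.0+
scala -version       # expect 2.11.x
mvn -v               # Maven, needed if you build Hudi from source
ls $FLINK_HOME/lib/flink-dist_*.jar   # file name shows the Flink/Scala version, e.g. flink-dist_2.11-1.12.2.jar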
2. Copy hudi-flink-bundle_2.12-0.9.0.jar into Flink's lib directory. Note that the bundle's _2.11/_2.12 suffix must match the Scala version of your Flink distribution; see the sketch below.
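A minimal sketch of fetching the bundle and dropping it into Flink's lib directory (the URL follows the standard Maven Central layout; swap _2.12 for _2.11 if your Flink build uses Scala 2.11):
# download the Hudi Flink bundle from Maven Central and place it in Flink's lib/
wget https://repo1.maven.org/maven2/org/apache/hudi/hudi-flink-bundle_2.12/0.9.0/hudi-flink-bundle_2.12-0.9.0.jar
cp hudi-flink-bundle_2.12-0.9.0.jar $FLINK_HOME/lib/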
3. Edit the config file under Flink's conf directory: taskmanager.numberOfTaskSlots defaults to 1, which is too few for the Hudi write pipeline, so raise it.
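For example, in $FLINK_HOME/conf/flink-conf.yaml (4 is the value the Hudi quick-start guide referenced above suggests; tune it to your cluster):
# flink-conf.yaml: give each TaskManager enough slots for the Hudi pipeline
taskmanager.numberOfTaskSlots: 4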
Start the cluster:
export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`   # backticks run `hadoop classpath` so Flink can find the Hadoop jars
./bin/start-cluster.sh
Start the SQL client:
export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
./bin/sql-client.sh embedded
Create a table:
CREATE TABLE t1(
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)   -- backticks required: partition is a reserved keyword
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://hadoop01:9000/tmp/t1',
  'table.type' = 'MERGE_ON_READ'   -- the default is COPY_ON_WRITE
);
Pay attention to the path: this is my own server's HDFS, set up when the Hadoop cluster was configured. Change it to your own; if you have forgotten it, look up fs.defaultFS in core-site.xml under etc/hadoop in your Hadoop installation. I had just this one setting wrong, and the job kept failing for several days before I found it. A real pitfall.
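For reference, the relevant entry in core-site.xml looks like this (hadoop01:9000 is just the value matching the path above; yours will differ):
<!-- core-site.xml: the default filesystem URI, which the Hudi 'path' must match -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop01:9000</value>
</property>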
Insert data:
INSERT INTO t1 VALUES
('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'),
('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
Query the data:
SELECT * FROM t1;
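Update data: because the record key defaults to the uuid column, inserting a row whose key already exists upserts it instead of appending a duplicate. The quick-start guide referenced above demonstrates an update like this:
-- overwrites the existing record with key 'id1', changing its age
INSERT INTO t1 VALUES
  ('id1','Danny',27,TIMESTAMP '1970-01-01 00:00:01','par1');
Running SELECT * FROM t1 again should now show age 27 for id1.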
View the job in the Flink web UI: http://ip:8081/#/job/completed
Inspect the files generated in HDFS:
hdfs dfs -ls /tmp/t1
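If the write succeeded, the table path should contain a .hoodie metadata directory alongside one directory per partition value (par1 through par4); listing a partition shows the actual data/log files, for example:
hdfs dfs -ls /tmp/t1/par1   # data and log files for partition 'par1'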