These are study notes for the course 《Flink大數(shù)據(jù)項目實戰(zhàn)》 (Flink Big Data Project in Practice). For anyone who wants to systematically learn Flink, one of the hottest big data computing frameworks, through video, the recommended course is:
Flink大數(shù)據(jù)項目實戰(zhàn): http://t.cn/EJtKhaz
1. Creating a Flink Project and Managing Dependencies
1.1 Creating a Flink Project
The official documentation describes two ways to create a Flink project:
https://ci.apache.org/projects/flink/flink-docs-release-1.6/quickstart/java_api_quickstart.html
Option 1:
mvn archetype:generate \
-DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-java \
-DarchetypeVersion=1.6.2
Option 2:
$ curl https://flink.apache.org/q/quickstart.sh | bash -s 1.6.2
Here we still use the first option to create the Flink project.
Open a terminal, change to the appropriate directory, and create the Flink project with Maven:
mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-quickstart-java -DarchetypeVersion=1.6.2
During project generation you will be prompted for groupId, artifactId, version, and package; sample answers are shown below.
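For illustration, with hypothetical values (any valid Maven coordinates will do), the interactive prompts look roughly like this:
    Define value for property 'groupId': com.example
    Define value for property 'artifactId': my-flink-project
    Define value for property 'version' 1.0-SNAPSHOT: : <Enter>
    Define value for property 'package' com.example: : <Enter>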
The Flink project has been created successfully.
Open IDEA and click Open.
Select the Flink project you just created.
The Flink project is now imported into the IDEA development environment.
Package and test-run with Maven:
mvn clean package
Refresh the target directory to see the jar that was just built; an example listing follows.
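If the hypothetical coordinates from the prompt example above were used (artifactId my-flink-project, version 1.0-SNAPSHOT), the listing would look something like:
    target/my-flink-project-1.0-SNAPSHOT.jar
    target/original-my-flink-project-1.0-SNAPSHOT.jar
The quickstart pom already configures the Maven Shade plugin, so the first jar is the self-contained (shaded) one.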
1.2. Flink Dependencies
Core Dependencies:
1. The core dependencies are packaged in flink-dist*.jar.
2. They contain the essentials Flink needs to run: coordination, networking, checkpoints, failover, the APIs, operations (such as windowing), resource management, and so on.
Note: the core dependencies are not packaged with the application (scope provided).
3. The core dependencies are kept as small as possible to avoid dependency conflicts.
Add the core dependencies to the pom file:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.6.2</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.6.2</version>
    <scope>provided</scope>
</dependency>
Note: these are not packaged with the application. A minimal job that compiles against just these core dependencies is sketched below.
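As an illustration, here is a minimal sketch of a streaming job that uses only the core APIs above; the class name, host, and port are placeholder choices:
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;

    public class SocketWordCount {
        public static void main(String[] args) throws Exception {
            // Core API: obtain the streaming execution environment
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Read lines from a socket (placeholder host/port) and split them into words
            DataStream<String> words = env.socketTextStream("localhost", 9999)
                    .flatMap(new FlatMapFunction<String, String>() {
                        @Override
                        public void flatMap(String line, Collector<String> out) {
                            for (String word : line.split("\\s+")) {
                                out.collect(word);
                            }
                        }
                    });
            words.print();
            env.execute("Socket Word Count");
        }
    }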
User Application Dependencies:
These are connectors, formats, or libraries (CEP, SQL, ML).
Note: application dependencies are packaged with the application (just keep the default scope).
Add the application dependencies to the pom file:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.10_2.11</artifactId>
    <version>1.6.2</version>
</dependency>
Note: pick application dependencies as needed; they are packaged with the application, which is typically done with the Maven Shade plugin (see 1.5). A usage sketch for the Kafka connector above follows.
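As a sketch (assuming a local Kafka broker; the bootstrap server, group id, and topic name are placeholders), the connector declared above can be wired in like this:
    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

    public class KafkaSourceExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Placeholder Kafka connection settings
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092");
            props.setProperty("group.id", "test-group");
            // Consume the placeholder topic as plain strings
            DataStream<String> stream = env.addSource(
                    new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props));
            stream.print();
            env.execute("Kafka Source Example");
        }
    }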
1.3. About Scala Versions
Scala versions are not binary compatible with one another (a Flink application built on Scala 2.12 cannot depend on Scala 2.11 artifacts).
Developers who use only Java can pick any Scala version; Scala developers must pick the Scala version that matches their application's Scala version.
1.4. Hadoop Dependencies
Do not add Hadoop dependencies directly to the Flink application. Instead:
export HADOOP_CLASSPATH=`hadoop classpath`
Flink components pick up this environment variable when they start.
Special case: if the Flink application needs Hadoop input/output formats, just add the Hadoop compatibility wrappers:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-hadoop-compatibility_2.11</artifactId>
    <version>1.6.2</version>
</dependency>
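With the compatibility wrapper on the classpath, a Hadoop TextInputFormat can be used from the batch API roughly like this (a sketch; the HDFS input path is a placeholder):
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.hadoopcompatibility.HadoopInputs;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class HadoopInputExample {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            // Wrap Hadoop's mapred TextInputFormat as a Flink input format
            DataSet<Tuple2<LongWritable, Text>> input = env.createInput(
                    HadoopInputs.readHadoopFile(new TextInputFormat(),
                            LongWritable.class, Text.class, "hdfs:///path/to/input"));
            input.print();
        }
    }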
1.5 Packaging a Flink Project
A Flink Maven project can be packaged with the maven-shade-plugin; the packaging command is mvn clean package. A minimal plugin configuration is sketched below.
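A minimal sketch of such a shade configuration for the pom file, assuming a hypothetical main class com.example.SocketWordCount (the quickstart archetype already generates a fuller version of this):
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-shade-plugin</artifactId>
          <version>3.0.0</version>
          <executions>
            <execution>
              <phase>package</phase>
              <goals>
                <goal>shade</goal>
              </goals>
              <configuration>
                <transformers>
                  <!-- Sets Main-Class in the jar manifest; hypothetical class name -->
                  <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                    <mainClass>com.example.SocketWordCount</mainClass>
                  </transformer>
                </transformers>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>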
2. Building Flink from Source
2.1 Installing Maven
1. Download
Download the installation package from the Maven website; here we use apache-maven-3.3.9-bin.tar.gz.
2. Extract
Upload the apache-maven-3.3.9-bin.tar.gz package to the master node, then extract it with tar:
tar -zxvf apache-maven-3.3.9-bin.tar.gz
3. Create a symlink
ln -s apache-maven-3.3.9 maven
4. Configure environment variables
vi ~/.bashrc
export MAVEN_HOME=/home/hadoop/app/maven
export PATH=$MAVEN_HOME/bin:$PATH
5. Apply the environment variables
source ~/.bashrc
6. Check the Maven version
mvn -version
7. Configure the Aliyun mirror in settings.xml
Add the Aliyun mirror:
<mirror>
    <id>nexus-osc</id>
    <mirrorOf>*</mirrorOf>
    <name>Nexus osc</name>
    <url>http://maven.aliyun.com/nexus/content/repositories/central</url>
</mirror>
2.2 Installing the JDK
Building Flink requires JDK 8 or later. JDK 1.8 is already installed here, so the installation and configuration are not repeated; the version check looks like this:
[hadoop@cdh01 conf]$ java -version
java version "1.8.0_51"
Java(TM) SE Runtime Environment (build 1.8.0_51-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.51-b03, mixed mode)
2.3 Downloading the Source
Go to GitHub at https://github.com/apache/flink to get the Flink clone URL: https://github.com/apache/flink.git
Open a terminal on the Flink master node, change to the /home/hadoop/opensource directory, and download the Flink source with git clone:
git clone https://github.com/apache/flink.git
Error 1: if git is not installed on the Linux machine, you will see the following error:
bash: git: command not found
Fix: install git as follows:
1. Install the packages needed to build git (note: install as the root user)
yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel
yum install gcc perl-ExtUtils-MakeMaker
2. Remove any existing git
yum remove git
3. Download the git source
Install wget first:
yum -y install wget
Download the git source with wget:
wget https://www.kernel.org/pub/software/scm/git/git-2.0.5.tar.gz
Extract git:
tar xzf git-2.0.5.tar.gz
Build and install git:
cd git-2.0.5
make prefix=/usr/local/git all
sudo make prefix=/usr/local/git install
echo "export PATH=$PATH:/usr/local/git/bin" >> ~/.bashrc
source ~/.bashrc
Check the git version:
git --version
Error 2: git clone https://github.com/apache/flink.git
Cloning into 'flink'...
fatal: unable to access 'https://github.com/apache/flink.git/': SSL connect error
Fix:
Upgrade the nss version: yum update nss
2.4 Switching to the Target Flink Version
List the Flink release tags with:
git tag
Check out the matching release (here we use Flink 1.6.2):
git checkout release-1.6.2
2.5 Building Flink
Change to the Flink source root directory, /home/hadoop/opensource/flink, and build Flink with Maven:
mvn clean install -DskipTests -Dhadoop.version=2.6.0
This fails with:
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 06:58 min
[INFO] Finished at: 2019-01-18T22:11:54-05:00
[INFO] Final Memory: 106M/454M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project flink-mapr-fs: Could not resolve dependencies for project org.apache.flink:flink-mapr-fs:jar:1.6.2: Could not find artifact com.mapr.hadoop:maprfs:jar:5.2.1-mapr in nexus-osc (http://maven.aliyun.com/nexus/content/repositories/central) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :flink-mapr-fs
The build fails because the flink-mapr-fs artifact (maprfs) cannot be resolved; it has to be downloaded and installed manually.
Fix:
1. Download the maprfs jar
Manually download the maprfs-5.2.1-mapr.jar package from: https://repository.mapr.com/nexus/content/groups/mapr-public/com/mapr/hadoop/maprfs/5.2.1-mapr/
2. Upload it to the master node
Upload the downloaded maprfs-5.2.1-mapr.jar to the /home/hadoop/downloads directory on the master node.
3. Install it manually
Install the missing jar into the local repository:
mvn install:install-file -DgroupId=com.mapr.hadoop -DartifactId=maprfs -Dversion=5.2.1-mapr -Dpackaging=jar -Dfile=/home/hadoop/downloads/maprfs-5.2.1-mapr.jar
4. Resume the build
Continue building Flink with Maven, resuming from the failed module (the modules already built are skipped):
mvn clean install -Dmaven.test.skip=true -Dhadoop.version=2.7.3 -rf :flink-mapr-fs
The build fails again:
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 05:51 min
[INFO] Finished at: 2019-01-18T22:39:20-05:00
[INFO] Final Memory: 108M/480M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project flink-mapr-fs: Compilation failure: Compilation failure:
[ERROR] /home/hadoop/opensource/flink/flink-filesystems/flink-mapr-fs/src/main/java/org/apache/flink/runtime/fs/maprfs/MapRFileSystem.java:[70,44] package org.apache.hadoop.fs does not exist
[ERROR] /home/hadoop/opensource/flink/flink-filesystems/flink-mapr-fs/src/main/java/org/apache/flink/runtime/fs/maprfs/MapRFileSystem.java:[73,45] cannot find symbol
[ERROR] symbol:   class Configuration
[ERROR] location: package org.apache.hadoop.conf
[ERROR] /home/hadoop/opensource/flink/flink-filesystems/flink-mapr-fs/src/main/java/org/apache/flink/runtime/fs/maprfs/MapRFileSystem.java:[73,93] cannot find symbol
[ERROR] symbol:   class Configuration
The org.apache.hadoop.fs package is missing, so the compiler cannot find these classes.
Fix:
Add the following dependency to the pom file of the flink-mapr-fs module:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
</dependency>
Resume the build:
mvn clean install -Dmaven.test.skip=true -Dhadoop.version=2.7.3 -rf :flink-mapr-fs
Another error:
[ERROR] Failed to execute goal on project flink-avro-confluent-registry: Could not resolve dependencies for project org.apache.flink:flink-avro-confluent-registry:jar:1.6.2: Could not find artifact io.confluent:kafka-schema-registry-client:jar:3.3.1 in nexus-osc (http://maven.aliyun.com/nexus/content/repositories/central) -> [Help 1]
[ERROR]
The build fails because kafka-schema-registry-client-3.3.1.jar is missing.
Fix:
Manually download the kafka-schema-registry-client-3.3.1.jar package.
Upload the downloaded kafka-schema-registry-client-3.3.1.jar to the /home/hadoop/downloads directory on the master node.
Install the missing kafka-schema-registry-client-3.3.1.jar manually:
mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-schema-registry-client -Dversion=3.3.1 -Dpackaging=jar -Dfile=/home/hadoop/downloads/kafka-schema-registry-client-3.3.1.jar
Resume the build:
mvn clean install -Dmaven.test.skip=true -Dhadoop.version=2.7.3 -rf :flink-mapr-fs