Apache Flink 初識
Apache Flink作為Apache的頂級項目勇哗,固然集眾多優(yōu)點于一身。Flink具有分布式MR一類平臺的高效性辜伟,靈活性和擴展性菊霜。同時坚冀,F(xiàn)link 還支持批量和局域流的數(shù)據(jù)分析,而且提供基于Java和Scala的API鉴逞〖悄常總的來說,F(xiàn)link是一個分布式构捡,高性能液南,高可用,準確的勾徽,基于Java實現(xiàn)的通用大數(shù)據(jù)分析引擎滑凉。引用官網(wǎng)的一句話介紹Flink:基于數(shù)據(jù)流的有狀態(tài)計算-Stateful Computations over Data Streams。
flink.png
Flink運行模式
1:Flink和Spark一樣有三種部署模式喘帚,分別是Local,Standalone Cluster和Yarn Cluster畅姊。本文主要是介紹在Yarn Cluster模式下,F(xiàn)link任務(wù)的執(zhí)行和資源分配是如何的吹由!
flink 運行方式.png
2:啟動yarn-session時需要指定的參數(shù):
Usage:
Required
-n,--container <arg> Number of YARN container to allocate (=Number of Task Managers)
Optional
-D <property=value> use value for given property
-d,--detached If present, runs the job in detached mode
-h,--help Help for the Yarn session CLI.
-id,--applicationId <arg> Attach to running YARN session
-j,--jar <arg> Path to Flink jar file
-jm,--jobManagerMemory <arg> Memory for JobManager Container with optional unit (default: MB)
-m,--jobmanager <arg> Address of the JobManager (master) to which to connect. Use this flag to connect to a different JobManager than the one specified in the configuration.
-n,--container <arg> Number of YARN container to allocate (=Number of Task Managers)
-nl,--nodeLabel <arg> Specify YARN node label for the YARN application
-nm,--name <arg> Set a custom name for the application on YARN
-q,--query Display available YARN resources (memory, cores)
-qu,--queue <arg> Specify YARN queue.
-s,--slots <arg> Number of slots per TaskManager
-sae,--shutdownOnAttachedExit If the job is submitted in attached mode, perform a best-effort cluster shutdown when the CLI is terminated abruptly, e.g., in response to a user interrupt, such
as typing Ctrl + C.
-st,--streaming Start Flink in streaming mode
-t,--ship <arg> Ship files in the specified directory (t for transfer)
-tm,--taskManagerMemory <arg> Memory per TaskManager Container with optional unit (default: MB)
-yd,--yarndetached If present, runs the job in detached mode (deprecated; use non-YARN specific option instead)
-z,--zookeeperNamespace <arg> Namespace to create the Zookeeper sub-paths for high availability mode
第一種方式提交flink程序:
首先我們啟動yarn-session
./bin/yarn-session.sh -n 2 -s 2 -jm 2048 -tm 4096 -nm flink_session_cluster_20190320
yarn-session-方式1.png
然后我們在這個flink集群中提交任務(wù)若未,在我們提交任務(wù)的時候需要指定yid,yid就是我們上面開啟集群所屬的ID:application_1546585584446_0144倾鲫,這個時候提交的任務(wù)就會到我們開啟的集群中粗合。
flink run \
-yid application_1546585584446_0144 \
-c com.hfjy.bigdata.ls.nginx.parsenginx.AliyunOnlineParseNginx /opt/jars /online_aliyun_ls_parse_nginx_test.jar \
--output ${elasticsearch} \
--ipDbPath /opt/lib/ \
--windowSize 10
yarn-session-job.png
第二種方式提交flink程序
./bin/flink 提交任務(wù)所需參數(shù):
"run" action options:
-c,--class <classname> Class with the program entry point
("main" method or "getPlan()" method.
Only needed if the JAR file does not
specify the class in its manifest.
-C,--classpath <url> Adds a URL to each user code
classloader on all nodes in the
cluster. The paths must specify a
protocol (e.g. file://) and be
accessible on all nodes (e.g. by means
of a NFS share). You can use this
option multiple times for specifying
more than one URL. The protocol must
be supported by the {@link
java.net.URLClassLoader}.
-d,--detached If present, runs the job in detached
mode
-n,--allowNonRestoredState Allow to skip savepoint state that
cannot be restored. You need to allow
this if you removed an operator from
your program that was part of the
program when the savepoint was
triggered.
-p,--parallelism <parallelism> The parallelism with which to run the
program. Optional flag to override the
default value specified in the
configuration.
-q,--sysoutLogging If present, suppress logging output to
standard out.
-s,--fromSavepoint <savepointPath> Path to a savepoint to restore the job
from (for example
hdfs:///flink/savepoint-1537).
-sae,--shutdownOnAttachedExit If the job is submitted in attached
mode, perform a best-effort cluster
shutdown when the CLI is terminated
abruptly, e.g., in response to a user
interrupt, such as typing Ctrl + C.
Options for yarn-cluster mode:
-d,--detached If present, runs the job in detached
mode
-m,--jobmanager <arg> Address of the JobManager (master) to
which to connect. Use this flag to
connect to a different JobManager than
the one specified in the
configuration.
-sae,--shutdownOnAttachedExit If the job is submitted in attached
mode, perform a best-effort cluster
shutdown when the CLI is terminated
abruptly, e.g., in response to a user
interrupt, such as typing Ctrl + C.
-yD <property=value> use value for given property
-yd,--yarndetached If present, runs the job in detached
mode (deprecated; use non-YARN
specific option instead)
-yh,--yarnhelp Help for the Yarn session CLI.
-yid,--yarnapplicationId <arg> Attach to running YARN session
-yj,--yarnjar <arg> Path to Flink jar file
-yjm,--yarnjobManagerMemory <arg> Memory for JobManager Container with
optional unit (default: MB)
-yn,--yarncontainer <arg> Number of YARN container to allocate
(=Number of Task Managers)
-ynl,--yarnnodeLabel <arg> Specify YARN node label for the YARN
application
-ynm,--yarnname <arg> Set a custom name for the application
on YARN
-yq,--yarnquery Display available YARN resources
(memory, cores)
-yqu,--yarnqueue <arg> Specify YARN queue.
-ys,--yarnslots <arg> Number of slots per TaskManager
-yst,--yarnstreaming Start Flink in streaming mode
-yt,--yarnship <arg> Ship files in the specified directory
(t for transfer)
-ytm,--yarntaskManagerMemory <arg> Memory per TaskManager Container with
optional unit (default: MB)
-yz,--yarnzookeeperNamespace <arg> Namespace to create the Zookeeper
sub-paths for high availability mode
-z,--zookeeperNamespace <arg> Namespace to create the Zookeeper
sub-paths for high availability mode
Options for default mode:
-m,--jobmanager <arg> Address of the JobManager (master) to which
to connect. Use this flag to connect to a
different JobManager than the one specified
in the configuration.
-z,--zookeeperNamespace <arg> Namespace to create the Zookeeper sub-paths
for high availability mode
提交flink任務(wù)
flink run \
-m yarn-cluster \
-ynm AliyunNginxStudy2 \
-yn 1 \
-ys 3 \
-p 3 \
-yjm 2048m \
-ytm 8192m \
-c com.hfjy.bigdata.ls.nginx.parsenginx.AliyunOnlineParseNginx /opt/jars/online_aliyun_ls_parse_nginx_test.jar \
--output ${elasticsearch} \
--ipDbPath /opt/lib/ \
--windowSize 10
yarn-session-方式2.png