The advantage of this mode is that each submission maps to its own job: every job requests resources from YARN according to its own needs and holds them until it finishes, so it does not interfere with the next job unless YARN has no resources left at all.
Note: the client must set the YARN_CONF_DIR, HADOOP_CONF_DIR, or HADOOP_HOME environment variable. Flink reads the YARN and HDFS configuration through this variable; without it, startup fails.
There is no need to start any cluster on YARN beforehand; just submit the job directly.
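The environment requirement above can be checked before submitting anything. The following is a minimal sketch, not part of Flink: the function name check_hadoop_env is hypothetical, and the variables are tested in the order the note lists them, which is not necessarily the precedence Flink itself applies.

```shell
# Pre-flight check (hypothetical helper, not shipped with Flink):
# succeed only if at least one of the variables the client needs for
# locating the YARN/HDFS configuration is set and non-empty.
check_hadoop_env() {
  for name in YARN_CONF_DIR HADOOP_CONF_DIR HADOOP_HOME; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "using $name=$value"
      return 0
    fi
  done
  echo "none of YARN_CONF_DIR, HADOOP_CONF_DIR, HADOOP_HOME is set" >&2
  return 1
}
```

A submit script could call `check_hadoop_env || exit 1` before invoking bin/flink, so that a missing variable is reported up front rather than as a client startup failure.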
Step 1: Submit the job directly from the command line
cd /kkb/install/flink-1.8.1/
bin/flink run -m yarn-cluster -yn 2 -yjm 1024 -ytm 1024 ./examples/batch/WordCount.jar -input hdfs://node01:8020/flink_input -output hdfs://node01:8020/out_result/out_count.txt
Step 2: View the output
Run the following HDFS command to view the output
hdfs dfs -text hdfs://node01:8020/out_result/out_count.txt
Step 3: View the flink run help
Use --help to see which options can be added
cd /kkb/install/flink-1.8.1/
bin/flink run --help
The output is as follows
Action "run" compiles and runs a program.

  Syntax: run [OPTIONS] <jar-file> <arguments>
  "run" action options:
     -c,--class <classname>               Class with the program entry point
                                          ("main" method or "getPlan()" method.
                                          Only needed if the JAR file does not
                                          specify the class in its manifest.
     -C,--classpath <url>                 Adds a URL to each user code
                                          classloader on all nodes in the
                                          cluster. The paths must specify a
                                          protocol (e.g. file://) and be
                                          accessible on all nodes (e.g. by means
                                          of a NFS share). You can use this
                                          option multiple times for specifying
                                          more than one URL. The protocol must
                                          be supported by the {@link
                                          java.net.URLClassLoader}.
     -d,--detached                        If present, runs the job in detached
                                          mode
     -n,--allowNonRestoredState           Allow to skip savepoint state that
                                          cannot be restored. You need to allow
                                          this if you removed an operator from
                                          your program that was part of the
                                          program when the savepoint was
                                          triggered.
     -p,--parallelism <parallelism>       The parallelism with which to run the
                                          program. Optional flag to override the
                                          default value specified in the
                                          configuration.
     -q,--sysoutLogging                   If present, suppress logging output to
                                          standard out.
     -s,--fromSavepoint <savepointPath>   Path to a savepoint to restore the job
                                          from (for example
                                          hdfs:///flink/savepoint-1537).
     -sae,--shutdownOnAttachedExit        If the job is submitted in attached
                                          mode, perform a best-effort cluster
                                          shutdown when the CLI is terminated
                                          abruptly, e.g., in response to a user
                                          interrupt, such as typing Ctrl + C.

  Options for yarn-cluster mode:
     -d,--detached                        If present, runs the job in detached
                                          mode
     -m,--jobmanager <arg>                Address of the JobManager (master) to
                                          which to connect. Use this flag to
                                          connect to a different JobManager than
                                          the one specified in the
                                          configuration.
     -sae,--shutdownOnAttachedExit        If the job is submitted in attached
                                          mode, perform a best-effort cluster
                                          shutdown when the CLI is terminated
                                          abruptly, e.g., in response to a user
                                          interrupt, such as typing Ctrl + C.
     -yD <property=value>                 use value for given property
     -yd,--yarndetached                   If present, runs the job in detached
                                          mode (deprecated; use non-YARN
                                          specific option instead)
     -yh,--yarnhelp                       Help for the Yarn session CLI.
     -yid,--yarnapplicationId <arg>       Attach to running YARN session
     -yj,--yarnjar <arg>                  Path to Flink jar file
     -yjm,--yarnjobManagerMemory <arg>    Memory for JobManager Container with
                                          optional unit (default: MB)
     -yn,--yarncontainer <arg>            Number of YARN container to allocate
                                          (=Number of Task Managers)
     -ynl,--yarnnodeLabel <arg>           Specify YARN node label for the YARN
                                          application
     -ynm,--yarnname <arg>                Set a custom name for the application
                                          on YARN
     -yq,--yarnquery                      Display available YARN resources
                                          (memory, cores)
     -yqu,--yarnqueue <arg>               Specify YARN queue.
     -ys,--yarnslots <arg>                Number of slots per TaskManager
     -yst,--yarnstreaming                 Start Flink in streaming mode
     -yt,--yarnship <arg>                 Ship files in the specified directory
                                          (t for transfer)
     -ytm,--yarntaskManagerMemory <arg>   Memory per TaskManager Container with
                                          optional unit (default: MB)
     -yz,--yarnzookeeperNamespace <arg>   Namespace to create the Zookeeper
                                          sub-paths for high availability mode
     -z,--zookeeperNamespace <arg>        Namespace to create the Zookeeper
                                          sub-paths for high availability mode

  Options for default mode:
     -m,--jobmanager <arg>           Address of the JobManager (master) to which
                                     to connect. Use this flag to connect to a
                                     different JobManager than the one specified
                                     in the configuration.
     -z,--zookeeperNamespace <arg>   Namespace to create the Zookeeper sub-paths
                                     for high availability mode
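To see how several of these options combine, here is a small dry-run sketch. It only prints a command line and submits nothing. The flags come from the help output above; the values (application name, queue, slot count, parallelism) and the function name build_run_cmd are made up for illustration and should be replaced with whatever fits your cluster.

```shell
# Hypothetical values; adjust for your cluster.
APP_NAME="wordcount-demo"   # -ynm: custom application name shown on YARN
QUEUE="default"             # -yqu: YARN queue to submit to
SLOTS=2                     # -ys:  number of slots per TaskManager
PARALLELISM=4               # -p:   parallelism of the program

# Print (do not execute) a per-job submission in detached mode (-d).
build_run_cmd() {
  echo "bin/flink run -m yarn-cluster -d" \
       "-ynm $APP_NAME -yqu $QUEUE -ys $SLOTS -p $PARALLELISM" \
       "-yjm 1024 -ytm 1024" \
       "./examples/batch/WordCount.jar"
}

build_run_cmd
```

Once the printed line looks right, it can be run from the Flink installation directory, with the job's own arguments (such as -input and -output) appended after the jar path.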
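3. Analysis of the flink run script
When submitting a Flink job, the target cluster can be selected in the following ways
1. By default, look up the JobManager from the existing yarn-session information in the current YARN cluster [/tmp/.yarn-properties-root]: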
bin/flink run ./examples/batch/WordCount.jar -input hdfs://hostname:port/hello.txt -output hdfs://hostname:port/result1
2. Connect to the JobManager at a specified host and port:
bin/flink run -m node01:8081 ./examples/batch/WordCount.jar -input hdfs://hostname:port/hello.txt -output hdfs://hostname:port/result1
3. Start a new yarn-session dedicated to this job (-m yarn-cluster):
bin/flink run -m yarn-cluster -yn 2 ./examples/batch/WordCount.jar -input hdfs://hostname:port/hello.txt -output hdfs://hostname:port/result1
Note: the yarn-session command-line options are also available through the ./bin/flink tool, where each of them carries a y or yarn prefix.
For example: bin/flink run -m yarn-cluster -yn 2 ./examples/batch/WordCount.jar
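The three submission forms differ only in what is passed to -m. The sketch below captures that as a small helper. The function names and mode labels are hypothetical, and the properties-file helper assumes the default temp directory and a file suffix equal to the submitting user's name, as the root example above suggests.

```shell
# Print the `flink run` arguments that select where the job is sent.
#   session           -> no -m flag; the client falls back to an existing yarn-session
#   address HOST:PORT -> -m HOST:PORT (a specific JobManager)
#   per-job           -> -m yarn-cluster (a new cluster on YARN for this job)
flink_target_args() {
  case "$1" in
    session) echo "" ;;
    address) echo "-m $2" ;;
    per-job) echo "-m yarn-cluster" ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Path of the yarn-session properties file for a user
# (defaults to the current user when no name is given).
session_properties_file() {
  echo "/tmp/.yarn-properties-${1:-$(id -un)}"
}
```

For instance, `bin/flink run $(flink_target_args address node01:8081) ./examples/batch/WordCount.jar` expands to the second form, and `[ -f "$(session_properties_file)" ]` is a quick way to see whether the first form has a session to fall back on.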