74.1 Demo Environment
- A Kerberos-enabled CDH cluster
- CM and CDH version: 5.13.1
74.2 Walkthrough
Upload the jar to an HDFS directory:
[root@ip-186-31-16-68 ~]# kinit fayson
Password for fayson@FAYSON.COM:
[root@ip-186-31-16-68 ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: fayson@FAYSON.COM
Valid starting Expires Service principal
02/22/2018 21:12:41 02/23/2018 21:12:41 krbtgt/FAYSON.COM@FAYSON.COM
renew until 03/01/2018 21:12:41
[root@ip-186-31-16-68 ~]#
hadoop fs -mkdir -p /fayson/jars
hadoop fs -put /opt/cloudera/parcels/CDH/jars/spark-examples-1.6.0-cdh5.13.1-hadoop2.6.0-cdh5.13.1.jar /fayson/jars
hadoop fs -ls /fayson/jars
- Define a workflow.xml file for the Spark Action:
- The parameters used in workflow.xml are defined as dynamic parameters; their values are supplied later in the code.
<workflow-app name="My Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-989b"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="spark-989b">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <mode>${mode}</mode>
            <name>${name}</name>
            <class>${class}</class>
            <jar>${jar}</jar>
            <spark-opts>${sparkOpts}</spark-opts>
            <arg>${arg}</arg>
            <file>${file}</file>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
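Oozie resolves the ${...} placeholders above on the server side from the job Properties submitted with the workflow. The sketch below only illustrates that mapping; the resolver is an illustrative stand-in, not Oozie's actual EL engine:

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderDemo {
    // Illustrative stand-in for Oozie's server-side resolution: replaces
    // ${key} tokens in a workflow fragment with values from the job Properties.
    static String resolve(String template, Properties props) {
        Matcher m = Pattern.compile("\\$\\{(\\w+)\\}").matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Keep the token as-is when the property is undefined
            String value = props.getProperty(m.group(1), m.group(0));
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("master", "yarn-cluster");
        props.put("class", "org.apache.spark.examples.SparkPi");
        System.out.println(resolve("<master>${master}</master>", props));
        System.out.println(resolve("<class>${class}</class>", props));
    }
}
```

This is why every ${...} name in the workflow must have a matching key in the Properties object built later in SparkWorkflowDemo.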
Upload the finished workflow.xml to the /user/fayson/oozie/testoozie directory on HDFS:
hadoop fs -mkdir -p /user/fayson/oozie/testoozie
hadoop fs -put workflow.xml /user/fayson/oozie/testoozie
hadoop fs -ls /user/fayson/oozie/testoozie
Prepare the JAAS file oozie-login.conf:
com.sun.security.jgss.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    storeKey=true
    useKeyTab=true
    debug=true
    keyTab="/Volumes/Transcend/keytab/fayson.keytab"
    principal="fayson@FAYSON.COM";
};
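For local testing it can be convenient to generate this file from code before pointing java.security.auth.login.config at it. A minimal sketch, using the sample keytab path and principal from above (both would change per environment):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class JaasFileDemo {
    // Writes a JAAS login section for the Krb5LoginModule; keytab and
    // principal are the sample values used in this article.
    static Path writeJaas(Path target, String keytab, String principal) throws IOException {
        String conf = String.join("\n",
                "com.sun.security.jgss.initiate {",
                "    com.sun.security.auth.module.Krb5LoginModule required",
                "    storeKey=true",
                "    useKeyTab=true",
                "    debug=true",
                "    keyTab=\"" + keytab + "\"",
                "    principal=\"" + principal + "\";",
                "};",
                "");
        return Files.write(target, conf.getBytes("UTF-8"));
    }

    public static void main(String[] args) throws IOException {
        Path conf = writeJaas(Paths.get("oozie-login.conf"),
                "/Volumes/Transcend/keytab/fayson.keytab", "fayson@FAYSON.COM");
        // Point the JVM at the generated file before creating the Oozie client
        System.setProperty("java.security.auth.login.config", conf.toAbsolutePath().toString());
        System.out.println(new String(Files.readAllBytes(conf), "UTF-8"));
    }
}
```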
- Create a Java project with Maven
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>cdh-project</artifactId>
        <groupId>com.cloudera</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <artifactId>oozie-demo</artifactId>
    <packaging>jar</packaging>
    <name>oozie-demo</name>
    <url>http://maven.apache.org</url>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.4</version>
        </dependency>
        <dependency>
            <groupId>net.sourceforge.spnego</groupId>
            <artifactId>spnego</artifactId>
            <version>7.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.oozie</groupId>
            <artifactId>oozie-client</artifactId>
            <version>4.1.0</version>
        </dependency>
    </dependencies>
</project>
Write SparkWorkflowDemo.java:
package com.cloudera.kerberos;

import org.apache.oozie.client.*;

import java.util.List;
import java.util.Properties;

/**
 * package: com.cloudera.kerberos
 * describe: Submits a Spark job to a Kerberos-enabled cluster through the Oozie client API
 * create_user: Fayson
 * email: htechinfo@163.com
 * create_date: 2018/2/23
 * create_time: 10:20 AM
 */
public class SparkWorkflowDemo {
    private static String oozieURL = "http://ip-186-31-16-68.ap-southeast-1.compute.internal:11000/oozie";

    public static void main(String[] args) {
        System.setProperty("java.security.krb5.conf", "/Volumes/Transcend/keytab/krb5.conf");
        System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
        System.setProperty("sun.security.jgss.debug", "true"); //Kerberos debug mode
        System.setProperty("java.security.auth.login.config", "/Volumes/Transcend/keytab/oozie-login.conf");
        System.setProperty("user.name", "fayson");
        AuthOozieClient oozieClient = new AuthOozieClient(oozieURL, AuthOozieClient.AuthType.KERBEROS.name());
        oozieClient.setDebugMode(1);
        try {
            System.out.println(oozieClient.getServerBuildVersion());
            Properties properties = oozieClient.createConfiguration();
            properties.put("oozie.wf.application.path", "${nameNode}/user/fayson/oozie/testoozie");
            properties.put("name", "MyfirstSpark");
            properties.put("nameNode", "hdfs://ip-186-31-16-68.ap-southeast-1.compute.internal:8020");
            properties.put("oozie.use.system.libpath", "true");
            properties.put("master", "yarn-cluster");
            properties.put("mode", "cluster");
            properties.put("class", "org.apache.spark.examples.SparkPi");
            properties.put("arg", "100");
            properties.put("sparkOpts", "--num-executors 4 --driver-memory 2g --driver-cores 1 --executor-memory 2g --executor-cores 1");
            properties.put("jar", "${nameNode}/fayson/jars/spark-examples-1.6.0-cdh5.13.1-hadoop2.6.0-cdh5.13.1.jar");
            properties.put("oozie.libpath", "${nameNode}/fayson/jars");
            properties.put("jobTracker", "ip-186-31-16-68.ap-southeast-1.compute.internal:8032");
            properties.put("file", "${nameNode}/fayson/jars");

            //Run the workflow
            String jobid = oozieClient.run(properties);
            System.out.println(jobid);

            //Wait 10 seconds so the workflow has time to start
            Thread.sleep(10000L);

            //Fetch the job status by workflow id
            WorkflowJob workflowJob = oozieClient.getJobInfo(jobid);
            //Fetch the job log
            System.out.println(oozieClient.getJobLog(jobid));
            //List all actions of the workflow
            List<WorkflowAction> list = workflowJob.getActions();
            for (WorkflowAction action : list) {
                //Print each action's external id, i.e. the YARN application ID
                System.out.println(action.getExternalId());
            }
        } catch (OozieClientException | InterruptedException e) {
            e.printStackTrace();
        }
    }
}
Summary:
1. Define the workflow.xml file up front.
2. Parameters are passed by calling oozieClient.createConfiguration() to create a Properties object, storing the key/value pairs in it, and passing it to oozieClient.run(properties).
3. When specifying the path of the jar or workflow on HDFS, include the HDFS prefix; otherwise the path is resolved against the local filesystem by default.
4. Submitting a job to a Kerberos cluster requires loading a JAAS configuration in the program.
5. The Oozie client provides the AuthOozieClient API for Kerberos authentication.
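The demo above queries the job status once after a fixed sleep; in practice it is more robust to poll until the workflow reaches a terminal state. A sketch of that loop follows; the status names match WorkflowJob.Status in oozie-client, the polling interval is an arbitrary choice, and a canned status sequence stands in for a live cluster:

```java
public class PollStatusDemo {
    // Terminal workflow states, matching the names of WorkflowJob.Status
    // in oozie-client (SUCCEEDED, KILLED, FAILED).
    static boolean isTerminal(String status) {
        return status.equals("SUCCEEDED") || status.equals("KILLED") || status.equals("FAILED");
    }

    public static void main(String[] args) throws InterruptedException {
        // With a live cluster the loop body would instead be:
        //   WorkflowJob job = oozieClient.getJobInfo(jobid);
        //   status = job.getStatus().toString();
        String[] canned = {"PREP", "RUNNING", "RUNNING", "SUCCEEDED"};
        String status = null;
        for (String s : canned) {
            status = s;
            System.out.println("status = " + status);
            if (isTerminal(status)) break;
            Thread.sleep(100L); // in real use, a few seconds between polls
        }
        System.out.println("final: " + status);
    }
}
```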