本文為工作中的一些小積累宣谈。
一尔破,udf含義&作用
UDF指用戶定義函數(shù),即在hive中進(jìn)行客戶化的函數(shù)莺匠。在實(shí)際應(yīng)用中金吗,我們需要根據(jù)實(shí)際的業(yè)務(wù)需求自定義開發(fā)這種函數(shù),自定義函數(shù)的例子參考文檔: Hive Plugins
UDF函數(shù)一共分為三種類型: ?
- UDF-一進(jìn)一出趣竣,給定一個(gè)參數(shù)摇庙,輸出一個(gè)處理后的數(shù)據(jù) ?
- UDAF-多進(jìn)一出,屬于聚合函數(shù)遥缕,類似于count卫袒、sum等函數(shù) ?
- UDTF-一進(jìn)多出,屬于一個(gè)參數(shù)单匣,返回一個(gè)列表作為結(jié)果
#在sql中可以show functions;展示目前系統(tǒng)已有的udf
>show functions;
+------------------------------+--+
| tab_name |
+------------------------------+--+
| ! |
| != |
| % |
| & |
| * |
| + |
| - |
| / |
| < |
| <= |
| <=> |
| <> |
| = |
| == |
| > |
| >= |
二夕凝,udf制作流程
1烤蜕,pom中添加對(duì)應(yīng)的依賴
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.xxx</groupId>
<artifactId>xxx</artifactId>
<version>1.0.0-RELEASE</version>
<properties>
<java.version>1.8</java.version>
<lombok.version>1.16.18</lombok.version>
<fastjson.version>1.2.4</fastjson.version>
<commons.version>3.5</commons.version>
<slf4j.version>1.7.7</slf4j.version>
<log4j.version>1.2.17</log4j.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<scala.version>2.11.7</scala.version>
<hadoop.version>2.7.0</hadoop.version>
<spark.version>2.2.0</spark.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>2.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-metastore</artifactId>
<version>2.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>2.3.1</version>
</dependency>
<dependency>
<groupId>dom4j</groupId>
<artifactId>dom4j</artifactId>
<version>1.5.2</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.3</version>
<configuration>
<compilerVersion>${java.version}</compilerVersion>
<source>${java.version}</source>
<target>${java.version}</target>
<encoding>UTF-8</encoding>
<!-- prevents endPosTable exception for maven compile -->
<useIncrementalCompilation>false</useIncrementalCompilation>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>assembly</goal>
</goals>
</execution>
</executions>
<configuration>
<archive>
<manifest>
<mainClass>com.dazhen.udf.encryption.EncrHive</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<outputDirectory>${session.executionRootDirectory}/target/</outputDirectory>
<!--<destName>${project.name}</destName>-->
</configuration>
</plugin>
</plugins>
</build>
</project>
2,實(shí)現(xiàn)hive中的udf
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import utils.AESUtil;
/**
* @author water
* @desc AES 加密
*/
@Description(name = "AESEncr", value = "AESEncr context")
public class AESEncr extends UDF {
/**
* 根據(jù)password迹冤,對(duì)content進(jìn)行AES加密,最后返回結(jié)果
* @param password
* @param content
* @return
* @throws Exception
*/
public String evaluate(String password, String content) throws Exception {
if(StringUtils.isBlank(content)){
return null;
}
String result;
try {
result = AESUtil.aesEncrypt(password, content);
return result;
} catch (Exception e) {
throw e;
}
}
}
3虎忌,打包
mvn clean assembly:assembly -DskipTests
#打包后在target文件夾會(huì)出現(xiàn) xxx-1.0.0-RELEASE-jar-with-dependencies.jar
4泡徙,部署jar
#1將上面打包的jar放到hdfs上某路徑下
# hdfs dfs -put -f(強(qiáng)制更新) xxx-1.0.0-RELEASE-jar-with-dependencies.jar(jar位置) /user/xxx/(hdfs路徑)
> hdfs dfs -put -f xxx-1.0.0-RELEASE-jar-with-dependencies.jar /user/xxx/
#2 進(jìn)入hive或者sparkSQL,將函數(shù)注冊(cè)
#注冊(cè)有分為永久函數(shù)膜蠢,和臨時(shí)函數(shù)
##永久方法堪藐,需要指定一個(gè)數(shù)據(jù)庫(kù)名
##CREATE FUNCTION [db_name.]function_name AS class_name
## [USING JAR|FILE|ARCHIVE'file_uri' [, JAR|FILE|ARCHIVE'file_uri'] ];
create function udf.aesEncrypt as 'com.xxx.xxx.xxx.AESEncr' using jar 'hdfs:///user/xxx/xxx-1.0.0-RELEASE-jar-with-dependencies.jar';
##臨時(shí)方法,作用域?yàn)楫?dāng)前session
create temporary function encr as 'EncrHive'
三挑围,效果展示
> select udf.aesEncrypt('zmbd','4646');
+-----------------------------------+--+
| _c0 |
+-----------------------------------+--+
| B91F918A6FA7C191BD8C465A888EE828 |
+-----------------------------------+--+