Time format: 04/Jun/2021:22:11:18 +0800
This is not exactly how Nginx writes the timestamp. The raw value is wrapped in square brackets, [04/Jun/2021:22:11:18 +0800], so strip the brackets in your own code first, for example with the substring method.
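The bracket stripping mentioned above could look roughly like the following. This is a minimal sketch: the class name `NginxTimeBrackets` and the method `stripBrackets` are hypothetical names chosen for illustration and are not part of the UDF described in this post.

```java
// Hypothetical helper for illustration only; not part of the UDF in this post.
final class NginxTimeBrackets {

    /** Removes one leading '[' and one trailing ']' when both are present. */
    static String stripBrackets(String raw) {
        String s = raw.trim();
        if (s.length() >= 2 && s.charAt(0) == '[' && s.charAt(s.length() - 1) == ']') {
            // substring(begin, end) keeps everything between the two brackets
            return s.substring(1, s.length() - 1);
        }
        // No surrounding brackets: return the trimmed input unchanged
        return s;
    }

    public static void main(String[] args) {
        System.out.println(stripBrackets("[04/Jun/2021:22:11:18 +0800]"));
        // prints: 04/Jun/2021:22:11:18 +0800
    }
}
```

Checking for both brackets before cutting keeps the helper safe for values that have already been stripped.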
1. Implement the UDF function yourself
2. Package it and upload it to HDFS
3. Create the UDF
1.1 Import the Maven dependency
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>8</source>
                <target>8</target>
            </configuration>
        </plugin>
    </plugins>
</build>
<repositories>
    <repository>
        <id>hive-udf</id>
        <url>https://maven.aliyun.com/repository/spring-plugin/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>2.3.7</version>
    </dependency>
</dependencies>
1.2 Create the Java class
package com.tiens.hiveUDF;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

@Description(
    name = "formatNginxDate",
    value = "_FUNC_(strNginxDate) - input format is 04/Jun/2021:22:11:18 +0800",
    extended = "Formats an Nginx log timestamp as yyyy-MM-dd"
)
public class format_nginxDate extends UDF {
    public String evaluate(String nginxTime) {
        // Return NULL for missing input or input too short to be a full timestamp
        if (nginxTime == null || nginxTime.trim().length() <= 24) {
            return null;
        }
        try {
            // Parse into a Date; Locale.ENGLISH is needed for month names such as "Jun"
            SimpleDateFormat dateFormat = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
            Date parseDate = dateFormat.parse(nginxTime.trim());
            // Format the result as yyyy-MM-dd
            SimpleDateFormat resFormat = new SimpleDateFormat("yyyy-MM-dd");
            return resFormat.format(parseDate);
        } catch (ParseException e) {
            // Unparseable input yields NULL instead of failing the query
            return null;
        }
    }
}
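One detail worth checking in the class above: `new SimpleDateFormat("yyyy-MM-dd")` formats in the JVM's default time zone, so the resulting date can differ from the date written in the log line when the cluster's zone is far from the log's offset. The following is a minimal sketch of an alternative based on `java.time` that keeps the date in the log's own offset. It is not the implementation used in this post, and the names `NginxDateSketch` and `toLogDate` are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical standalone sketch; it does not extend Hive's UDF class.
final class NginxDateSketch {

    // Same pattern as the UDF; Locale.ENGLISH is needed for "Jun", "Dec", ...
    private static final DateTimeFormatter NGINX =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    /** Returns the date (yyyy-MM-dd) in the log's own offset, or null if unparseable. */
    static String toLogDate(String nginxTime) {
        if (nginxTime == null) {
            return null;
        }
        try {
            // toLocalDate() drops the time and offset without converting zones
            return OffsetDateTime.parse(nginxTime.trim(), NGINX).toLocalDate().toString();
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toLogDate("04/Jun/2021:22:11:18 +0800")); // prints: 2021-06-04
        System.out.println(toLogDate("not a timestamp"));            // prints: null
    }
}
```

`DateTimeFormatter` is also immutable and thread-safe, so a single shared instance can be reused, whereas `SimpleDateFormat` has to be created per call or otherwise guarded.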
2.1 Package with mvn package
Output jar: hive-udf-1.0-SNAPSHOT.jar
2.2 Upload to HDFS
su hdfs
hdfs dfs -mkdir /lib
Path of the jar on the local Linux machine: /developers/hive-udf-1.0-SNAPSHOT.jar
hdfs dfs -put /developers/hive-udf-1.0-SNAPSHOT.jar /lib
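3. Create the UDF function
Log in to the Hive command line
3.1 Create a temporary function
add jar hdfs://nameservice1:8020/lib/hive-udf-1.0-SNAPSHOT.jar;
If the jar fails to load when the path is wrapped in single quotes, remove the quotes at both ends and try again.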
create temporary function format_nginxDate as 'com.tiens.hiveUDF.format_nginxDate';
3.2 Create a permanent function
CREATE FUNCTION format_nginxDate AS 'com.tiens.hiveUDF.format_nginxDate' USING JAR 'hdfs://nameservice1:8020/lib/hive-udf-1.0-SNAPSHOT.jar';
3.3 Drop the function
DROP FUNCTION IF EXISTS format_nginxDate;
Next chapter: how to use regular expressions to map raw Nginx data written to HDFS onto a Hive table