Approach to analyzing the logs
Log content format
"27.38.5.159" "-" "31/Aug/2015:00:04:37 +0800" "GET /course/view.php?id=27 HTTP/1.1" "303" "440" - "http://www.ibeifeng.com/user.php?act=mycourse" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36" "-" "learn.ibeifeng.com"
Note that the - after 440 is the only field not enclosed in double quotes, so the regex has to handle it separately.
Create a table that parses the data with a regex
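-- Assumed layout: 11 space-separated fields, all double-quoted except the 7th (the bare - after the size).
-- Column names are illustrative (nginx-style); the meaning of fields 7 and 10 is a guess.
CREATE TABLE bf_log_src (
remote_addr STRING,
remote_user STRING,
time_local STRING,
request STRING,
status STRING,
body_bytes_sent STRING,
request_body STRING,
http_referer STRING,
http_user_agent STRING,
http_x_forwarded_for STRING,
host STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'input.regex' = '"([^"]*)" "([^"]*)" "([^"]*)" "([^"]*)" "([^"]*)" "([^"]*)" ([^ ]*) "([^"]*)" "([^"]*)" "([^"]*)" "([^"]*)"'
)
STORED AS TEXTFILE;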
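Before creating the table, a candidate input.regex can be checked locally with java.util.regex, which is what RegexSerDe uses; like RegexSerDe, the sketch below requires the whole line to match. The pattern is an assumption written for the 11-field sample line above (every field double-quoted except the bare - after 440), and the class name LogRegexCheck is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: checks a candidate RegexSerDe pattern against one log line.
// Assumption: 11 space-separated fields, all double-quoted except the 7th,
// which is a bare "-" (as in the sample line).
final class LogRegexCheck {
    // A double-quoted field; the quotes stay outside the capture group.
    private static final String Q = "\"([^\"]*)\"";
    // The unquoted field that follows the response size.
    private static final String BARE = "([^ ]*)";

    static final Pattern LOG_PATTERN = Pattern.compile(
            String.join(" ", Q, Q, Q, Q, Q, Q, BARE, Q, Q, Q, Q));

    static final String SAMPLE_LINE =
            "\"27.38.5.159\" \"-\" \"31/Aug/2015:00:04:37 +0800\" "
            + "\"GET /course/view.php?id=27 HTTP/1.1\" \"303\" \"440\" - "
            + "\"http://www.ibeifeng.com/user.php?act=mycourse\" "
            + "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36\" "
            + "\"-\" \"learn.ibeifeng.com\"";

    // Returns the captured fields, or null when the line does not match.
    // matches() demands a full-line match, as RegexSerDe does.
    static String[] parse(String line) {
        Matcher m = LOG_PATTERN.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] fields = parse(SAMPLE_LINE);
        System.out.println(fields.length); // 11
        System.out.println(fields[2]);     // 31/Aug/2015:00:04:37 +0800
        System.out.println(fields[6]);     // -
    }
}
```

If parse returns null for a real line, Hive would load that row as all NULLs, so this is a quick way to spot lines the pattern misses.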
Load the data
Create sub-tables for different business needs
Load data into the sub-tables
Run test queries
User-defined time conversion function (UDF)
Implement the conversion in the evaluate method
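The evaluate method is where the per-row conversion happens. Below is a minimal sketch, assuming the input looks like the sample's 31/Aug/2015:00:04:37 +0800 and that the target format is yyyyMMddHHmmss (an assumption; the post does not state it). In Hive the class would extend org.apache.hadoop.hive.ql.exec.UDF; that dependency is left out so the sketch compiles on its own, and the class name DateTransformUDF is hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical date-conversion UDF. In Hive it would extend
// org.apache.hadoop.hive.ql.exec.UDF; the dependency is omitted here.
final class DateTransformUDF {
    // Input as it appears in the log, e.g. "31/Aug/2015:00:04:37 +0800".
    // Locale.ENGLISH makes "Aug" parse regardless of the JVM's default locale.
    private static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
    // Assumed output format: compact timestamp, e.g. "20150831000437".
    private static final DateTimeFormatter OUTPUT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Hive calls evaluate() once per row; returning null yields NULL.
    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        String s = input.trim();
        // Tolerate surrounding double quotes in case the column still has them.
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        try {
            // Formats the wall-clock time recorded in the log (here +0800), not UTC.
            return OffsetDateTime.parse(s, INPUT).format(OUTPUT);
        } catch (DateTimeParseException e) {
            return null; // unparseable values become NULL
        }
    }
}
```

Keeping the logged offset rather than converting to UTC means any later hour-of-day analysis reflects the server's local time.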
Package it as a jar and add it to Hive
Create the date format conversion function
Apply the date format conversion function and overwrite the data with the converted values
Time analysis
Analyze the time periods in which users visit the site
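Once the time column holds converted timestamps, finding busy periods comes down to grouping visits by hour. The Java sketch below shows that aggregation on an in-memory list; it assumes timestamps in the yyyyMMddHHmmss form (an assumption about the converted format), and the class HourlyVisits is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: counts visits per hour of day from timestamps
// assumed to be in yyyyMMddHHmmss form, e.g. "20150831000437".
// This mirrors a GROUP BY on the hour part of the timestamp.
final class HourlyVisits {
    static Map<String, Integer> countByHour(Iterable<String> timestamps) {
        Map<String, Integer> counts = new TreeMap<>(); // keys sorted "00".."23"
        for (String ts : timestamps) {
            if (ts == null || ts.length() < 10) {
                continue; // skip NULL or malformed rows
            }
            String hour = ts.substring(8, 10); // characters 9-10 are HH
            counts.merge(hour, 1, Integer::sum);
        }
        return counts;
    }
}
```

In Hive the same idea would be a GROUP BY on substr(time_column, 9, 2), since Hive's substr is 1-based; time_column is a placeholder for whichever column holds the converted time.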
For the sales team, this helps schedule on-duty staff sensibly and time course sales.
Execution results