從朋友那里拿到一個需求灸叼,根據(jù)日志分析統(tǒng)計并發(fā)情況汉额,統(tǒng)計自定義時間段的用戶流量,具體需求如下:
- 曬選某個時間點的數(shù)據(jù):查找某天下午1:30到2:30這個時間段或者自己定義际跪,看看到底有多少
- 同時篩選出兩個項目標號的文檔商佛,統(tǒng)計和去重統(tǒng)計17/18
- babyhealth:[2016-05-03 19:21:23] INFO orderinfo:472 - v41/nbCode/getauth||1508031:17
上面第三行的內(nèi)容就是日志的格式,簡單分析了一下需求姆打,shell腳本、Excel 分列篩選透視肠虽、Python都 可以解決幔戏,考慮到日志數(shù)量較大,通過 Excel 效率可能偏低税课,因此這里采用 Python 進行處理闲延。
處理的主要思路是:
- 對日志文件內(nèi)容進行分解。每條日志的關鍵信息有日期(2016-05-03)韩玩、時間(19:21:23)垒玲、值(1508031)、塊(17)找颓,應用正則對日志進行逐行的匹配合愈,提取關鍵信息,放到數(shù)據(jù)庫中
- 在數(shù)據(jù)庫中通過SQL語句進行統(tǒng)計
對于第一部分,代碼如下:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import re
import sqlite3
#說明
print u''
print u'-----------------使用說明-----------------'
print u'將腳本放入日志所在目錄佛析,運行腳本益老,輸入日志\n文件名(含擴展名),將在相同目錄生成log.db\n數(shù)據(jù)庫'
print u'------------------------------------------'
print u''
#讀取日志文件寸莫、連接數(shù)據(jù)庫
name = raw_input('Please input the log name:')
log_db = sqlite3.connect('./log.db')
cursor = log_db.cursor()
#建表
cursor.execute('DROP TABLE IF EXISTS log')
cursor.execute('CREATE TABLE log (front, log_date, log_time, log_path, value, level)')
#設定正則規(guī)則
data_re = re.compile(r'正則規(guī)則不放了捺萌,會和代碼編輯器沖突',re.S|re.M)
#讀取日志文件
log = open('./%s' %name)
index = 0
#進行轉(zhuǎn)換
try:
for line in log:
found = data_re.findall(line)
if found != None and len(found) != 0:
index = index + 1
cursor.execute('INSERT INTO log (front, log_date, log_time, log_path, value, level) VALUES (?,?,?,?,?,?)', (found[0][0], found[0][1], found[0][2], found[0][3], found[0][4], found[0][5]))
if index == 100:
log_db.commit()
#print '100 records has been submitted.'
log_db.commit()
print 'job done!'
input()
except:
pass
以上程序在 Linux 和 Windows 下都可以執(zhí)行,生成 log.db 數(shù)據(jù)庫膘茎, Linux 下可以直接使用 Sqlite3 對數(shù)據(jù)庫進行操作桃纯, Windows 下可以安裝 SQLite Expert 進行操作和查詢,查詢語句如下:
SELECT COUNT(*) FROM log WHERE log_date = '2016-05-03' and log_time between '13:30:00' and '14:30:00';
SELECT COUNT(*) FROM log WHERE log_date = '2016-05-03' and log_time between '13:30:00' and '14:30:00' and level = '17';
SELECT COUNT(*) FROM log WHERE log_date = '2016-05-03' and log_time between '13:30:00' and '14:30:00' and level = '18';
SELECT COUNT(DISTINCT value)FROM log WHERE log_date = '2016-05-03' and log_time between '13:30:00' and '14:30:00';
SELECT COUNT(DISTINCT value) FROM log WHERE log_date = '2016-05-03' and log_time between '13:30:00' and '14:30:00' and level = '17';
SELECT COUNT(DISTINCT value) FROM log WHERE log_date = '2016-05-03' and log_time between '13:30:00' and '14:30:00' and level = '18';
為了提高方便程度披坏,可以用 Python 操作 SQLite3 進行查詢:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import sqlite3
date = raw_input('Please input the date(eg.2016-05-03):')
start_time = raw_input('Please input the start time(eg.08:30:00):')
end_time = raw_input('Please input the end time(eg.09:30:00):')
stat_db = sqlite3.connect('./log.db')
cursor = stat_db.cursor()
cursor.execute('SELECT COUNT(*) FROM log WHERE log_date = \'%s\' and log_time between \'%s\' and \'%s\'' %(date,start_time,end_time))
amount_all = cursor.fetchall()[0][0]
cursor.execute('SELECT COUNT(DISTINCT value) FROM log WHERE log_date = \'%s\' and log_time between \'%s\' and \'%s\'' %(date,start_time,end_time))
amount_na = cursor.fetchall()[0][0]
cursor.execute('SELECT COUNT(*) FROM log WHERE log_date = \'%s\' and log_time between \'%s\' and \'%s\' and level = \'17\'' %(date,start_time,end_time))
month_all = cursor.fetchall()[0][0]
cursor.execute('SELECT COUNT(DISTINCT value) FROM log WHERE log_date = \'%s\' and log_time between \'%s\' and \'%s\' and level = \'17\'' %(date,start_time,end_time))
month_na = cursor.fetchall()[0][0]
cursor.execute('SELECT COUNT(*) FROM log WHERE log_date = \'%s\' and log_time between \'%s\' and \'%s\' and level = \'18\'' %(date,start_time,end_time))
year_all = cursor.fetchall()[0][0]
cursor.execute('SELECT COUNT(DISTINCT value) FROM log WHERE log_date = \'%s\' and log_time between \'%s\' and \'%s\' and level = \'18\'' %(date,start_time,end_time))
year_na = cursor.fetchall()[0][0]
print(u'')
print(u'------------統(tǒng)計------------')
print(u'%s %s~%s\n----------------------------\n總點擊次數(shù)/人:%s|%s\n年用戶:%s|%s\n月用戶:%s|%s' % (date, start_time, end_time, amount_all, amount_na, year_all, year_na, month_all, month_na))
print(u'')
input('Press any key to exit...')
以上程序均基于 Python 2.7+ 環(huán)境态坦。