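Preface
Elasticsearch + Logstash + Kibana (ELK) is an open-source log management stack. To analyze website traffic we usually embed JavaScript from Google, Baidu, CNZZ, or similar services to collect statistics. When traffic looks abnormal or the site is under attack, however, we need to analyze the detailed back-end logs, such as the Nginx logs. Nginx log splitting, GoAccess, and Awstats are relatively simple single-node solutions, and they fall short on distributed clusters or large data volumes. ELK lets us face these new challenges with confidence.
- Logstash: collects, processes, and forwards logs
- Elasticsearch: stores, searches, and analyzes logs
- Kibana: visualizes logs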
ELK(Elasticsearch + Logstash + Kibana)
Changelog
2019-07-02 - Reposted and restructured from a colleague's ELK Stack write-up
2015-08-31 - First draft
Original post - https://wsgzao.github.io/post/elk/
Further reading
elastic - https://www.elastic.co/cn/
ELK - https://fainyang.github.io/post/elk/
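Introduction to ELK
Elasticsearch, the core of ELK (see the official documentation), is a distributed, scalable, real-time search and data analytics engine. At work I currently use ELK only to collect server logs, where it is a great help to developers when debugging.
Installing and configuring a single-node ELK
If you want to set up a single-node ELK quickly, Docker is the best choice. Use the all-in-one image (see its documentation for details).
Note: after installing Docker, remember to set the mmap count (vm.max_map_count) to at least 262144.
Background: what is mmap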
# Set the mmap count
# Temporary (lost on reboot)
sysctl -w vm.max_map_count=262144
# Permanent: add the following line to /etc/sysctl.conf
vim /etc/sysctl.conf
vm.max_map_count=262144
# After saving the file, apply it
sysctl -p
# Install Docker
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y docker-ce
sudo systemctl start docker
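Before starting the container, it can help to confirm that the mmap setting took effect. The following is a minimal Python sketch and not part of the original setup. The helper names are hypothetical, and it assumes the input looks like the usual `sysctl` output (`vm.max_map_count = 262144`) or the line written to /etc/sysctl.conf.

```python
import re

REQUIRED_MAX_MAP_COUNT = 262144  # minimum value stated in the text above


def parse_max_map_count(text: str) -> int:
    """Extract the value from a line such as 'vm.max_map_count = 262144'."""
    match = re.search(r"vm\.max_map_count\s*=\s*(\d+)", text)
    if match is None:
        raise ValueError("vm.max_map_count not found in input")
    return int(match.group(1))


def is_sufficient(text: str) -> bool:
    """True when the configured value meets the minimum."""
    return parse_max_map_count(text) >= REQUIRED_MAX_MAP_COUNT


if __name__ == "__main__":
    import subprocess

    # Ask the running kernel for the current value.
    output = subprocess.run(
        ["sysctl", "vm.max_map_count"],
        capture_output=True, text=True, check=True,
    ).stdout
    print("OK" if is_sufficient(output) else "vm.max_map_count is too low")
```

Running the script on the Docker host prints OK once the sysctl change is active.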
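On a single-node machine there is no need to expose ports 9200 (Elasticsearch JSON interface) and 9300 (Elasticsearch transport interface).
To publish a port with Docker, use -p. If no listen address is given, it defaults to 0.0.0.0, that is, all network interfaces. It is safer to state the listen address explicitly.
-p listen_IP:host_port:container_port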
-p 192.168.10.10:9300:9300
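To make the default explicit, here is a small Python sketch that splits a `-p` value into its parts. It is an illustration only: the function names are hypothetical, and it handles just the two forms discussed here (IPv4 address with two ports, or two ports alone), not every form Docker accepts.

```python
def parse_publish_spec(spec: str):
    """Split 'ip:host_port:container_port' or 'host_port:container_port'.

    When no IP is given, Docker listens on all interfaces (0.0.0.0).
    """
    parts = spec.split(":")
    if len(parts) == 3:
        ip, host_port, container_port = parts
    elif len(parts) == 2:
        ip = "0.0.0.0"
        host_port, container_port = parts
    else:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    return ip, int(host_port), int(container_port)


def listens_on_all_interfaces(spec: str) -> bool:
    """True when the spec would expose the port on every interface."""
    return parse_publish_spec(spec)[0] == "0.0.0.0"
```

A check like `listens_on_all_interfaces("5601:5601")` flags the specs that deserve an explicit address.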
Starting ELK from the command line
sudo docker run -p 5601:5601 -p 5044:5044 \
-v /data/elk-data:/var/lib/elasticsearch \
-v /data/elk/logstash:/etc/logstash/conf.d \
-it -e TZ="Asia/Singapore" -e ES_HEAP_SIZE="20g" \
-e LS_HEAP_SIZE="10g" --name elk-ubuntu sebp/elk
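Mounting the configuration and data outside the container means that if the Docker container runs into trouble, you can destroy it and start a new one right away, so the service is affected only briefly.
# Watch the permissions of the mounted directories
chmod 755 /data/elk-data
chmod 755 /data/elk/logstash
chown -R root:root /data
-v /data/elk-data:/var/lib/elasticsearch # mounts the Elasticsearch data directory on the host so the data persists
-v /data/elk/logstash:/etc/logstash/conf.d # mounts the Logstash configuration on the host so it can be edited there
Important Elasticsearch tuning parameters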
- ES_HEAP_SIZE: Elasticsearch assigns the entire heap specified in jvm.options via the Xms (minimum heap size) and Xmx (maximum heap size) settings. Set these two settings equal to each other. Set Xmx and Xms to no more than 50% of your physical RAM. Also keep Xmx below the threshold the JVM uses for compressed object pointers (compressed oops); the exact threshold varies but is near 32 GB. Ideally stay below the threshold for zero-based compressed oops; the exact threshold varies, but 26 GB is safe on most systems and it can be as large as 30 GB on some systems.
Trade-off: the more heap available to Elasticsearch, the more memory it can use for its internal caches, but the less memory it leaves for the operating system to use for the filesystem cache. Larger heaps can also cause longer garbage collection pauses.
- LS_HEAP_SIZE: if the heap size is too low, CPU utilization hits a bottleneck because the JVM garbage-collects constantly. Do not set the heap size above the amount of physical memory, and leave at least 1 GB for the operating system and other processes.
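The sizing rules above can be turned into a quick calculation. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, the 26 GB cap is the "safe on most systems" figure quoted above, and the result is only a starting point, not a tuned value.

```python
def suggest_es_heap_gb(physical_ram_gb: float, safe_cap_gb: float = 26.0) -> float:
    """Return a heap size in GB: at most half of RAM and at most safe_cap_gb."""
    if physical_ram_gb <= 0:
        raise ValueError("physical_ram_gb must be positive")
    return min(physical_ram_gb / 2, safe_cap_gb)


def es_heap_env(physical_ram_gb: float) -> str:
    """Format the suggestion as the ES_HEAP_SIZE value used by the docker command."""
    heap_gb = max(1, int(suggest_es_heap_gb(physical_ram_gb)))  # whole GB, rounded down
    return f'ES_HEAP_SIZE="{heap_gb}g"'
```

For a host with 40 GB of RAM this yields the same 20g used in the docker command earlier; the Logstash heap and the operating system then share the remainder.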
Only Logstash needs to be configured
Next, let's look at logstash.conf. Be sure to read the comments.
Reference links:
input {
beats {
port => 5044
#ssl => true
#ssl_certificate => "/etc/logstash/logstash.crt"
#ssl_key => "/etc/logstash/logstash.key"
# 1. See the reference for SSL details
}
}
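# The filter block preprocesses the data and extracts fields so that Elasticsearch can classify and store it more easily.
# 2. grok regular-expression capture
# 3. Introduction to grok plugin syntax
# 4. Logstash configuration syntax
# 5. Built-in grok patterns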
filter {
grok {
match => {"message" => "%{EXIM_DATE:timestamp}\|%{LOGLEVEL:log_level}\|%{INT:pid}\|%{GREEDYDATA}"}
# The message field holds the log line, for example 2018-12-11 23:46:47.051|DEBUG|3491|helper.py:85|helper._save_to_cache|shop_session
# Here we extract timestamp, log_level, and pid using grok's predefined patterns: EXIM_DATE, LOGLEVEL, INT
# GREEDYDATA is greedy and matches any characters
}
# If filebeat added the [fields][function] field, the corresponding match rule below is used to extract path
# The source field is the path the log came from, for example /var/log/nginx/feiyang233.club.access.log
# After the match we get path=feiyang233.club.access
if [fields][function]=="nginx" {
grok {
match => {"source" => "/var/log/nginx/%{GREEDYDATA:path}.log%{GREEDYDATA}"}
}
}
# For example, ims logs come from /var/log/ims_logic/debug.log
# After the match we get path=ims_logic
else if [fields][function]=="ims" {
grok {
match => {"source" => "/var/log/%{GREEDYDATA:path}/%{GREEDYDATA}"}
}
}
else {
grok {
match => {"source" => "/var/log/app/%{GREEDYDATA:path}/%{GREEDYDATA}"}
}
}
# When filebeat defines [fields][function], copy it into a top-level function field, for example QA
if [fields][function] {
mutate {
add_field => {
"function" => "%{[fields][function]}"
}
}
}
# There are many more production machines, so by default I do not set function in filebeat for them; the else branch therefore adds live
else {
mutate {
add_field => {
"function" => "live"
}
}
}
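# The earlier grok on message gave us timestamp; here we parse it into @timestamp and attach the time zone.
# The sample timestamp (2018-12-11 23:46:47.051) has milliseconds and no zone, so a matching format is listed first.
date {
match => ["timestamp" , "yyyy-MM-dd HH:mm:ss.SSS", "yyyy-MM-dd HH:mm:ss Z"]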
target => "@timestamp"
timezone => "Asia/Singapore"
}
# Replace every / in the path captured earlier with -, because of the Elasticsearch index naming rules
# For example feiyang/test becomes feiyang-test
mutate {
gsub => ["path","/","-"]
add_field => {"host_ip" => "%{[fields][host]}"}
remove_field => ["tags","@version","offset","beat","fields","exim_year","exim_month","exim_day","exim_time","timestamp"]
}
# remove_field drops fields we no longer need
}
# On a single node the output goes to the local machine and needs no SSL, but the index naming rules still deserve close attention
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "sg-%{function}-%{path}-%{+xxxx.ww}"
# sg-nginx-feiyang233.club.access-2019.13, where ww is the week number
}
}
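To check what the filter and output stages above should produce for a given log line, the same steps can be approximated in Python. This is a sketch, not the Logstash implementation: the regular expressions only approximate the grok patterns EXIM_DATE, LOGLEVEL, and INT, the function names are hypothetical, and it assumes `xxxx.ww` corresponds to the ISO week-year and ISO week number.

```python
import re
from datetime import date

# Approximation of %{EXIM_DATE:timestamp}\|%{LOGLEVEL:log_level}\|%{INT:pid}\|%{GREEDYDATA}
MESSAGE_RE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:[.,]\d+)?)"
    r"\|(?P<log_level>[A-Za-z]+)\|(?P<pid>\d+)\|.*"
)

# The unescaped dot before "log" mirrors the grok config, where it is also a regex wildcard.
SOURCE_PATTERNS = {
    "nginx": re.compile(r"/var/log/nginx/(?P<path>.*).log.*"),
    "ims": re.compile(r"/var/log/(?P<path>.*)/.*"),
}
DEFAULT_SOURCE_RE = re.compile(r"/var/log/app/(?P<path>.*)/.*")


def parse_message(message: str):
    """Return timestamp, log_level, and pid, or None when the line does not match."""
    match = MESSAGE_RE.search(message)
    return match.groupdict() if match else None


def extract_path(source: str, function=None):
    """Pick the pattern by [fields][function], then replace / with - like the gsub."""
    pattern = SOURCE_PATTERNS.get(function, DEFAULT_SOURCE_RE)
    match = pattern.search(source)
    if match is None:
        return None
    return match.group("path").replace("/", "-")


def index_name(function, path: str, day: date) -> str:
    """Build sg-<function>-<path>-<year>.<week>; function falls back to live."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"sg-{function or 'live'}-{path}-{iso_year}.{iso_week:02d}"
```

For example, a source of /var/log/app/feiyang/scripts/info.log without a function field gives the path feiyang-scripts and an index beginning with sg-live-feiyang-scripts.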
The final flow is shown in the figure below.
Index naming rules (see the reference link):
- Lowercase only
- Cannot include \, /, *, ?, ", <, >, |, ` ` (space character), ,, #
- Indices prior to 7.0 could contain a colon (:), but that’s been deprecated and won’t be supported in 7.0+
- Cannot start with -, _, +
- Cannot be . or ..
- Cannot be longer than 255 bytes (note it is bytes, so multi-byte characters will count towards the 255 limit faster)
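Because the index name is assembled from log paths, it is worth validating candidate names against the rules listed above before they reach Elasticsearch. The following Python sketch encodes only those listed rules; the function name and the messages are hypothetical, and Elasticsearch itself remains the authority on what it accepts.

```python
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')
FORBIDDEN_PREFIXES = ("-", "_", "+")


def index_name_problems(name: str) -> list:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    bad = sorted(ch for ch in set(name) if ch in FORBIDDEN_CHARS)
    if bad:
        problems.append(f"forbidden characters: {bad}")
    if ":" in name:
        problems.append("colon is deprecated and unsupported in 7.0+")
    if name.startswith(FORBIDDEN_PREFIXES):
        problems.append("cannot start with -, _, or +")
    if name in (".", ".."):
        problems.append("cannot be . or ..")
    if len(name.encode("utf-8")) > 255:
        problems.append("longer than 255 bytes")
    return problems
```

The length check counts UTF-8 bytes rather than characters, which matches the note that multi-byte characters reach the limit faster.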
filebeat configuration
On the client side, filebeat must be installed and configured. See
Filebeat modules and configuration
Configuration file filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths: # logs to collect
    - /var/log/app/** # recursive ** globbing requires a recent filebeat version
  fields: # extra fields to add
    host: "{{inventory_hostname}}"
    function: "xxx"
  multiline: # multi-line matching
    match: after
    negate: true # pay attention to the format
    pattern: '^\[[0-9]{4}-[0-9]{2}-[0-9]{2}' # a line starting with [YYYY-MM-DD begins a new event
  ignore_older: 24h
  clean_inactive: 72h
output.logstash:
  hosts: ["{{elk_server}}:25044"]
  # ssl:
  #   certificate_authorities: ["/etc/filebeat/logstash.crt"]
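The multiline settings above (negate: true, match: after) mean that any line not matching the pattern is appended to the preceding line that did match. The Python sketch below imitates that grouping so a pattern can be tried on sample logs before deployment. It is an approximation with a hypothetical function name and ignores filebeat options such as max_lines and timeout.

```python
import re


def group_multiline(lines, pattern):
    """Group lines into events: a matching line starts a new event,
    every non-matching line is appended to the current event."""
    start = re.compile(pattern)
    events = []
    for line in lines:
        if start.search(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


if __name__ == "__main__":
    sample = [
        "[2019-06-27 22:53:25] request failed",
        "Traceback (most recent call last):",
        '  File "app.py", line 1, in <module>',
        "[2019-06-27 22:53:26] request ok",
    ]
    for event in group_multiline(sample, r"^\[[0-9]{4}-[0-9]{2}-[0-9]{2}"):
        print(repr(event))
```

With the sample input, the traceback lines stay attached to the first event and the second timestamped line starts a new one.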
Deploying filebeat.yml in bulk is best done with ansible
---
- hosts: all
  become: yes
  gather_facts: yes
  tasks:
    - name: stop filebeat
      service:
        name: filebeat
        state: stopped
        enabled: yes
    - name: upload filebeat.yml
      template:
        src: filebeat.yml
        dest: /etc/filebeat/filebeat.yml
        owner: root
        group: root
        mode: 0644
    - name: remove
      file: # delete all files in this directory
        path: /var/lib/filebeat/registry
        state: absent
    - name: restart filebeat
      service:
        name: filebeat
        state: restarted
        enabled: yes
Inspecting filebeat output
First change the configuration so that filebeat writes to a local file; the output format is JSON.
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/app/**
  fields:
    host: "x.x.x.x"
    region: "sg"
  multiline:
    match: after
    negate: true
    pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
  ignore_older: 24h
  clean_inactive: 72h
output.file:
  path: "/home/feiyang"
  filename: feiyang.json
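With the configuration above, the output file feiyang.json appears under /home/feiyang. Note that different filebeat versions produce differently structured output, which makes parsing and filtering in Logstash a little harder. The examples below show how the output of filebeat 6.x and 7.x differs.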
{
"@timestamp": "2019-06-27T15:53:27.682Z",
"@metadata": {
"beat": "filebeat",
"type": "doc",
"version": "6.4.2"
},
"fields": {
"host": "x.x.x.x",
"region": "sg"
},
"host": {
"name": "x.x.x.x"
},
"beat": {
"name": "x.x.x.x",
"hostname": "feiyang-localhost",
"version": "6.4.2"
},
"offset": 1567983499,
"message": "[2019-06-27T22:53:25.756327232][Info][@http.go.177] [48552188]request",
"source": "/var/log/feiyang/scripts/all.log"
}
The structure differs considerably between 6.4 and 7.2.
{
"@timestamp": "2019-06-27T15:41:42.991Z",
"@metadata": {
"beat": "filebeat",
"type": "_doc",
"version": "7.2.0"
},
"agent": {
"id": "3a38567b-e6c3-4b5a-a420-f0dee3a3bec8",
"version": "7.2.0",
"type": "filebeat",
"ephemeral_id": "b7e3c0b7-b460-4e43-a9af-6d36c25eece7",
"hostname": "feiyang-localhost"
},
"log": {
"offset": 69132192,
"file": {
"path": "/var/log/app/feiyang/scripts/info.log"
}
},
"message": "2019-06-27 22:41:25.312|WARNING|14186|Option|data|unrecognized|fields=set([u'id'])",
"input": {
"type": "log"
},
"fields": {
"region": "sg",
"host": "x.x.x.x"
},
"ecs": {
"version": "1.0.0"
},
"host": {
"name": "feiyang-localhost"
}
}
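Comparing the two samples, the log file path moves from the top-level source field (6.4) to log.file.path (7.2). A consumer that has to accept both layouts can look in both places, as in this Python sketch. The function name is hypothetical and the logic covers only the two layouts shown above.

```python
def event_source_path(event: dict):
    """Return the log file path from a filebeat event in either layout.

    7.x stores it under log.file.path; 6.x stores it under source.
    """
    log = event.get("log")
    if isinstance(log, dict):
        file_info = log.get("file")
        if isinstance(file_info, dict) and "path" in file_info:
            return file_info["path"]
    return event.get("source")


if __name__ == "__main__":
    import json
    import sys

    # Reads one JSON event per line, e.g. from /home/feiyang/feiyang.json
    for raw in sys.stdin:
        if raw.strip():
            print(event_source_path(json.loads(raw)))
```

The same distinction matters in the Logstash filter, which matches on source and therefore needs adjusting for 7.x events.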
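Basic Kibana usage
Port 5601, published when we set up ELK, is the Kibana service.
Visit http://your_elk_ip:5601
Installing and configuring a clustered ELK (version 6.7)
See the ELK installation documentation. A cluster mainly provides high availability, and a multi-node Elasticsearch can also be scaled out. This article uses the official images, whose base image is centos:7.
Setting up multi-node Elasticsearch
# The permissions of the mounted directory are very important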
mkdir -p /data/elk-data && chmod 755 /data/elk-data
chown -R root:root /data
docker run -p WAN_IP:9200:9200 -p 10.66.236.116:9300:9300 \
-v /data/elk-data:/usr/share/elasticsearch/data \
--name feiy_elk \
docker.elastic.co/elasticsearch/elasticsearch:6.7.0
Next, edit the configuration file elasticsearch.yml
# Master node node-1
# Enter the container: docker exec -it [container_id] bash
# docker exec -it 70ada825aae1 bash
# vi /usr/share/elasticsearch/config/elasticsearch.yml
cluster.name: "feiy_elk"
network.host: 0.0.0.0
node.master: true
node.data: true
node.name: node-1
network.publish_host: 10.66.236.116
discovery.zen.ping.unicast.hosts: ["10.66.236.116:9300","10.66.236.118:9300","10.66.236.115:9300"]
# exit
# docker restart 70ada825aae1
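# Node node-2 (node.master is not set, so it keeps the 6.x default of true and the node stays master-eligible)
# Enter the container: docker exec -it [container_id] bash
# vi /usr/share/elasticsearch/config/elasticsearch.yml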
cluster.name: "feiy_elk"
network.host: "0.0.0.0"
node.name: node-2
node.data: true
network.publish_host: 10.66.236.118
discovery.zen.ping.unicast.hosts: ["10.66.236.116:9300","10.66.236.118:9300","10.66.236.115:9300"]
# exit
# docker restart 70ada825aae1
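# Node node-3 (node.master is not set, so it keeps the 6.x default of true and the node stays master-eligible)
# Enter the container: docker exec -it [container_id] bash
# vi /usr/share/elasticsearch/config/elasticsearch.yml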
cluster.name: "feiy_elk"
network.host: "0.0.0.0"
node.name: node-3
node.data: true
network.publish_host: 10.66.236.115
discovery.zen.ping.unicast.hosts: ["10.66.236.116:9300","10.66.236.118:9300","10.66.236.115:9300"]
# exit
# docker restart 70ada825aae1
Check the number of nodes in the cluster, its status, and so on
# curl http://wan_ip:9200/_cluster/health?pretty
{
"cluster_name" : "feiy_elk",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 9,
"active_shards" : 18,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
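The same check can be scripted so that it runs after every change to the cluster. This is a minimal Python sketch that uses only the fields visible in the response above. The function names are hypothetical, and `fetch_health` assumes the cluster answers on plain HTTP without authentication, as in this setup.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/_cluster/health and decode the JSON body."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as response:
        return json.load(response)


def cluster_is_healthy(health: dict, expected_nodes: int) -> bool:
    """Green status, the expected node count, and no unassigned shards."""
    return (
        health.get("status") == "green"
        and health.get("number_of_nodes") == expected_nodes
        and health.get("unassigned_shards") == 0
    )


def summarize(health: dict) -> str:
    """One-line summary suitable for a log or an alert."""
    return (
        f'{health["cluster_name"]}: {health["status"]}, '
        f'{health["number_of_nodes"]} nodes, '
        f'{health["unassigned_shards"]} unassigned shards'
    )
```

A typical call would be `cluster_is_healthy(fetch_health("http://wan_ip:9200"), expected_nodes=3)`, where wan_ip is the placeholder used in the curl command above.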
The cluster status can also be viewed in Kibana.
Setting up Kibana
# docker run --link YOUR_ELASTICSEARCH_CONTAINER_NAME_OR_ID:elasticsearch -p 5601:5601 {docker-repo}:{version}
docker run -p WAN_IP:5601:5601 --link ELASTICSEARCH_CONTAINER_ID:elasticsearch docker.elastic.co/kibana/kibana:6.7.0
# Note that --link is not actually recommended officially; user-defined networks are recommended instead https://docs.docker.com/network/links/
# In my tests it also works without --link, using the container's IP directly
docker run -p WAN_IP:5601:5601 docker.elastic.co/kibana/kibana:6.7.0
we recommend that you use user-defined networks to facilitate communication between two containers instead of using --link
# vi /usr/share/kibana/config/kibana.yml
# Change the hosts IP to the IP of the Elasticsearch container
# Here the Elasticsearch container's IP is 172.17.0.2
# To look it up: docker inspect elasticsearch_ID
server.name: kibana
server.host: "0.0.0.0"
elasticsearch.hosts: [ "http://172.17.0.2:9200" ]
xpack.monitoring.ui.container.elasticsearch.enabled: true
# Exit the container and restart it
docker restart [container_ID]
Setting up Logstash
Official installation documentation: Logstash
# docker run -d starts the container in the background; --name gives the container an explicit name
docker run -p 5044:5044 -d --name test_logstash docker.elastic.co/logstash/logstash:6.7.0
# You can also bind to a specific interface, internal or external. Here it listens on the internal address 192.168.1.2
docker run -p 192.168.1.2:5044:5044 -d --name test_logstash docker.elastic.co/logstash/logstash:6.7.0
# vi /usr/share/logstash/pipeline/logstash.conf
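# For configuration details see the link below; remember to point the output hosts at the Elasticsearch IPs
# The default Elasticsearch port is 9200, so it can be omitted in the configuration below.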
hosts => ["IP Address 1:port1", "IP Address 2:port2", "IP Address 3"]
For the Logstash filter rules, see the configuration and grok syntax earlier in this article
# vi /usr/share/logstash/config/logstash.yml
# Change the url to the IP of the Elasticsearch master node
http.host: "0.0.0.0"
xpack.monitoring.elasticsearch.url: http://elasticsearch_master_IP:9200
node.name: "feiy"
pipeline.workers: 24 # same as the number of CPU cores
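After changing the configuration, exit from the container back to the host and restart the container. For more configuration details, see the official documentation.
# How to find the container_ID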
docker ps -a
docker restart [container_ID]
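Failover test
We shut down the current master node, node-1, and watch in Kibana how the cluster status changes.
The cluster status turns yellow because there are 3 unassigned shards. See the official documentation for the meaning of the colors. After a while the cluster status turns green again.
Kibana Console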
Quick intro to the UI
The Console UI is split into two panes: an editor pane (left) and a response pane (right). Use the editor to type requests and submit them to Elasticsearch. The results will be displayed in the response pane on the right side.
Console understands requests in a compact format, similar to cURL:
# index a doc
PUT index/type/1
{
"body": "here"
}
# and get it ...
GET index/type/1
While typing a request, Console will make suggestions which you can then accept by hitting Enter/Tab. These suggestions are made based on the request structure as well as your indices and types.
A few quick tips, while I have your attention
- Submit requests to ES using the green triangle button.
- Use the wrench menu for other useful things.
- You can paste requests in cURL format and they will be translated to the Console syntax.
- You can resize the editor and output panes by dragging the separator between them.
- Study the keyboard shortcuts under the Help button. Good stuff in there!
Common Console commands
Kibana Console
Query syntax in the ELK stack
GET _search
{
"query": {
"match_all": {}
}
}
GET /_cat/health?v
GET /_cat/nodes?v
GET /_cluster/allocation/explain
GET /_cluster/state
GET /_cat/thread_pool?v
GET /_cat/indices?health=red&v
GET /_cat/indices?v
# Set number_of_replicas to 0 for all current indices
PUT /*/_settings
{
"index" : {
"number_of_replicas" : 0,
"refresh_interval": "30s"
}
}
GET /_template
# On a single node replicas are unnecessary, so set number_of_replicas to 0
PUT _template/app-logstash
{
"index_patterns": ["app-*"],
"settings": {
"number_of_shards": 3,
"number_of_replicas": 0,
"refresh_interval": "30s"
}
}
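If the same template has to serve both the single-node and the clustered setup, the body can be generated instead of edited by hand. The Python sketch below rebuilds the template shown above. The function name is hypothetical, and using one replica whenever there is more than one data node is an assumption of this sketch, not something the original setup prescribes.

```python
import json


def build_template(index_pattern: str, data_nodes: int,
                   shards: int = 3, refresh_interval: str = "30s") -> dict:
    """Return a template body; replicas are 0 on a single node, otherwise 1 (assumed)."""
    replicas = 0 if data_nodes <= 1 else 1
    return {
        "index_patterns": [index_pattern],
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
            "refresh_interval": refresh_interval,
        },
    }


if __name__ == "__main__":
    # Paste the output after "PUT _template/app-logstash" in the Console.
    print(json.dumps(build_template("app-*", data_nodes=1), indent=2))
```

With data_nodes=1 the output is identical to the body used above.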
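Elasticsearch data migration
The official documentation on Elasticsearch data migration did not seem very detailed to me. For the containerized migration, my attempts with reindex failed, and snapshot did not work out either.
In the end I migrated the data with an open-source tool, An Elasticsearch Migration Tool.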
wget https://github.com/medcl/esm-abandoned/releases/download/v0.4.2/linux64.tar.gz
tar -xzvf linux64.tar.gz
./esm -s http://127.0.0.1:9200 -d http://192.168.21.55:9200 -x index_name -w=5 -b=10 -c 10000 --copy_settings --copy_mappings --force --refresh
Nginx proxy forwarding
Restarting Docker sometimes also refreshes the iptables rules, which changes our restriction rules and creates a security problem. This happens because Docker implements its network isolation with iptables. To avoid the problem, make Docker listen only on the internal network or on 127.0.0.1 when starting the container, and forward through nginx.
# cat kibana.conf
server {
listen 25601;
server_name x.x.x.x;
access_log /var/log/nginx/kibana.access.log;
error_log /var/log/nginx/kibana.error.log;
location / {
allow x.x.x.x;
allow x.x.x.x;
deny all;
proxy_http_version 1.1;
proxy_buffer_size 64k;
proxy_buffers 32 32k;
proxy_busy_buffers_size 128k;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_pass http://127.0.0.1:5601;
}
}
! Note: check whether the INPUT chain of the iptables filter table blocks 172.17.0.0/16, Docker's default subnet, and whether it blocks port 25601.
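Whether an address or a blocked range touches Docker's default subnet can be checked with Python's standard ipaddress module. This is only a sketch: the function names are hypothetical, and the list of dropped source ranges has to be collected by hand from the INPUT chain, since the code does not read iptables.

```python
import ipaddress

DOCKER_DEFAULT_NET = ipaddress.ip_network("172.17.0.0/16")


def in_docker_default_net(address: str) -> bool:
    """True when the address belongs to Docker's default bridge subnet."""
    return ipaddress.ip_address(address) in DOCKER_DEFAULT_NET


def blocked_docker_sources(dropped_cidrs):
    """Return the dropped source ranges that overlap the Docker subnet."""
    return [
        cidr for cidr in dropped_cidrs
        if ipaddress.ip_network(cidr, strict=False).overlaps(DOCKER_DEFAULT_NET)
    ]
```

For instance, a rule dropping 172.16.0.0/12 would be reported, because that range contains 172.17.0.0/16.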
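Pitfalls
- iptables rules do not hold. See the iptables issue in the previous blog post, or listen on the internal network and forward through an Nginx proxy.
- ELK network issues
- ELK node
- discovery.type=single-node works when testing a single node, but this environment variable must not be set when building a cluster; see the official documentation for details
- A throughput optimization for ELK
- The filebeat version was too old, so recursive glob patterns (**) were unavailable
Upgrading filebeat with ansible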
---
- hosts: all
  become: yes
  gather_facts: yes
  tasks:
    - name: upload filebeat.repo
      copy:
        src: elasticsearch.repo
        dest: /etc/yum.repos.d/elasticsearch.repo
        owner: root
        group: root
        mode: 0644
    - name: install the latest version of filebeat
      yum:
        name: filebeat
        state: latest
    - name: restart filebeat
      service:
        name: filebeat
        state: restarted
        enabled: yes
# elasticsearch.repo
[elasticsearch-6.x]
name=Elasticsearch repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
- filebeat 7.x is incompatible with 6.x. Many field names changed; for example, "source" became [log][file][path]
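References
- Tencent Cloud Elasticsearch Service: this Tencent Cloud column is excellent and well worth a look; you will likely find what you need there.
- Summary of ELK key difficulties and overall optimized configuration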