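While Zabbix server was monitoring, the web interface reported that the Zabbix server had been out of contact with the agent for more than 5 minutes. To find the root cause, the first step in troubleshooting should be to check the logs of the services involved. Start with the server-side log and look for error messages. Here the log showed that the server was running normally, so the problem was most likely on the client side, and the next step was to check the service log on the agent host.
1. Check the logs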
[root@iZbp11rfoyeescusr9ha9qZ tmp]# find / -name '*agentd.log'
/var/log/zabbix/zabbix_agentd.log
[root@iZbp11rfoyeescusr9ha9qZ tmp]# vim /var/log/zabbix/zabbix_agentd.log
23904:20170310:092458.633 Starting Zabbix Agent [Zabbix server]. Zabbix 2.2.16 (revision 64243).
23904:20170310:092458.634 using configuration file: /etc/zabbix_agentd.conf
23915:20170310:092458.636 agent #1 started [listener #1]
23918:20170310:092458.636 agent #3 started [listener #3]
23917:20170310:092458.636 agent #2 started [listener #2]
23914:20170310:092458.636 agent #0 started [collector]
23919:20170310:092458.637 agent #4 started [active checks #1]
23919:20170310:092458.637 active check configuration update from [127.0.0.1:10051] started to fail (cannot connect to [[127.0.0.1]:10051]: [111] Connection refused)
23919:20170310:102358.983 active check configuration update from [127.0.0.1:10051] is working again
23919:20170310:102358.983 no active checks on server [127.0.0.1:10051]: host [Zabbix server] not monitored
23919:20170310:102559.020 no active checks on server [127.0.0.1:10051]: host [Zabbix server] not monitored
23919:20170310:102759.073 no active checks on server [127.0.0.1:10051]: host [Zabbix server] not monitored
23919:20170310:102959.109 no active checks on server [127.0.0.1:10051]: host [Zabbix server] not monitored
23904:20170310:103011.545 Got signal [signal:15(SIGTERM),sender_pid:26144,sender_uid:0,reason:0]. Exiting ...
23904:20170310:103011.547 Zabbix Agent stopped. Zabbix 2.2.16 (revision 64243).
26157:20170310:103011.659 Starting Zabbix Agent [Zabbix server]. Zabbix 2.2.16 (revision 64243).
26157:20170310:103011.659 using configuration file: /etc/zabbix_agentd.conf
26168:20170310:103011.663 agent #1 started [listener #1]
26172:20170310:103011.663 agent #4 started [active checks #1]
26171:20170310:103011.663 agent #3 started [listener #3]
26170:20170310:103011.663 agent #2 started [listener #2]
26166:20170310:103011.664 agent #0 started [collector]
26172:20170310:103011.667 no active checks on server [127.0.0.1:10051]: host [Zabbix server] not monitored
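The log shows that the record at 23919:20170310:092458.637 reports a failed active check configuration update from [127.0.0.1:10051]: the connection between the agent and the server failed. The agent was trying to reach the server at 127.0.0.1, that is, on its own host, which points to the ServerActive setting.
2. Edit the agent configuration file and change the ServerActive address to the IP address of the Zabbix server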
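The failing record can also be pulled out without paging through the file in an editor. The following is a minimal sketch, assuming a grep-based keyword filter; the keyword list and the temporary sample file are illustrative choices, and the sample lines are copied from the log excerpt above.

```shell
# Stand-in for /var/log/zabbix/zabbix_agentd.log so the sketch runs anywhere;
# on a real agent host, set LOG to the actual log path instead.
LOG="$(mktemp)"
cat > "$LOG" <<'EOF'
23919:20170310:092458.637 agent #4 started [active checks #1]
23919:20170310:092458.637 active check configuration update from [127.0.0.1:10051] started to fail (cannot connect to [[127.0.0.1]:10051]: [111] Connection refused)
23919:20170310:102358.983 active check configuration update from [127.0.0.1:10051] is working again
23919:20170310:102358.983 no active checks on server [127.0.0.1:10051]: host [Zabbix server] not monitored
EOF

# -i: ignore case, -n: prefix each hit with its line number, -E: extended regex.
# The keyword list (fail, error, refused) is an assumption; extend it as needed.
matches="$(grep -inE 'fail|error|refused' "$LOG")"
printf '%s\n' "$matches"

rm -f "$LOG"
```

Only the "started to fail ... Connection refused" record matches, which is the line that identifies the problem.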
[root@iZbp11rfoyeescusr9ha9qZ tmp]# vim /etc/zabbix/zabbix_agentd.conf
ServerActive=121.43.161.35
3. Restart the zabbix-agent service so that the new configuration takes effect
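The same change can be made and verified non-interactively. This is a rough sketch, assuming the file contains a single uncommented ServerActive line; the temporary file and its sample contents are stand-ins for the real configuration file, not a copy of it.

```shell
# Stand-in for /etc/zabbix/zabbix_agentd.conf (sample contents are assumed).
CONF="$(mktemp)"
cat > "$CONF" <<'EOF'
# ServerActive=
ServerActive=127.0.0.1
Hostname=Zabbix server
EOF

SERVER_IP="121.43.161.35"   # Zabbix server address used in step 2

# Rewrite only the uncommented ServerActive line. Writing to a new file and
# moving it into place avoids relying on sed -i, whose syntax differs
# between GNU and BSD sed.
sed "s/^ServerActive=.*/ServerActive=${SERVER_IP}/" "$CONF" > "$CONF.new" \
  && mv "$CONF.new" "$CONF"

# Show the effective (uncommented) setting to confirm the edit.
active="$(grep -E '^ServerActive=' "$CONF")"
echo "$active"

rm -f "$CONF"
```

On a real host, back up the configuration file before rewriting it; the agent still has to be restarted before the new value takes effect.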
[root@iZbp11rfoyeescusr9ha9qZ tmp]# /etc/init.d/zabbix-agentd restart
Shutting down Zabbix agent: [ OK ]
Starting Zabbix agent: [ OK ]
4. Refresh the page in the browser: the server is once again receiving monitoring data on the agent's status
Tips:
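- When reading service logs during troubleshooting, focus on lines that contain failure keywords such as "fail" or "Error". This lets you find the cause of the problem quickly without reading the whole log, which greatly improves efficiency.
- As an operations engineer you carry a lot of miscellaneous information in your head, and from time to time you will forget the absolute path of a service or configuration file. If you remember the full name of the file or directory, you can use the "locate filename" command to find its absolute path. If you cannot quite remember the file name either, no problem: the powerful Linux search command find can search globally, using an asterisk to match the file you are looking for, for example: find / -name '*agentd.conf' (search from / for files whose names end in agentd.conf). The pattern is quoted so that the shell passes the asterisk to find unchanged. These are basic skills for any operations engineer, and they mean you do not have to memorize the absolute path of every file.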
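The find tip can be tried safely in a throwaway directory. This is a small sketch, assuming a sandbox tree whose layout mirrors the paths in this article; the directory and variable names are illustrative. It quotes the pattern because an unquoted *agentd.conf is expanded by the shell first if a matching file exists in the current directory, so find would receive a literal file name instead of a wildcard.

```shell
# Build a small sandbox instead of searching the real root filesystem.
ROOT="$(mktemp -d)"
mkdir -p "$ROOT/etc/zabbix" "$ROOT/var/log/zabbix"
touch "$ROOT/etc/zabbix/zabbix_agentd.conf" "$ROOT/var/log/zabbix/zabbix_agentd.log"

# Quote the pattern so that find, not the shell, interprets the asterisk.
found="$(find "$ROOT" -type f -name '*agentd.conf')"

# Strip the sandbox prefix to show the path as it would appear under /.
rel="${found#"$ROOT"}"
echo "$rel"

rm -rf "$ROOT"
```

On a real system, replace "$ROOT" with / and consider adding 2>/dev/null to hide permission-denied messages from directories the current user cannot read.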