While using Ansible recently to roll out zabbix-agent in bulk, I ran into quite a few problems, so I am recording them here.
Ansible version: 2.9.2
Python version: 2.7.5
pywinrm version: 0.4.3
When Ansible reports an error while managing a host:
-- First, rule out network problems.
-- Then use the error message to find the root cause (append -vvvvvv to the command for detailed output).
CentOS environment
Error: Shared connection to xxx.xxx.xxx.xxx closed
xxx.xxx.xxx.xxx | FAILED! => {
"changed": false,
"module_stderr": "Shared connection to xxx.xxx.xxx.xxx closed.\r\n",
"module_stdout": "Traceback (most recent call last):\r\n File \"/root/.ansible/tmp/ansible-tmp-1657010661.53-56592330631735/command.py\", line 123, in <module>\r\n f.write(z.read('ansible_module_command.py'))\r\n File \"/data/app/python/Lib/zipfile.py\", line 1314, in read\r\n with self.open(name, \"r\", pwd) as fp:\r\n File \"/data/app/python/Lib/zipfile.py\", line 1425, in open\r\n return ZipExtFile(zef_file, mode, zinfo, zd, True)\r\n File \"/data/app/python/Lib/zipfile.py\", line 758, in __init__\r\n self._decompressor = _get_decompressor(self._compress_type)\r\n File \"/data/app/python/Lib/zipfile.py\", line 678, in _get_decompressor\r\n return zlib.decompressobj(-15)\r\nAttributeError: 'NoneType' object has no attribute 'decompressobj'\r\n",
"msg": "MODULE FAILURE",
"rc": 0
}
module_stdout:
Traceback (most recent call last):
File "/root/.ansible/tmp/ansible-tmp-1657786888.03-3276-66518254516251/AnsiballZ_ping.py", line 114, in <module>
_ansiballz_main()
File "/root/.ansible/tmp/ansible-tmp-1657786888.03-3276-66518254516251/AnsiballZ_ping.py", line 106, in _ansiballz_main
invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
File "/root/.ansible/tmp/ansible-tmp-1657786888.03-3276-66518254516251/AnsiballZ_ping.py", line 41, in invoke_module
f.write(z.read('__main__.py'))
File "/data/app/python/Lib/zipfile.py", line 1314, in read
with self.open(name, "r", pwd) as fp:
File "/data/app/python/Lib/zipfile.py", line 1425, in open
return ZipExtFile(zef_file, mode, zinfo, zd, True)
File "/data/app/python/Lib/zipfile.py", line 758, in __init__
self._decompressor = _get_decompressor(self._compress_type)
File "/data/app/python/Lib/zipfile.py", line 678, in _get_decompressor
return zlib.decompressobj(-15)
AttributeError: 'NoneType' object has no attribute 'decompressobj'
Cause:
  Checking the target host showed that Python had been upgraded (python2.7.5 -> python3.6.5) and that zlib-devel was not installed when the new version was compiled.
Solution:
  Install zlib-devel, then recompile and reinstall python3.6.5.
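The 'NoneType' in the traceback comes from zipfile itself: it imports zlib inside a try/except and sets the name to None when the module is missing. A minimal sketch for confirming this on the target host, assuming it is run with the same interpreter Ansible uses there (the helper name has_module is my own, not part of Ansible):

```python
import importlib


def has_module(name):
    """Return True if this interpreter can import the named module."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # False here means Python was built without zlib support, which is
    # what makes zipfile fail when it unpacks the Ansible module payload.
    print("zlib available:", has_module("zlib"))
```

If this prints False after the rebuild, the build most likely still did not pick up the zlib headers.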
Error: connect to host xxx.xxx.xxx.xxx port 22: Connection timed out
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: ssh: connect to host xxx.xxx.xxx.xxx port 22: Connection timed out\r\n",
"unreachable": true
}
Cause:
  Connection timed out is almost always a network problem.
Solution:
  Check network connectivity, whether the port is open, and whether a firewall is enabled on the target host.
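A quick way to test the port from the Ansible host, as a rough sketch using only the standard library (the function name port_open and the 3-second timeout are my own choices):

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call (192.0.2.10 is a placeholder address, replace it):
# port_open("192.0.2.10", 22)
```

A False result only says the TCP connection failed; it does not say whether the cause is routing, a firewall, or sshd not running.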
Error: Authentication failure
# Ansible 2.4.0
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "Authentication failure.",
"unreachable": true
}
# Ansible 2.9.2
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "Invalid/incorrect password: Permission denied, please try again.",
"unreachable": true
}
Cause:
  Wrong password.
Solution:
  Use the correct password.
Error: installation fails because libpcre.so.1() is missing
fatal: [xxx.xxx.xxx.xxx]: FAILED! => {
"changed": true,
"cmd": ["rpm", "-ivh", "/etc/zabbix/zabbix-agent-4.0.9-3.el7.x86_64.rpm"],
"delta": "0:00:00.086235", "end": "2022-07-15 05:38:46.825589",
"msg": "non-zero return code",
"rc": 1, "start": "2022-07-15 05:38:46.739354",
"stderr": "warning: /etc/zabbix/zabbix-agent-4.0.9-3.el7.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID a14fe591: NOKEY\nerror: Failed dependencies: libpcre.so.1()(64bit) is needed by zabbix-agent-4.0.9-3.el7.x86_64 systemd is needed by zabbix-agent-4.0.9-3.el7.x86_64",
"stderr_lines": [
"warning: /etc/zabbix/zabbix-agent-4.0.9-3.el7.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID a14fe591: NOKEY",
"error: Failed dependencies:",
"libpcre.so.1()(64bit) is needed by zabbix-agent-4.0.9-3.el7.x86_64",
"systemd is needed by zabbix-agent-4.0.9-3.el7.x86_64"
],
"stdout": "",
"stdout_lines": []
}
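Cause:
  1. When libpcre.so.1() is reported missing, first check whether the zabbix-agent package matches the target host's system (kernel) version; the rpm above is an el7 build, and rpm also reports systemd as missing.
  2. If the versions do match, check whether libpcre.so.1 is actually missing on the target host.
Solution:
  Use the agent package built for the target host's system version.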
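One way to compare the two versions before installing, as a sketch: it assumes the rpm file name and the target's `uname -r` output both carry an ".elN" tag, which holds for stock CentOS kernels but not necessarily for custom ones. The function names and the sample kernel string are hypothetical.

```python
import re


def el_major(name):
    """Extract the EL major release from a string such as an rpm file name.

    Returns an int, or None if no ".elN" tag is present.
    """
    match = re.search(r"\.el(\d+)", name)
    return int(match.group(1)) if match else None


def package_matches_host(rpm_name, kernel_release):
    """Compare the EL tag of an rpm with the one in `uname -r` output."""
    pkg, host = el_major(rpm_name), el_major(kernel_release)
    return pkg is not None and pkg == host


# Hypothetical kernel string for a CentOS 6 host; the rpm name is from the log.
# package_matches_host("zabbix-agent-4.0.9-3.el7.x86_64.rpm",
#                      "2.6.32-754.el6.x86_64")
```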
Error: Permission denied (publickey)
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: root@xxx.xxx.xxx.xxx: Permission denied (publickey).",
"unreachable": true
}
Cause:
  The target host uses key-based login.
Solution:
  1. Copy the Ansible host's SSH public key to the target host (recommended).
  2. Change the SSH login method on the target host.
Error: No space left on device
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "mkdir: cannot create directory ‘/root/.ansible’: No space left on device\n",
"unreachable": true
}
Cause:
  The target host's disk is out of space.
Solution:
  Free up disk space.
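To see how full a filesystem is, something like the following could be run on the target host. It is a minimal sketch, and the helper name free_percent is my own. In the error above the path that matters is the filesystem holding /root.

```python
import shutil


def free_percent(path="/"):
    """Return the percentage of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


if __name__ == "__main__":
    print("free space on /: %.1f%%" % free_percent("/"))
```

Note that "No space left on device" can also mean the filesystem has run out of inodes, which this check does not cover.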
Windows environment
Error: the specified credentials were rejected by the server
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "ntlm: the specified credentials were rejected by the server",
"unreachable": true
}
Cause:
  The password for the target host is wrong.
Solution:
  Use the correct password.
Error: Failed to establish a new connection [Errno 111]
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "ntlm: HTTPConnectionPool(host='xxx.xxx.xxx.xxx', port=5985): Max retries exceeded with url: /wsman (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff2040e7f50>: Failed to establish a new connection: [Errno 111] Connection refused',))",
"unreachable": true
}
Cause:
  1. The network between the two hosts may be down.
  2. The target host is not listening on the WinRM port 5985.
Solution:
  Check whether the WinRM service is running on the target host: powershell -> (Get-Service -Name winrm).status
    -- If it is not running, run the 1.ps1 script to enable the WinRM remote service (requires PowerShell >= 3.0 and .NET >= 4.0).
Error: Failed to establish a new connection: [Errno 113]
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "ntlm: HTTPConnectionPool(host='xxx.xxx.xxx.xxx', port=5985): Max retries exceeded with url: /wsman (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f06d333e650>: Failed to establish a new connection: [Errno 113] No route to host',))",
"unreachable": true
}
Cause:
  1. The host is a Linux system but was connected to as a Windows host.
  2. The host's firewall is enabled; No route to host is reported because the firewall has not opened the port.
Solution:
  Open the required port on the firewall.
Error: Connection to xxx.xxx.xxx.xxx timed out
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "ntlm: HTTPConnectionPool(host='xxx.xxx.xxx.xxx', port=5985): Max retries exceeded with url: /wsman (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7ff2040cff90>, 'Connection to xxx.xxx.xxx.xxx timed out. (connect timeout=30)'))",
"unreachable": true
}
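Cause:
  1. The network is unreachable.
  2. The connection to the port timed out.
Solution:
  1. Open a network policy that allows the two hosts to reach each other.
  2. Check whether port 5985 (the WinRM service) is open on the target host.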
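The three WinRM connection errors above (Errno 111, Errno 113 and the timeout) can be told apart from the Ansible host without going through Ansible. A rough sketch, assuming a Linux control host where errno 111 is ECONNREFUSED and 113 is EHOSTUNREACH; the function name probe is my own:

```python
import errno
import socket


def probe(host, port=5985, timeout=5.0):
    """Try a TCP connection and name the failure mode, if any."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        return "timed out"
    except ConnectionRefusedError:
        # Errno 111 on Linux: host reachable, nothing listening on the port.
        return "refused"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # Errno 113 on Linux: typically a firewall rejecting the packet.
            return "no route to host"
        return "other: %s" % exc
```

The result maps directly onto the sections above: "refused" points at the WinRM service, "no route to host" at a firewall, and "timed out" at the network path.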
Error: winrm send_input failed
# Ansible 2.4.0
xxx.xxx.xxx.xxx | FAILED! => {
"changed": false,
"module_stderr": "An error occurred while creating the pipeline.\r\n + CategoryInfo : NotSpecified: (:) [], ParentContainsErrorRecordException\r\n + FullyQualifiedErrorId : RuntimeException\r\n \r\n",
"module_stdout": "",
"msg": "MODULE FAILURE",
"rc": 3221226519
}
# Ansible 2.9.2
[WARNING]: ERROR DURING WINRM SEND INPUT - attempting to recover: WinRMError WinRMError(u"\u7ba1\u9053\u5df2\u7ed3\u675f\u3002 (extended fault data: {u'fault_subcode': 'w:InternalError', u'fault_code': 's:Receiver', u'wsmanfault_code': '109', 'transport_message': u'Bad HTTP response returned from server. Code 500', 'http_status_code': 500})",)
xxx.xxx.xxx.xxx | FAILED! => {
"msg": "winrm send_input failed; \nstdout: \nstderr "
}
Cause:
  I have not found the cause of this one. If anyone has found the cause and a fix, please share; I would be very grateful.
Solution:
  None found so far.
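The output above can at least be made readable. The sketch below decodes the escaped message from the Ansible 2.9.2 warning and prints the 2.4.0 return code in hexadecimal; the helper name decode_escaped is my own. The decoded text is Chinese for "The pipe has been ended.", which matches wsmanfault_code 109 (Windows ERROR_BROKEN_PIPE), and the return code comes out as 0xC0000417. This narrows down what happened, not why.

```python
def decode_escaped(text):
    """Turn literal \\uXXXX escapes, as printed by Python 2, into characters."""
    return text.encode("ascii").decode("unicode_escape")


# The escaped message copied from the warning above.
message = decode_escaped(r"\u7ba1\u9053\u5df2\u7ed3\u675f\u3002")

# The return code from the Ansible 2.4.0 output, shown in hexadecimal.
rc_hex = "0x%08X" % 3221226519

if __name__ == "__main__":
    print(message)
    print(rc_hex)
```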
Error: There is not enough space on the disk
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: at System.Management.Automation.CommandProcessorBase.Complete()
xxx.xxx.xxx.xxx | FAILED! => {
"changed": false,
"msg": "internal error: failed to run exec_wrapper action module_powershell_wrapper: Exception calling \"CompileAssemblyFromDom\" with \"2\" argument(s): \"There is not enough space on the disk.\r\n\""
}
Cause:
  The target host's disk is out of space.
Solution:
  Free up disk space.
Error: The RPC server is unavailable
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "ntlm: The RPC server is unavailable. (extended fault data: {u'fault_subcode': 'w:InternalError', u'fault_code': 's:Receiver', u'wsmanfault_code': '2147944122', 'transport_message': u'Bad HTTP response returned from server. Code 500', 'http_status_code': 500})",
"unreachable": true
}
Cause:
  The RPC service is not running on the target host.
Solution:
  Start the RPC service.
    -- Win+R -> type services.msc -> find RPC Endpoint Mapper in the service list -> right-click and choose Properties -> set Startup type to Automatic and start the service.
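The wsmanfault_code in this error is a Windows HRESULT written in decimal. Splitting it, as in the sketch below (the function name split_hresult is my own), gives facility 7 (Win32) and code 1722, which is RPC_S_SERVER_UNAVAILABLE and so agrees with the message text.

```python
def split_hresult(value):
    """Split a 32-bit HRESULT into (failed, facility, code)."""
    failed = bool(value & 0x80000000)   # top bit set means failure
    facility = (value >> 16) & 0x7FF    # 11-bit facility field
    code = value & 0xFFFF               # low 16 bits: the error code
    return failed, facility, code


if __name__ == "__main__":
    # Value copied from the wsmanfault_code field in the output above.
    print(split_hresult(2147944122))
```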