Monitoring Windows Systems with Nagios via NSClient


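My earlier test of monitoring Linux with Nagios nearly drove me up the wall. I had been careless, and Nagios turned out to be a completely different animal from zabbix. Testing Windows monitoring went far more smoothly. Enough preamble, on to the topic.

NSClient

A monitored Linux host needs NRPE and the plugins installed, but on Windows a single NSClient is enough: it handles all of the monitoring by itself.

On choosing a version: I downloaded both 4.3 and 4.4, and only afterwards found that 4.4 did not work for me. Maybe I was using it wrong, but I really could not get 4.4 to work, so for now I am on 4.3. Pin down your version before you start testing.

Installation is a matter of accepting the defaults and clicking Next; installing on Windows really is that simple. Whenever the installer offers checkboxes, tick them all.

After installation, open the service's properties (right-click Computer, choose Manage, then look under Services), enable desktop interaction for the service, and set the startup type to Automatic (it is automatic by default).

Then check the ports: both 12489 and 5666 must be listening. When I tested 4.4, port 12489 never came up, possibly because check_nt support was not enabled.

Testing the check_nt command

My earlier write-up monitored Linux through nrpe, and the approach here is the same: first confirm that the check_nt plugin and the check_nt command definition are present. Both clearly are, so I tested from the remote side how full the virtual machine's C: drive is. It reported the drive as nearly full (I run the VM on a laptop, sigh). Logging in to the VM to verify showed the same thing. Test successful!

Deploying the cfg files and scripts

The biggest headache with Nagios is the scripts. check_nt has a handful of commonly used commands, which I am pasting here for reference only (to Jiang Tao: this is the only part I copy-pasted!).

First, the check_nt syntax and its fixed options.

Syntax:

    check_nt -H host -v variable [-p port] [-w warning] [-c critical] [-l params] [-d SHOWALL] [-u] [-t timeout]

Options:

    -h, --help                  Show help
    -V, --version               Show version information
    -H, --hostname=HOST         Name or IP address of the monitored host
    -p, --port=INTEGER          Port to query (default 1248; on my installation the port is 12489)
    -s, --secret=<password>     Password, if one is required
    -w, --warning=INTEGER       Threshold that triggers a WARNING state
    -c, --critical=INTEGER      Threshold that triggers a CRITICAL state
    -t, --timeout=INTEGER       Seconds before the connection attempt times out
    -l, --params=<parameters>   Parameters passed to the specified check (see below)
    -d, --display={SHOWALL}     Display options (currently only SHOWALL)
    -u, --unknown-timeout       Return UNKNOWN on timeout
    -v, --variable=STRING       Variable to check

Variables:

    CLIENTVERSION = Get the NSClient version. If -l <version> is given, a warning is raised when the versions differ.

    CPULOAD = Average CPU load over the last X minutes. The -l format is:
        -l <minutes range>,<warning threshold>,<critical threshold>
      The minutes range must not exceed 24*60. Thresholds are percentages, and up to 10 threshold sets can be given in one call, for example:
        -l 60,90,95,120,90,95

    UPTIME = Time since the host was started. Takes no parameters and has no warning or critical thresholds.

    USEDDISKSPACE = Usage of the given disk. Requires only the drive letter via -l. Use -w for the warning threshold and -c for the critical threshold.

    MEMUSE = Memory usage. Use -w for the warning threshold and -c for the critical threshold.

    SERVICESTATE = State of one or more services. The -l format is:
        -l <service1>,<service2>,<service3>,...
      Use -d SHOWALL to include the state of every listed service in the output.

    PROCSTATE = Whether one or more processes are running. Same syntax as SERVICESTATE.

    COUNTER = Any performance counter of Windows NT/2000. The -l format is:
        -l "\\<performance object>\\counter","<description>"
      The description is optional and is only needed for counters whose output should be formatted as a floating-point number. If <description> does not contain "%%", it is used as a label. Examples:
        "Paging file usage is %%.2f %%%%"
        "%%.f %%%% paging file used."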
      For counters whose names contain "\" or "$", it is best to escape those characters with "\" so that the counter name is passed correctly.

    INSTANCES = Any performance counter object of Windows NT/2000. Syntax:
        check_nt -H <hostname> -p <port> -v INSTANCES -l <counter object>
      <counter object> is a performance counter object (for example Process). If the name consists of two words, enclose it in quotes. The result is a comma-separated list of the instances of that object. The point is to find out from the command line what perfmon offers for that object without logging in to the Windows system. It can also be used in scripts that automatically create Nagios service configuration files.

Here is the test configuration I wrote, for reference only:

    define host{
        host_name               115
        alias                   115-windows-test
        address                 192.168.1.115
        notification_interval   0
        notification_options    d,u,r
        max_check_attempts      1
        process_perf_data       1
        active_checks_enabled   1
        passive_checks_enabled  0
        notifications_enabled   1
        check_period            24x7
        notification_period     24x7
        contact_groups          admins
    }

    define service{
        use                     linux-jim-server
        host_name               115
        service_description     PING
        check_command           check_ping!100.0,20%!200.0,50%
    }

    define service{
        use                     linux-jim-server
        host_name               115
        service_description     NSClient++ Version
        check_command           check_nt!CLIENTVERSION
    }

    define service{
        use                     linux-jim-server
        host_name               115
        service_description     CPU Load
        check_command           check_nt!CPULOAD!-l 5,80,90
    }

    define service{
        use                     linux-jim-server
        host_name               115
        service_description     Memory Usage
        check_command           check_nt!MEMUSE!-w 80 -c 90
    }

    define service{
        use                     linux-jim-server
        host_name               115
        service_description     C: Drive Usage
        check_command           check_nt!USEDDISKSPACE!-l c -w 80 -c 90
    }

    define service{
        use                     linux-jim-server
        host_name               115
        service_description     explorer
        check_command           check_nt!PROCSTATE!-d SHOWALL -l explorer.exe
    }

    define service{
        use                     linux-jim-server
        host_name               115
        service_description     W3SVC
        check_command           check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
    }

    define service{
        use                     linux-jim-server
        host_name               115
        service_description     Uptime
        check_command           check_nt!UPTIME
    }

The host defined here is named 115 after the last octet of its IP address. The services all use a service template of my own, linux-jim-server:

    define service{
        name                            linux-jim-server
        check_freshness                 0          ; Default is to NOT check service 'freshness'
        notifications_enabled           1          ; Service notifications are enabled
        event_handler_enabled           1          ; Service event handler is enabled
        flap_detection_enabled          0          ; Flap detection is disabled
        failure_prediction_enabled      1          ; Failure prediction is enabled
        process_perf_data               1          ; Process performance data
        retain_status_information       1          ; Retain status information across program restarts
        retain_nonstatus_information    1          ; Retain non-status information across program restarts
        is_volatile                     0          ; The service is not volatile
        check_period                    24x7       ; The service can be checked at any time of the day
        max_check_attempts              1          ; A single failed check is enough to reach the final (hard) state
        normal_check_interval           1          ; Check the service every time unit (one minute by default) under normal conditions
        retry_check_interval            1          ; Re-check the service every time unit until a hard state can be determined
        contact_groups                  admins     ; Notifications get sent out to everyone in the 'admins' group
        notification_options            w,u,c,r    ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           0          ; Do not re-notify about ongoing service problems
        notification_period             24x7       ; Notifications can be sent out at any time
    }

The trailing comments come from the stock template cfg, adjusted to match my values. I am not fond of the stock template itself because its alerts do not arrive promptly. Finally, verify nagios.cfg and then run service nagios reload.

Deployment result

The W3SVC checked here is the IIS service, but my laptop is so underpowered that the virtual machine cannot run IIS, so this is the best I could manage.

Alert example

I only tested WeChat alerts here, because WeChat is slightly more troublesome to set up. I cannot change the name of my WeChat account, so it still shows zabbix, but I am happy with the result.

Next step: start looking into cacti. Look forward to the cacti write-up!
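As an addendum to the configuration section: the check_nt notes say its output can feed scripts that create Nagios service configuration files automatically. The sketch below is one possible way to do that in Python, not the author's method. It only renders text in the same shape as the hand-written service definitions for host 115 and the linux-jim-server template. The function names (render_service, render_services, drive_checks) and the English service descriptions are illustrative assumptions, and the script does not contact any host.

```python
from typing import Iterable, List, Tuple

Check = Tuple[str, str]  # (service_description, check_command)


def render_service(host_name: str, description: str, check_command: str,
                   template: str = "linux-jim-server") -> str:
    """Return one 'define service' block as text (hypothetical helper)."""
    fields = [
        ("use", template),
        ("host_name", host_name),
        ("service_description", description),
        ("check_command", check_command),
    ]
    # Pad the directive names so the values line up in a column.
    body = "\n".join(f"    {key:<24}{value}" for key, value in fields)
    return "define service{\n" + body + "\n}\n"


def render_services(host_name: str, checks: Iterable[Check]) -> str:
    """Render several service blocks, separated by blank lines."""
    return "\n".join(render_service(host_name, d, c) for d, c in checks)


def drive_checks(drives: Iterable[str], warn: int = 80, crit: int = 90) -> List[Check]:
    """Build one USEDDISKSPACE check per drive letter (assumed thresholds)."""
    return [
        (f"{d.upper()}: Drive Usage",
         f"check_nt!USEDDISKSPACE!-l {d.lower()} -w {warn} -c {crit}")
        for d in drives
    ]


if __name__ == "__main__":
    checks = [
        ("Uptime", "check_nt!UPTIME"),
        ("CPU Load", "check_nt!CPULOAD!-l 5,80,90"),
    ] + drive_checks(["c", "d"])
    print(render_services("115", checks))
```

Redirecting the printed output into a cfg file that nagios.cfg already includes, then verifying and reloading as described above, would be the remaining manual steps.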
