Monitoring OpenStack nova service status with Zabbix
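OpenStack operators need to keep an eye on the state of the services running on the compute nodes and deal with any failure as soon as it appears.
This article describes service status monitoring built on the nova Python API.
1 Getting service status from the command line
The following command shows the current state of the nova services in OpenStack: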
nova service-list
+----+------------------+--------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+----+------------------+--------------+----------+---------+-------+----------------------------+-----------------+
| 1 | nova-monitor | controler02 | internal | enabled | up | 2019-01-16T07:34:10.000000 | - |
| 2 | nova-monitor | controller01 | internal | enabled | up | 2019-01-16T07:34:09.000000 | - |
| 3 | nova-consoleauth | controler02 | internal | enabled | down | 2017-10-09T09:46:37.000000 | - |
| 4 | nova-consoleauth | controller01 | internal | enabled | up | 2019-01-16T07:34:10.000000 | - |
| 5 | nova-scheduler | controler02 | internal | enabled | down | 2017-10-09T09:46:37.000000 | - |
| 6 | nova-scheduler | controller01 | internal | enabled | up | 2019-01-16T07:34:05.000000 | - |
| 7 | nova-conductor | controler02 | internal | enabled | down | 2017-10-09T09:46:27.000000 | - |
| 10 | nova-conductor | controller01 | internal | enabled | up | 2019-01-16T07:34:10.000000 | - |
| 14 | nova-cert | controler02 | internal | enabled | down | 2017-10-09T09:46:37.000000 | - |
| 15 | nova-cert | controller01 | internal | enabled | up | 2019-01-16T07:34:06.000000 | - |
| 17 | nova-storage | computer05 | internal | enabled | up | 2019-01-16T07:34:12.000000 | - |
| 19 | nova-compute | computer04 | IMS | enabled | up | 2019-01-16T07:34:06.000000 | - |
| 21 | nova-compute | computer03 | IMS | enabled | up | 2019-01-16T07:34:08.000000 | - |
| 23 | nova-compute | computer05 | paas | enabled | up | 2019-01-16T07:34:06.000000 | - |
| 25 | nova-compute | computer06 | paas | enabled | up | 2019-01-16T07:34:04.000000 | - |
| 27 | nova-compute | computer07 | paas | enabled | up | 2019-01-16T07:34:03.000000 | - |
| 29 | nova-storage | computer06 | internal | enabled | up | 2019-01-16T07:34:05.000000 | - |
| 31 | nova-compute | computer08 | IMS | enabled | up | 2019-01-16T07:34:08.000000 | - |
| 32 | nova-storage | computer04 | internal | enabled | up | 2019-01-16T07:34:07.000000 | - |
| 34 | nova-storage | computer07 | internal | enabled | up | 2019-01-16T07:34:06.000000 | - |
| 36 | nova-storage | computer03 | internal | enabled | up | 2019-01-16T07:34:05.000000 | - |
| 38 | nova-storage | computer08 | internal | enabled | up | 2019-01-16T07:34:02.000000 | - |
+----+------------------+--------------+----------+---------+-------+----------------------------+-----------------+
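Apart from the two controller nodes (controller01 and controler02 in the output above), where some services are down because of the active/standby setup, every other node should report enabled and up. That state is what we monitor.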
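As a quick illustration of the check described above, the sketch below parses the text table printed by `nova service-list` and keeps the rows that are not both enabled and up. It only looks at the console output; the function names `parse_service_table` and `unhealthy` are made up for this example, and the sample is a shortened copy of the table above.

```python
def parse_service_table(text):
    """Parse the ASCII table printed by `nova service-list` into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        # Border lines start with '+'; only lines starting with '|' carry cells.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first table row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def unhealthy(rows):
    """Return the rows whose Status is not 'enabled' or whose State is not 'up'."""
    return [r for r in rows if r["Status"] != "enabled" or r["State"] != "up"]


# Shortened sample taken from the output shown above.
SAMPLE = """\
+----+------------------+-------------+----------+---------+-------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+----+------------------+-------------+----------+---------+-------+
| 3 | nova-consoleauth | controler02 | internal | enabled | down | 2017-10-09T09:46:37.000000 | - |
| 19 | nova-compute | computer04 | IMS | enabled | up | 2019-01-16T07:34:06.000000 | - |
+----+------------------+-------------+----------+---------+-------+
"""

if __name__ == "__main__":
    for row in unhealthy(parse_service_table(SAMPLE)):
        print(row["Binary"], row["Host"], row["Status"], row["State"])
```

Parsing console output is fragile, which is one reason the rest of the article queries the Python API directly.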
2 Using Zabbix auto-discovery to collect service information
2.1 Zabbix discovery rule and items
Zabbix discovery rule key
openstack.service.discovery
Zabbix discovery item prototypes
openstack.service.status[state, {#BINARY}, {#HOST}]
openstack.service.status[status, {#BINARY}, {#HOST}]
Set up a matching Trigger as well, so that an alert is raised promptly when a service on a node becomes abnormal.
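To make the relation between the discovery rule and the item prototypes concrete, here is a minimal sketch of the JSON a discovery script returns and of how the {#BINARY} and {#HOST} macros end up in item keys. The macro substitution is actually done by Zabbix itself; `build_discovery` and `expand_key` are hypothetical helpers that only imitate it for illustration.

```python
import json


def build_discovery(services):
    """Build a low-level discovery payload: one dict of macros per service."""
    return {
        "data": [
            {"{#BINARY}": s["binary"], "{#HOST}": s["host"]}
            for s in services
        ]
    }


def expand_key(prototype, entity):
    """Imitate Zabbix replacing discovery macros in an item prototype key."""
    key = prototype
    for macro, value in entity.items():
        key = key.replace(macro, value)
    return key


if __name__ == "__main__":
    # Two services taken from the nova service-list output above.
    services = [
        {"binary": "nova-compute", "host": "computer04"},
        {"binary": "nova-storage", "host": "computer05"},
    ]
    payload = build_discovery(services)
    print(json.dumps(payload, indent=2, sort_keys=True))
    for entity in payload["data"]:
        print(expand_key("openstack.service.status[state, {#BINARY}, {#HOST}]", entity))
```

Each discovered service therefore gets its own pair of items, one for state and one for status, without any per-host configuration.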
2.2 zabbix-agent configuration file
UserParameter=openstack.service.discovery,python /etc/zabbix/zabbix_agentd.d/openstack-service.py --item discovery
UserParameter=openstack.service.status[*],python /etc/zabbix/zabbix_agentd.d/openstack-service.py --item $1 --binary $2 --host $3
2.3 Zabbix monitoring and query script
/etc/zabbix/zabbix_agentd.d/openstack-service.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""Zabbix helper: discover nova services and report their status/state."""
import json
from optparse import OptionParser

from novaclient import client as noclient

# Keystone credentials (Keystone v2.0 style, as accepted by older
# python-novaclient releases)
keystone = {}
keystone['os_username'] = 'admin'
keystone['os_password'] = 'keystone'
keystone['os_auth_url'] = 'http://lb-vip:5000/v2.0/'
keystone['os_tenant_name'] = 'admin'

nova_client = noclient.Client(2, keystone['os_username'], keystone['os_password'],
                              keystone['os_tenant_name'], keystone['os_auth_url'])

# Controller nodes skipped during discovery. The comparison is exact and
# case-sensitive, so these names must match the Host column of
# `nova service-list`.
CONTROLLER_HOSTS = ("controller01", "controller02")


def main():
    options = parse_args()
    if options.item == "discovery":
        service_list()
    else:
        service_moniter(options)


# Validate the command-line arguments
def parse_args():
    parser = OptionParser()
    valid_item = ["discovery", "status", "state", "disabled_reason"]
    parser.add_option("--item", dest="item", type="string", default=None,
                      help="one of: " + ", ".join(valid_item))
    parser.add_option("--binary", dest="binary", type="string", default=None,
                      help="service binary, e.g. nova-compute")
    parser.add_option("--host", dest="host", type="string", default=None,
                      help="host the service runs on")
    (options, args) = parser.parse_args()
    if options.item not in valid_item:
        parser.error("Item has to be one of: " + ", ".join(valid_item))
    return options


# Print the service list as Zabbix low-level discovery JSON
def service_list():
    r = {"data": []}
    services = nova_client.services.list()
    for service in services:
        service_info = service._info.copy()
        # Skip the two controller nodes
        if service_info["host"] in CONTROLLER_HOSTS:
            continue
        r['data'].append({"{#BINARY}": service_info["binary"],
                          "{#HOST}": service_info["host"],
                          "{#ZONE}": service_info["zone"]})
    # The Python 2-only encoding= argument is dropped; json.dumps() escapes
    # non-ASCII characters by default, so the call works on Python 2 and 3.
    print(json.dumps(r, indent=2, sort_keys=True))


# Print the requested field (status, state or disabled_reason) of one service
def service_moniter(options):
    services = nova_client.services.list(host=options.host, binary=options.binary)
    for service in services:
        service_info = service._info.copy()
        print(service_info[options.item])


if __name__ == "__main__":
    main()
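3 Displaying the result in Grafana
With this approach the state of the services can be monitored promptly from the controller node, and because it uses the same data as the OpenStack console, the results agree with what the console reports. When compute nodes are added or removed, no monitoring parameter has to change: discovery picks up the difference automatically.
For display in Grafana, the value has to be numeric. The discovered services are also numerous, and their number changes often as the cluster scales in or out. The idea is therefore to show only a combined state of all services, provided specifically for Grafana.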
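The combined state can be sketched as a small pure function over plain dictionaries, which makes the rule easy to check without an OpenStack connection. This is an illustration under assumptions: `combined_status` is a hypothetical name, 0 means healthy and 1 means at least one problem, and the excluded host names are spelled as in the `nova service-list` output above.

```python
# Controller hosts left out of the combined result (assumed spelling, taken
# from the nova service-list output earlier in this article).
EXCLUDED_HOSTS = frozenset(["controller01", "controler02"])


def combined_status(services, field, excluded_hosts=EXCLUDED_HOSTS):
    """Return 1 if any non-excluded service is 'down' or 'disabled', else 0.

    `field` is either "state" (up/down) or "status" (enabled/disabled).
    """
    for svc in services:
        if svc["host"] in excluded_hosts:
            continue  # controller services may be down by design
        if svc[field] in ("down", "disabled"):
            return 1
    return 0


if __name__ == "__main__":
    services = [
        {"host": "controler02", "binary": "nova-scheduler",
         "status": "enabled", "state": "down"},
        {"host": "computer04", "binary": "nova-compute",
         "status": "enabled", "state": "up"},
    ]
    print(combined_status(services, "state"))
    print(combined_status(services, "status"))
```

A single 0/1 value like this stays valid however many compute nodes are added or removed, which is what makes it convenient for a dashboard panel.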
Items added in Zabbix
openstack.service.all_status[state]
openstack.service.all_status[status]
Updated zabbix-agent configuration file
UserParameter=openstack.service.discovery,python /etc/zabbix/zabbix_agentd.d/openstack-service.py --item discovery
UserParameter=openstack.service.status[*],python /etc/zabbix/zabbix_agentd.d/openstack-service.py --item $1 --binary $2 --host $3
UserParameter=openstack.service.all_status[*],python /etc/zabbix/zabbix_agentd.d/openstack-service.py --item all_status --item1 $1
Updated script
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""Zabbix helper: discovery, per-service status and a combined status."""
import json
from optparse import OptionParser

from novaclient import client as noclient

# Keystone credentials (Keystone v2.0 style, as accepted by older
# python-novaclient releases)
keystone = {}
keystone['os_username'] = 'admin'
keystone['os_password'] = 'keystone'
keystone['os_auth_url'] = 'http://lb-vip:5000/v2.0/'
keystone['os_tenant_name'] = 'admin'

nova_client = noclient.Client(2, keystone['os_username'], keystone['os_password'],
                              keystone['os_tenant_name'], keystone['os_auth_url'])

# Controller nodes skipped by discovery and by the combined status. The
# comparison is exact and case-sensitive, so these names must match the
# Host column of `nova service-list`.
CONTROLLER_HOSTS = ("Controller01", "Controller02")


def main():
    options = parse_args()
    if options.item == "discovery":
        service_list()
    elif options.item == "all_status":
        service_status(options)
    else:
        service_moniter(options)


# Validate the command-line arguments
def parse_args():
    parser = OptionParser()
    valid_item = ["discovery", "status", "state", "disabled_reason", "all_status"]
    parser.add_option("--item", dest="item", type="string", default=None,
                      help="one of: " + ", ".join(valid_item))
    parser.add_option("--binary", dest="binary", type="string", default=None,
                      help="service binary, e.g. nova-compute")
    parser.add_option("--host", dest="host", type="string", default=None,
                      help="host the service runs on")
    parser.add_option("--item1", dest="item1", type="string", default=None,
                      help="field checked by all_status: state or status")
    (options, args) = parser.parse_args()
    if options.item not in valid_item:
        parser.error("Item has to be one of: " + ", ".join(valid_item))
    return options


# Print the service list as Zabbix low-level discovery JSON
def service_list():
    r = {"data": []}
    services = nova_client.services.list()
    for service in services:
        service_info = service._info.copy()
        # Skip the two controller nodes
        if service_info["host"] in CONTROLLER_HOSTS:
            continue
        r['data'].append({"{#BINARY}": service_info["binary"],
                          "{#HOST}": service_info["host"],
                          "{#ZONE}": service_info["zone"]})
    # The Python 2-only encoding= argument is dropped; json.dumps() escapes
    # non-ASCII characters by default, so the call works on Python 2 and 3.
    print(json.dumps(r, indent=2, sort_keys=True))


# Print the combined status of all nodes: 0 = all healthy, 1 = at least one
# service is down or disabled
def service_status(options):
    services = nova_client.services.list()
    status = 0
    for service in services:
        service_info = service._info.copy()
        # Skip the two controller nodes
        if service_info["host"] in CONTROLLER_HOSTS:
            continue
        if service_info[options.item1] in ("down", "disabled"):
            status = 1
            break
    print(status)


# Print the requested field of one service as a number:
# 0 = up/enabled, 1 = down/disabled, -1 = anything else
def service_moniter(options):
    services = nova_client.services.list(host=options.host, binary=options.binary)
    for service in services:
        service_info = service._info.copy()
        value = service_info[options.item]
        if value in ("up", "enabled"):
            print(0)
        elif value in ("down", "disabled"):
            print(1)
        else:
            print(-1)


if __name__ == "__main__":
    main()
References
- nova command summary, part 4: compute-related commands, http://blog.51cto.com/13788458/2129157
- The novaclient Python API,https://docs.openstack.org/python-novaclient/latest/reference/api/index.html
- GitHub - larsks/openstack-api-samples,https://github.com/larsks/openstack-api-samples