Tumblelog by Soup.io
Newer posts are loading.
You are at the newest post.
Click here to check if anything new just came in.

check-mk: monitor switches for GBit links

For one of our customers we are using the Open Monitoring Distribution which includes Check_MK as monitoring system. We’re monitoring the switches (Cisco) via SNMP. The switches as well as all the servers support GBit connections, though there are some systems in the wild which are still operating at 100MBit (or even worse on 10MBit). Recently there have been some performance issues related to network access. To make sure it’s not the fault of a server or a service we decided to monitor the switch ports for their network speed. By default we assume all ports to be running at GBit speed. This can be configured either manually via:

cat etc/check_mk/conf.d/wato/rules.mk
[...]
checkgroup_parameters.setdefault('if', [])

checkgroup_parameters['if'] = [
  ( {'speed': 1000000000}, [], ['switch1', 'switch2', 'switch3', 'switch4'], ALL_SERVICES, {'comment': u'GBit links should be used as default on all switches'} ),
] + checkgroup_parameters['if']

or by visting Check_MK’s admin web-interface at ‘WATO Configuration’ -> ‘Host & Service Parameters’ -> ‘Parameters for Inventorized Checks’ -> ‘Networking’ -> ‘Network interfaces and switch ports’ and creating a rule for the ‘Explicit hosts’ switch1, switch2, etc and setting ‘Operating speed’ to ‘1 GBit/s’ there.

So far so straight forward and this works fine. Thanks to this setup we could identify several systems which used 100Mbit and 10MBit links. Definitely something to investigate on the according systems with their auto-negotiation configuration. But to avoid flooding the monitoring system and its notifications we want to explicitly ignore those systems in the monitoring setup until those issues have been resolved.

First step: identify the checks and their format by either invoking `cmk -D switch2` or looking at var/check_mk/autochecks/switch2.mk:

OMD[synpros]:~$ cat var/check_mk/autochecks/switch2.mk
[
  ("switch2", "cisco_cpu", None, cisco_cpu_default_levels),
  ("switch2", "cisco_fan", 'Switch#1, Fan#1', None),
  ("switch2", "cisco_mem", 'Driver text', cisco_mem_default_levels),
  ("switch2", "cisco_mem", 'I/O', cisco_mem_default_levels),
  ("switch2", "cisco_mem", 'Processor', cisco_mem_default_levels),
  ("switch2", "cisco_temp_perf", 'SW#1, Sensor#1, GREEN', None),
  ("switch2", "if64", '10101', {'state': ['1'], 'speed': 1000000000}),
  ("switch2", "if64", '10102', {'state': ['1'], 'speed': 1000000000}),
  ("switch2", "if64", '10103', {'state': ['1'], 'speed': 1000000000}),
  [...]
  ("switch2", "snmp_info", None, None),
  ("switch2", "snmp_uptime", None, {}),
]
OMD[synpros]:~$

Second step: translate this into the according format for usage in etc/check_mk/main.mk:

checks = [
  ( 'switch2', 'if64', '10105', {'state': ['1'], 'errors': (0.01, 0.1), 'speed': None}), # MAC: 00:42:de:ad:be:af,  10MBit
  ( 'switch2', 'if64', '10107', {'state': ['1'], 'errors': (0.01, 0.1), 'speed': None}), # MAC: 00:23:de:ad:be:af, 100MBit
  ( 'switch2', 'if64', '10139', {'state': ['1'], 'errors': (0.01, 0.1), 'speed': None}), # MAC: 00:42:de:ad:be:af, 100MBit
  [...]
]

Using this configuration we ignore the operation speed on ports 10105, 10107 and 10139 of switch2 using the the if64 check. We kept the state setting untouched where sensible (‘1′ means that the expected operational status of the interface is to be ‘up’). The errors settings specifies the error rates in percent for warnings (0.01%) and critical (0.1%). For further details refer to the online documentation or invoke ‘cmk -M if64′.

Final step: after modifying the checks’ configuration make sure to run `cmk -IIu switch2 ; cmk -R` to renew the inventory for switch2 and apply the changes. Do not forget to verify the running configuration by invoking ‘cmk -D switch2′:

Screenshot of 'cmk -D switch2' execution

Don't be the product, buy the product!

Schweinderl