{"id":1091,"date":"2010-04-26T22:18:23","date_gmt":"2010-04-26T21:18:23","guid":{"rendered":"http:\/\/www.zarrelli.org\/blog\/?p=1091"},"modified":"2010-04-26T22:18:23","modified_gmt":"2010-04-26T21:18:23","slug":"nagios-notification-escalations-made-easy","status":"publish","type":"post","link":"https:\/\/www.zarrelli.org\/blog\/nagios-notification-escalations-made-easy\/","title":{"rendered":"Nagios notification escalations made easy"},"content":{"rendered":"<p style=\"text-align: justify;\">Since someone on the Nagios users mailing list asked for help understanding how notification escalations work in Nagios, I wrote a short, easy workflow to explain the underlying logic.<\/p>\n<p style=\"text-align: justify;\">Notification escalations are a means to ensure that someone will be notified, eventually. If a contact doesn&#8217;t acknowledge a problem, we can escalate the notifications to someone else, changing the way we deliver the messages if we think it&#8217;s a good strategy.<\/p>\n<p style=\"text-align: justify;\">In this post we won&#8217;t discuss the syntax or the options of escalations; we will look at the logic behind them, at when they take place and how they work.<\/p>\n<p style=\"text-align: justify;\">The workflow for check\/notification\/escalation is the following:<\/p>\n<ol>\n<li style=\"text-align: justify;\">While the service\/host is in an <strong>OK<\/strong> state, it is checked with the <strong>check_interval<\/strong>\u00a0timing;<\/li>\n<li style=\"text-align: justify;\">When the service\/host goes into a <strong>NON OK<\/strong> state but has not yet reached\u00a0<strong>max_check_attempts<\/strong>, it enters a <strong>SOFT NON OK<\/strong> state and the next\u00a0check is scheduled with the <strong>retry_interval<\/strong> timing;<\/li>\n<li style=\"text-align: justify;\">When the service\/host in a <strong>NON OK<\/strong> state reaches the\u00a0<strong>max_check_attempts<\/strong> value, it enters 
a <strong>HARD NON OK<\/strong> state and\u00a0the next service\/host check is scheduled with the <strong>check_interval<\/strong> timing;<\/li>\n<li style=\"text-align: justify;\">Now, if you set <strong>first_notification_delay<\/strong>, it delays\u00a0the first notification by that amount of time (<strong>0 means the notification is sent\u00a0immediately<\/strong>);<\/li>\n<li style=\"text-align: justify;\">If you didn&#8217;t set first_notification_delay, the first\u00a0notification is sent immediately and the following ones are scheduled\u00a0with the <strong>notification_interval<\/strong> timing (<strong>0 means only the first\u00a0notification is sent and no others follow<\/strong>).<\/li>\n<\/ol>\n<p>Now, let&#8217;s look at a practical example:<\/p>\n<pre>define serviceescalation{<\/pre>\n<pre style=\"padding-left: 90px;\">host_name \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 webserver<\/pre>\n<pre style=\"padding-left: 90px;\">service_description \u00a0 \u00a0 HTTP<\/pre>\n<pre style=\"padding-left: 90px;\">first_notification \u00a0 \u00a0 \u00a03<\/pre>\n<pre style=\"padding-left: 90px;\">last_notification \u00a0 \u00a0 \u00a0 5<\/pre>\n<pre style=\"padding-left: 90px;\">notification_interval \u00a0 45<\/pre>\n<pre style=\"padding-left: 90px;\">contact_groups \u00a0 \u00a0 \u00a0 \u00a0 \u00a0ITOps_Oncall,managers<\/pre>\n<pre style=\"padding-left: 90px;\">}\r\ndefine serviceescalation{<\/pre>\n<pre style=\"padding-left: 90px;\">host_name \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 webserver<\/pre>\n<pre style=\"padding-left: 90px;\">service_description \u00a0 \u00a0 HTTP<\/pre>\n<pre style=\"padding-left: 90px;\">first_notification \u00a0 \u00a0 \u00a06<\/pre>\n<pre style=\"padding-left: 90px;\">last_notification \u00a0 \u00a0 \u00a0 0<\/pre>\n<pre style=\"padding-left: 90px;\">notification_interval \u00a0 60<\/pre>\n<pre style=\"padding-left: 90px;\">contact_groups\t \u00a0ITOps_Oncall,managers,everyone<\/pre>\n<pre 
style=\"padding-left: 90px;\">}<\/pre>\n<p>What happens? Here we go:<\/p>\n<ol>\n<li style=\"text-align: justify;\">The first escalation applies from the third notification to the fifth and changes the notification interval to 45 minutes. So the first notification is sent as soon as the\u00a0max_check_attempts value is reached (assuming you didn&#8217;t set any\u00a0delay), the second after 10 time units, usually 10 minutes (this assumes the service&#8217;s own notification_interval is 10), the\u00a0third 10 minutes after the second, the fourth 45 minutes after the third, the\u00a0fifth 45 minutes after the fourth, and the sixth 45 minutes after the fifth.<\/li>\n<li style=\"text-align: justify;\">From the sixth notification, the second escalation comes into play. The\u00a0seventh notification is sent 60 minutes after the sixth, and every following\u00a0notification is sent 60 minutes after the previous one. Keep in mind that,\u00a0having used 0 as the last_notification value, your escalation will never\u00a0end until your check returns an OK status.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>Since someone on the Nagios users mailing list asked for help understanding how notification escalations work in Nagios, I wrote a short, easy workflow to explain the underlying logic. Notification escalations are a means to ensure that someone will be notified, eventually. 
If a contact doesn&#8217;t acknowledge a problem, we can escalate &hellip;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[326,128],"tags":[],"class_list":["post-1091","post","type-post","status-publish","format-standard","hentry","category-nagios","category-open-source","without-featured-image"],"_links":{"self":[{"href":"https:\/\/www.zarrelli.org\/blog\/wp-json\/wp\/v2\/posts\/1091","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.zarrelli.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.zarrelli.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.zarrelli.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.zarrelli.org\/blog\/wp-json\/wp\/v2\/comments?post=1091"}],"version-history":[{"count":0,"href":"https:\/\/www.zarrelli.org\/blog\/wp-json\/wp\/v2\/posts\/1091\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.zarrelli.org\/blog\/wp-json\/wp\/v2\/media?parent=1091"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.zarrelli.org\/blog\/wp-json\/wp\/v2\/categories?post=1091"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.zarrelli.org\/blog\/wp-json\/wp\/v2\/tags?post=1091"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}