Nagios notification escalations made easy

Someone on the Nagios users mailing list asked for help understanding how notification escalations work in Nagios, so I wrote a short, simple workflow to explain the underlying logic.

Notification escalations are a means of ensuring that someone will eventually be notified. If a contact doesn’t acknowledge a problem, we can escalate the notifications to someone else, and change the way we deliver the messages if we think that’s a good strategy.

In this post we won’t discuss the syntax or the options for escalations; we will look at the logic behind them: when they take place and how they work.

The workflow for checks, notifications and escalations is the following:

  1. While the service/host is in an OK state, it is checked with the check_interval timing;
  2. When the service/host goes into a NON OK state but hasn’t yet reached max_check_attempts, it enters a SOFT NON OK state and the next check is scheduled with the retry_interval timing;
  3. When the service/host in a NON OK state reaches the max_check_attempts value, it enters a HARD NON OK state and the next check is scheduled with the check_interval timing;
  4. If you set first_notification_delay, the first notification is delayed by that amount (0 means it is sent immediately);
  5. If you didn’t set first_notification_delay, the first notification is sent immediately and the following ones are scheduled with the notification_interval timing (0 means only the first notification is sent and no others follow).
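
Steps 1 to 3 can be sketched with a few lines of Python. This is only a simplified model of the scheduling logic, not how Nagios is implemented: the function name simulate_checks and the default values (check_interval 5, retry_interval 1, max_check_attempts 3) are assumptions made for the illustration.

```python
def simulate_checks(results, check_interval=5, retry_interval=1, max_check_attempts=3):
    """Return a list of (minute, result, state) for a sequence of check results.

    Simplified model: every non-OK result counts as one failed attempt,
    and any OK result resets the counter.
    """
    timeline = []
    now = 0
    failures = 0
    for result in results:
        if result == "OK":
            failures = 0
            state = "OK"
            delay = check_interval       # step 1: normal timing
        else:
            failures = min(failures + 1, max_check_attempts)
            if failures < max_check_attempts:
                state = "SOFT"
                delay = retry_interval   # step 2: retry quickly
            else:
                state = "HARD"
                delay = check_interval   # step 3: back to normal timing
        timeline.append((now, result, state))
        now += delay
    return timeline


if __name__ == "__main__":
    checks = ["OK", "CRITICAL", "CRITICAL", "CRITICAL", "CRITICAL", "OK"]
    for minute, result, state in simulate_checks(checks):
        print(f"t={minute:>2} min  {result:<8}  {state}")
```

With these assumed values the two SOFT checks are one minute apart, and once the state becomes HARD the checks go back to the five-minute rhythm.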

Now, let’s look at a practical example:

define serviceescalation{
    host_name               webserver
    service_description     HTTP
    first_notification      3
    last_notification       5
    notification_interval   45
    contact_groups          ITOps_Oncall,managers
}

define serviceescalation{
    host_name               webserver
    service_description     HTTP
    first_notification      6
    last_notification       0
    notification_interval   60
    contact_groups          ITOps_Oncall,managers,everyone
}

What happens? Here we go:

  1. Up to the third notification, the notification_interval of the service itself applies (this example assumes it is 10 time units, usually 10 minutes). The first notification is sent as soon as the max_check_attempts value is reached (assuming you didn’t set any delay), the second 10 minutes later, and the third 10 minutes after the second. With the third notification the first escalation comes into play and the notification interval changes to 45 minutes: the fourth notification is sent 45 minutes after the third, the fifth 45 minutes after the fourth, and the sixth 45 minutes after the fifth.
  2. From the sixth notification, the second escalation comes into play. The seventh notification is sent 60 minutes after the sixth, and every later notification follows 60 minutes after the previous one. Keep in mind that, because last_notification is set to 0, this escalation will never end until your check returns an OK status.
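
The timing of this example can be checked with a small Python sketch. It is only a model of the arithmetic, under a few assumptions: the service’s own notification_interval is 10 minutes, the function name notification_times is made up for the illustration, and an escalation interval of 0 is not handled. When more than one escalation matches, the sketch takes the smallest interval, which is how the Nagios documentation describes overlapping escalations.

```python
def notification_times(count, base_interval, escalations):
    """Return the minute (relative to the first notification) of each notification.

    escalations is a list of (first_notification, last_notification,
    notification_interval) tuples; last_notification == 0 means "no upper bound".
    """
    times = [0]
    for number in range(1, count):
        # Intervals of the escalations valid for the notification just sent
        matching = [
            interval
            for first, last, interval in escalations
            if number >= first and (last == 0 or number <= last)
        ]
        # No valid escalation: fall back to the service's own interval
        interval = min(matching) if matching else base_interval
        times.append(times[-1] + interval)
    return times


if __name__ == "__main__":
    # The two serviceescalation definitions of the example above
    example = [(3, 5, 45), (6, 0, 60)]
    for n, minute in enumerate(notification_times(8, 10, example), start=1):
        print(f"notification {n}: {minute} min after the first one")
```

For eight notifications this gives 0, 10, 20, 65, 110, 155, 215 and 275 minutes, which matches the intervals described in the two points above.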

9 Responses to “Nagios notification escalations made easy”

  1. Great article, it helps me a lot in writing service escalation configs. The workflow is great; it wasn’t explained this well in the Nagios 3 guide. Thank you for the useful information 🙂

  2. Great! Glad to hear it 🙂

  3. Great, this article explains very well how escalation works.
    A question: in the first example, when last_notification is reached, no more mail will be sent. Is this correct?

    Thanks

  4. Yes, but the second definition applies from the sixth notification onwards, with no end

  5. OK… so I think I have a problem: Nagios continues to send mail beyond the last_notification…

  6. Luke, send me your config, please

  7. So where is this defined? In a config file? If so which one?

  8. Which file should the example above go in? Can I append it to an existing .cfg file? If so, to which .cfg file?

  9. Hi Giorgio,

    Thank you for covering this subject. I created a hostescalation for a single host, but it doesn’t work properly even though I set all the settings for the escalation.

    I read your suggestion above, but the result is still the same.

    My configurations are below:

    Host config:

    define host {
    host_name                       operasyon_test
    alias                           operasyon_test
    address                         10.10.10.20
    check_command                   check-host-alive
    max_check_attempts              1
    check_interval                  2
    retry_interval                  1
    check_period                    24x7
    contacts                        first_mail
    notification_interval           1
    notification_period             24x7
    notification_options            d,u,r,f
    notifications_enabled           1
    register                        1
    }

    Hostescalation config:

    define hostescalation {
    host_name                       operasyon_test
    contacts                        second_mail
    first_notification              2
    last_notification               1
    notification_interval           5
    escalation_period               24x7
    escalation_options              d,u,r
    register                        1
    }
