Muting Rules

Muting rules in Nightingale (menu entry: Alerts - Rule Management - Muting Rules tab) are typically used in the following scenarios:

  • To suppress expected alerts in advance, typically during maintenance. For example, before restarting a machine, mute the alerts related to that machine.
  • To temporarily silence alerts for known issues that cannot be fixed immediately, where continuous notifications serve no purpose.

Principle

After the alert engine generates an alert event, the event is checked against the muting rules before it is persisted to the database. If it matches a muting rule, it is not persisted, and therefore no notification is sent to users. The following figure shows where muting rules sit in this flow:

Nightingale Muting Rule Working Timing
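
The flow described above can be sketched in a few lines of Python. This is only an illustration of the ordering (check muting rules first, then persist and notify), not Nightingale's actual implementation. The names AlertEvent and process_event, the list used in place of the database, and the single sample rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AlertEvent:
    rule_name: str
    labels: dict = field(default_factory=dict)


def process_event(event, mute_rules, store, notify):
    """Check muting rules before the event is persisted or sent out."""
    for matches in mute_rules:
        if matches(event):
            return False  # muted: never stored, never notified
    store.append(event)   # stand-in for persisting to the database
    notify(event)
    return True


# Hypothetical muting rule: mute every event from one machine.
mute_rules = [lambda e: e.labels.get("ident") == "10.1.2.3"]
store, sent = [], []

process_event(AlertEvent("downtime", {"ident": "10.1.2.3"}), mute_rules, store, sent.append)
process_event(AlertEvent("downtime", {"ident": "10.1.2.9"}), mute_rules, store, sent.append)
```

After both calls, only the event from 10.1.2.9 is in store and sent; the muted event leaves no trace in either.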

A muting rule is essentially a set of filter conditions that select the alert events to mute. The conditions match on the attributes and labels of the alert events. For example:

  • Which data source the event comes from
  • The severity level of the event
  • The labels of the event

Here is an example:

Example of Nightingale Muting Rule Configuration
  • Data source type: Prometheus; only alert events with the data source type Prometheus will be muted
  • Data source: Not configured, meaning no restriction
  • Event level: All three levels are checked, indicating that alert events of all levels will be muted
  • Event labels: Two labels are configured, which is equivalent to: ident in ("10.1.2.3", "10.1.2.4") and rulename =~ "downtime"

All the above filter conditions are combined with AND, meaning an event is muted only if it meets every condition.
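
As a rough model of this AND semantics, the example configuration could be evaluated as in the following sketch. The class and field names are hypothetical, the numeric severity values 1 to 3 are an assumption, and treating =~ as an unanchored regular-expression search is also an assumption; the real engine may behave differently on these points.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Event:
    datasource_type: str
    datasource: str
    severity: int
    labels: dict = field(default_factory=dict)


@dataclass
class LabelFilter:
    key: str
    op: str        # only "in" and "=~" are modeled in this sketch
    value: object

    def matches(self, labels):
        actual = labels.get(self.key)
        if actual is None:
            return False
        if self.op == "in":
            return actual in self.value
        if self.op == "=~":
            # Assumption: unanchored regex search.
            return re.search(self.value, actual) is not None
        raise ValueError(f"unsupported operator: {self.op}")


@dataclass
class MuteRule:
    datasource_type: str
    datasources: list      # empty list means "no restriction"
    severities: set
    label_filters: list

    def matches(self, ev):
        # Every condition must hold (AND).
        return (
            ev.datasource_type == self.datasource_type
            and (not self.datasources or ev.datasource in self.datasources)
            and ev.severity in self.severities
            and all(f.matches(ev.labels) for f in self.label_filters)
        )


rule = MuteRule(
    datasource_type="prometheus",
    datasources=[],
    severities={1, 2, 3},
    label_filters=[
        LabelFilter("ident", "in", ["10.1.2.3", "10.1.2.4"]),
        LabelFilter("rulename", "=~", "downtime"),
    ],
)
muted = Event("prometheus", "prom-a", 2, {"ident": "10.1.2.3", "rulename": "host downtime"})
kept = Event("prometheus", "prom-a", 2, {"ident": "10.1.2.9", "rulename": "host downtime"})
print(rule.matches(muted), rule.matches(kept))  # True False
```

The second event fails only the ident condition, which is enough to keep it from being muted.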

FAQ

1. Why can I still see related alert events even after configuring a muting rule?

This is usually because the event was generated before the muting rule was configured. A muting rule is checked only at the moment an event is generated, so it cannot affect events that already exist.

2. Multiple conditions in the event labels are also combined with AND, which users often overlook

As shown in the following figure, the user configured two entries in the event label filtering, both with the label key ident:

Example of Incorrect Event Label Configuration for Nightingale Muting Rules

The user intended to mute alerts from either of the two machines, 10.1.2.113 and 10.1.2.114. However, the two entries are combined with AND, which is equivalent to: ident = "10.1.2.113" and ident = "10.1.2.114". A single event has only one value for ident, so this condition can never match any event. The user should use the in operator instead, as shown below:

Example of Correct Event Label Configuration for Nightingale Muting Rules
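
The difference between the two configurations can be checked with a small sketch. The helper names eq, is_in, and all_match are hypothetical and only model the AND combination of label conditions described here.

```python
def all_match(conditions, labels):
    """Label conditions are combined with AND: every one must hold."""
    return all(cond(labels) for cond in conditions)


def eq(key, value):
    return lambda labels: labels.get(key) == value


def is_in(key, values):
    return lambda labels: labels.get(key) in values


# Two "=" entries on the same key: can never both be true.
wrong = [eq("ident", "10.1.2.113"), eq("ident", "10.1.2.114")]
# One "in" entry listing both machines.
right = [is_in("ident", {"10.1.2.113", "10.1.2.114"})]

for ident in ("10.1.2.113", "10.1.2.114", "10.1.2.200"):
    labels = {"ident": ident}
    print(ident, all_match(wrong, labels), all_match(right, labels))
# 10.1.2.113 False True
# 10.1.2.114 False True
# 10.1.2.200 False False
```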

3. The effective scope of a muting rule is limited to the current business group

The page itself displays a notice about this. To prevent mistakes, the effective scope of a muting rule is limited to the business group it belongs to. That is, a muting rule can only mute alert events under its own business group, and alert events under other business groups are not affected.

In other words: If a muting rule and an alert rule belong to different business groups, the muting rule will not take effect on the alert rule.

If a muting rule took effect globally, it would be dangerous. For example, if a user carelessly configured a muting rule whose filter conditions matched all alert events, every alert event in the company would be muted.
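
A minimal sketch of this scoping, assuming each event and each muting rule carries a business group identifier (the field name group_id and the helper is_muted are hypothetical): even a rule whose filters accept everything only affects its own group.

```python
from dataclasses import dataclass


@dataclass
class Event:
    group_id: int
    labels: dict


@dataclass
class MuteRule:
    group_id: int

    def matches_filters(self, event):
        # Deliberately over-broad: accepts every event.
        return True


def is_muted(event, rules):
    """A rule is considered only if it belongs to the event's business group."""
    return any(
        r.group_id == event.group_id and r.matches_filters(event)
        for r in rules
    )


rules = [MuteRule(group_id=1)]
print(is_muted(Event(1, {"ident": "10.1.2.3"}), rules))  # True
print(is_muted(Event(2, {"ident": "10.1.2.3"}), rules))  # False
```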