Subscription Rules

In Nightingale, subscription rules are found under the menu: Alerts - Rule Management - Subscription Rules tab.

Why This Design

In Nightingale, you can configure notification rules directly in an alert rule, which is intuitive: alert events generated by that alert rule follow its notification rules. Datadog and Open-Falcon use a similar design, and it is sufficient for most cases. However, if you are familiar with Zabbix or Prometheus, you will know that in those systems the recipient of an alert event is decided by a separate subscription step that runs after the event is generated:

  • The alert rule defines only query conditions, thresholds, and so on. In other words, the alert rule is responsible only for generating events. It does not care how to notify or whom to notify.
  • Users apply a subscription mechanism to filter all alert events, and then specify notification rules (whom to notify and how) for the events that pass the filter.

This approach is more flexible, but less intuitive. Nightingale supports both. For most users, we recommend starting with “directly configuring notification rules in the alert rule” and reserving “subscription rules” for less common scenarios, such as:

  • My service depends on other services that I don’t manage (their alert rules notify their own owners, not me). If those services fail, my service may be affected, so I want to subscribe to their SLI-related alert events. (This scenario was raised by community users. It is listed here, but the author does not endorse it, so please consider it carefully. The author believes that each service should have a dashboard listing the SLI data of the services it depends on. When your own service fails, check this dashboard to determine whether the problem lies in your service or in one of its dependencies.)
  • Alert events generated by some general-purpose alert rules need to be distributed to different people. In this case, notification rules cannot be bound directly in the alert rule, so subscription rules are used alongside it.
  • Some global operations, such as global callbacks, can be implemented through subscription rules. For example, if you want every alert event generated by the system to call back to a particular Webhook address, you can configure a global subscription rule that matches all alert events and attach a Webhook notification rule to it.

💡 Please read the text above carefully to understand the original intent behind the subscription rule design. This is very important.

Configuration Method

Nightingale subscription rule configuration example

A subscription rule has three parts:

  • Name: The name of the subscription rule. Use a meaningful name so that others can tell at a glance what the rule is for, which makes maintenance easier.
  • Filter configuration: Filters alert events along several dimensions. Note that the filter selects alert events; it does not suppress them.
  • Notification rules: The alert events selected by the filter follow these notification rules.

The overall logic is straightforward. The filter configuration has several items, described one by one below.

  • Data source type: Filters by the type of data source that generated the alert event.
  • Data source: Filters by the specific data source that generated the alert event.
  • Event level: Filters by the level of the alert event. Multiple levels can be selected. By default, all levels are selected, which is equivalent to severity in ("Info", "Warning", "Critical"). Selecting all levels is the same as not filtering on the “event level” dimension.
  • Subscription alert rule: Filters by the alert rule that generated the alert event.
  • Business group: Filters by the business group the alert event belongs to. An alert event is always triggered by an alert rule, so its business group is the business group of that alert rule. (The current version is v8.0.0. A future version may also take into account the business group of the machine in the alert event.)
  • Event label: Filters by the labels of the alert event. Pay attention to how the operators are used, as explained below.
  • Subscription event duration: The small question mark icon to the right of this field explains how to use it, so it is not repeated here.

The filter conditions above are combined with AND. The event label part can contain multiple filter items, and these items are also combined with AND. To match several values of one label, use the in operator or the regular expression operator =~.
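The AND combination of filter dimensions can be sketched in a few lines of Python. This is only an illustration and not Nightingale's implementation: the class names, field names, and the convention that an empty set means "no filtering on this dimension" are all assumptions made for the example, and event labels and duration are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class AlertEvent:
    # Hypothetical, simplified view of an alert event.
    datasource_type: str
    datasource: str
    severity: str
    rule_name: str
    business_group: str


@dataclass
class SubscriptionFilter:
    # Assumed convention: an empty set means "do not filter on this dimension".
    datasource_types: set = field(default_factory=set)
    datasources: set = field(default_factory=set)
    severities: set = field(default_factory=set)
    rule_names: set = field(default_factory=set)
    business_groups: set = field(default_factory=set)


def dimension_ok(allowed: set, value: str) -> bool:
    """An unconfigured dimension accepts every value."""
    return not allowed or value in allowed


def event_matches(f: SubscriptionFilter, e: AlertEvent) -> bool:
    """Every configured dimension must accept the event (AND)."""
    return all([
        dimension_ok(f.datasource_types, e.datasource_type),
        dimension_ok(f.datasources, e.datasource),
        dimension_ok(f.severities, e.severity),
        dimension_ok(f.rule_names, e.rule_name),
        dimension_ok(f.business_groups, e.business_group),
    ])


event = AlertEvent("prometheus", "prom-prod", "Critical", "cpu_high", "infra")

# Selecting every level behaves like not filtering on level at all.
all_levels = SubscriptionFilter(severities={"Info", "Warning", "Critical"})
no_levels = SubscriptionFilter()
print(event_matches(all_levels, event), event_matches(no_levels, event))

# Two dimensions configured: both must accept the event.
strict = SubscriptionFilter(severities={"Critical"}, business_groups={"database"})
print(event_matches(strict, event))
```

The last filter rejects the event even though the level matches, because the business group does not.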

The operators work as follows:

  • == matches one specific label value. Only one value can be entered. To match several values at once, use the in operator.
  • =~ takes a regular expression and matches label values flexibly.
  • in matches any of several label values, similar to in in SQL.
  • not in excludes several label values, similar to not in in SQL.
  • != means not equal to and excludes one specific label value.
  • !~ takes a regular expression and excludes label values that match it, similar to !~ in PromQL.
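To make the operator semantics concrete, here is a minimal Python sketch of label matching. It is an illustration under stated assumptions, not Nightingale's code: the function names are hypothetical, regular expressions are applied with an unanchored search (whether Nightingale anchors patterns is not stated here), and a missing label is treated as a value that equals nothing.

```python
import re


def match_label(op, expected, actual):
    """Evaluate one label filter item.

    expected is a string for ==, !=, =~ and !~, and a list of strings
    for "in" and "not in". actual is the label value on the event,
    or None if the event has no such label (assumed behavior).
    """
    if op == "==":
        return actual == expected
    if op == "!=":
        return actual != expected
    if op == "in":
        return actual in expected
    if op == "not in":
        return actual not in expected
    if op == "=~":
        return actual is not None and re.search(expected, actual) is not None
    if op == "!~":
        return actual is None or re.search(expected, actual) is None
    raise ValueError(f"unknown operator: {op}")


def match_labels(filters, labels):
    """All filter items must hold (AND). Each item is (key, op, expected)."""
    return all(match_label(op, expected, labels.get(key))
               for key, op, expected in filters)


# Hypothetical filter items and event labels.
filters = [
    ("service", "in", ["order", "payment"]),
    ("env", "!=", "test"),
    ("instance", "=~", r"^10\.1\."),
]
labels = {"service": "order", "env": "prod", "instance": "10.1.2.3:9100"}

print(match_labels(filters, labels))
print(match_labels(filters, {**labels, "env": "test"}))
```

The first call accepts the event because all three items hold. The second rejects it because the env item alone fails.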

Scenario example: Subscribe to all time-series alerts

For example, suppose you want to subscribe to all alerts on time-series metrics and send them through a single Webhook notification rule for automated processing. To do this, set the data source type to Prometheus, select all event levels, and leave the other filter conditions unconfigured.
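Expressed as data, this scenario might look like the following Python sketch. The dictionary keys, the rule name, the Webhook URL, and the event fields are all hypothetical and only mirror the settings described above. They are not Nightingale's storage format or API, and no HTTP request is actually sent.

```python
ALL_LEVELS = {"Info", "Warning", "Critical"}

# Hypothetical representation of the subscription rule from this scenario.
subscription = {
    "name": "all-timeseries-alerts-to-webhook",
    "datasource_type": "prometheus",
    "severities": ALL_LEVELS,  # all levels selected: no level filtering
    "webhook_url": "https://example.com/alert-hook",  # placeholder address
}


def webhook_for(event, sub):
    """Return the Webhook URL the event should be sent to, or None."""
    if event["datasource_type"] != sub["datasource_type"]:
        return None
    if event["severity"] not in sub["severities"]:
        return None
    return sub["webhook_url"]


# A time-series alert is routed to the Webhook.
metric_event = {"datasource_type": "prometheus", "severity": "Warning"}
print(webhook_for(metric_event, subscription))

# An event from a different (hypothetical) data source type is not.
other_event = {"datasource_type": "some-log-source", "severity": "Critical"}
print(webhook_for(other_event, subscription))
```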