Configuration
The configuration file for the central n9e is etc/config.toml, and the configuration file for the edge alert engine n9e-edge is etc/edge/edge.toml. This page first explains the n9e configuration file section by section.
Global
[Global]
RunMode = "release"
This is a configuration item used by Nightingale developers; ordinary users don’t need to care about it. It should always remain release.
Log
[Log]
# stdout, stderr, file
Output = "stdout"
# log write dir
Dir = "logs"
# log level: DEBUG INFO WARNING ERROR
Level = "DEBUG"
# # rotate by time
# KeepHours = 4
# # rotate by size
# RotateNum = 3
# # unit: MB
# RotateSize = 256
- Output: Log output method; supports stdout, stderr, and file. Logs are written to files only in file mode, and only then are the following configuration items used.
- Dir: Directory for storing log files.
- Level: Log level; supports DEBUG, INFO, WARNING, and ERROR.
- KeepHours: Log file retention time in hours. Logs can be rotated by time or by size. To rotate by time, use this item (one log file per hour); to rotate by size, use the following two items.
- RotateNum: Number of log files to retain.
- RotateSize: Log file size in MB.
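As a sketch of the file mode described above, the fragment below writes logs to files and rotates them by size. All keys come from the [Log] section; the values (INFO level, 5 files of 128 MB) are illustrative assumptions, not recommendations.

```toml
[Log]
# write to files instead of stdout; Dir is only used in this mode
Output = "file"
Dir = "logs"
Level = "INFO"
# rotate by size: keep 5 files of up to 128 MB each
# (KeepHours is left unset because it belongs to time-based rotation)
RotateNum = 5
RotateSize = 128
```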
HTTP
[HTTP]
# http listening address
Host = "0.0.0.0"
# http listening port
Port = 17000
# https cert file path
CertFile = ""
# https key file path
KeyFile = ""
# whether print access log
PrintAccessLog = false
# whether enable pprof
PProf = true
# expose prometheus /metrics?
ExposeMetrics = true
# http graceful shutdown timeout, unit: s
ShutdownTimeout = 30
# max content length: 64M
MaxContentLength = 67108864
# http server read timeout, unit: s
ReadTimeout = 20
# http server write timeout, unit: s
WriteTimeout = 40
# http server idle timeout, unit: s
IdleTimeout = 120
- Host: HTTP service listening address, usually 0.0.0.0 to listen on all network interfaces.
- Port: HTTP service listening port.
- CertFile: HTTPS certificate file path.
- KeyFile: HTTPS key file path.
- PrintAccessLog: Whether to print access logs.
- PProf: Whether to enable pprof. If enabled, pprof information can be viewed at /api/debug/pprof/.
- ExposeMetrics: Whether to expose the Prometheus /metrics interface, which reports Nightingale’s own monitoring metrics.
- ShutdownTimeout: HTTP service graceful shutdown timeout in seconds.
- MaxContentLength: Maximum HTTP request length in bytes.
- ReadTimeout: HTTP read timeout in seconds.
- WriteTimeout: HTTP write timeout in seconds.
- IdleTimeout: HTTP idle timeout in seconds.
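A minimal sketch of serving over HTTPS, assuming that filling in both CertFile and KeyFile is what switches the listener to TLS (the comments in the section imply this, but the text does not state it outright). The file paths are hypothetical.

```toml
[HTTP]
Host = "0.0.0.0"
Port = 17000
# hypothetical paths; point these at your own certificate and private key
CertFile = "/etc/n9e/tls/server.crt"
KeyFile = "/etc/n9e/tls/server.key"
# 64 MB expressed in bytes: 64 * 1024 * 1024 = 67108864
MaxContentLength = 67108864
```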
HTTP.ShowCaptcha
[HTTP.ShowCaptcha]
Enable = false
- Enable: Whether to enable the captcha function.
HTTP.APIForAgent
[HTTP.APIForAgent]
Enable = true
# [HTTP.APIForAgent.BasicAuth]
# user001 = "ccc26da7b9aba533cbb263a36c07dcc5"
- Enable: Whether to enable the API interfaces for the Agent. These are normally required, so this configuration item is generally true.
- BasicAuth: The Agent API interfaces support BasicAuth. Configure the BasicAuth usernames and passwords here. BasicAuth is not required for internal network communication; for public network communication, configuring it is recommended, and the password must not be left at the default, to avoid attacks.
  - In the example above, user001 is the BasicAuth username, and ccc26da7b9aba533cbb263a36c07dcc5 is the BasicAuth password. To configure multiple users, you can add more entries, for example:
[HTTP.APIForAgent.BasicAuth]
user001 = "ccc26da7b9aba533cbb263a36c07dcc5"
user002 = "d4f5e6a7b8c9d0e1f2g3h4i5j6k7l8m9"
Note: If you configure BasicAuth, the corresponding username and password must also be configured in the Agent’s n9e configuration file; otherwise, the Agent cannot connect to the central n9e.
The default configuration has Enable set to true and HTTP.APIForAgent.BasicAuth empty, indicating that the API interfaces for the Agent are enabled without BasicAuth.
HTTP.APIForService
[HTTP.APIForService]
Enable = false
[HTTP.APIForService.BasicAuth]
user001 = "ccc26da7b9aba533cbb263a36c07dcc5"
- Enable: Whether to enable the API interfaces for Services. The edge alert engine n9e-edge communicates with the central n9e through these interfaces, so if you use n9e-edge, this must be enabled (set to true).
- BasicAuth: The Service API interfaces support BasicAuth. Configure the BasicAuth usernames and passwords here. BasicAuth is not required for internal network communication; for public network communication, configuring it is recommended, and the password must not be left at the default, to avoid attacks.
  - In the example above, user001 is the BasicAuth username, and ccc26da7b9aba533cbb263a36c07dcc5 is the BasicAuth password. To configure multiple users, you can add more entries, for example:
[HTTP.APIForService.BasicAuth]
user001 = "ccc26da7b9aba533cbb263a36c07dcc5"
user002 = "d4f5e6a7b8c9d0e1f2g3h4i5j6k7l8m9"
Note: If you configure BasicAuth, the corresponding username and password must also be configured in the n9e-edge configuration file; otherwise, n9e-edge cannot connect to the central n9e.
The default configuration has Enable set to false, meaning the API interfaces for other Services are not enabled, and n9e-edge cannot connect to the central n9e.
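A sketch of the central n9e side when n9e-edge is in use: the Service API is switched on and protected with a non-default credential. The username edge_user and the placeholder password are hypothetical; the same pair would also have to be set in the n9e-edge configuration file.

```toml
[HTTP.APIForService]
# must be true for n9e-edge to reach the central n9e
Enable = true

[HTTP.APIForService.BasicAuth]
# hypothetical user; do not reuse the sample password from the docs
edge_user = "replace-with-a-long-random-password"
```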
HTTP.JWTAuth
[HTTP.JWTAuth]
# unit: min
AccessExpired = 1500
# unit: min
RefreshExpired = 10080
RedisKeyPrefix = "/jwt/"
Nightingale uses JWT for authentication. The JWT expiration times are configured here in minutes: AccessExpired is the expiration time of the access token, and RefreshExpired is the expiration time of the refresh token. The roles of these two tokens in the JWT mechanism are not elaborated here; you can ask GPT for details. Nightingale stores some JWT-related information in Redis, and RedisKeyPrefix is the prefix for the Redis keys, which generally does not need to be changed.
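Because both values are in minutes, the defaults are easier to read once converted: 1500 minutes is 25 hours, and 10080 minutes is 7 days. The fragment below is a sketch with shorter lifetimes; the chosen values are illustrative assumptions only.

```toml
[HTTP.JWTAuth]
# 480 min = 8 hours (default 1500 min = 25 hours)
AccessExpired = 480
# 1440 min = 1 day (default 10080 min = 7 days)
RefreshExpired = 1440
RedisKeyPrefix = "/jwt/"
```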
HTTP.ProxyAuth
[HTTP.ProxyAuth]
# if proxy auth enabled, jwt auth is disabled
Enable = false
# username key in http proxy header
HeaderUserNameKey = "X-User-Name"
DefaultRoles = ["Standard"]
If you want to embed Nightingale into your own system, you can consider using ProxyAuth, similar to Grafana’s ProxyAuth. After a user logs in to your system, you obtain the username and pass it to Nightingale in the X-User-Name header, and Nightingale will consider the user logged in. DefaultRoles is the default role; if you don’t pass roles, Nightingale will treat the user as having the Standard role.
In fact, according to observations, no community users are currently using this function, so please use it with caution.
HTTP.RSA
[HTTP.RSA]
OpenRSA = false
When logging in to Nightingale, user passwords are transmitted in plain text. If the Nightingale site uses HTTPS, it’s fine; if it’s HTTP, it’s recommended to enable RSA encryption to prevent plain text transmission of user passwords.
DB
[DB]
# mysql postgres sqlite
DBType = "sqlite"
# postgres: host=%s port=%s user=%s dbname=%s password=%s sslmode=%s
# postgres: DSN="host=127.0.0.1 port=5432 user=root dbname=n9e_v6 password=1234 sslmode=disable"
# mysql: DSN="root:1234@tcp(localhost:3306)/n9e_v6?charset=utf8mb4&parseTime=True&loc=Local"
DSN = "n9e.db"
# enable debug mode or not
Debug = false
# unit: s
MaxLifetime = 7200
# max open connections
MaxOpenConns = 32
# max idle connections
MaxIdleConns = 8
DBType and DSN are the most critical, and the two configurations are linked. DBType supports three databases: mysql, postgres, and sqlite. DSN is the database connection information; for sqlite, it’s the database file path; for mysql or postgres, it’s the database connection string.
Starting from version v8, Nightingale sets DBType to sqlite by default so that users can try it quickly without installing a database. However, in production environments, please use mysql or postgres.
For DSN configurations of Postgres and MySQL, you can refer to the examples in the comments. Other configurations are related to database connections and can be modified according to your environment. For general small and medium-sized environments, setting MaxOpenConns to 32 and MaxIdleConns to 8 is sufficient.
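For illustration, switching from the default sqlite to MySQL could look like the fragment below. The DSN is the sample from the comments in the [DB] section; the host, database name, and credentials in it are placeholders to replace with your own.

```toml
[DB]
DBType = "mysql"
# sample DSN from the config comments; replace user, password, and host
DSN = "root:1234@tcp(localhost:3306)/n9e_v6?charset=utf8mb4&parseTime=True&loc=Local"
Debug = false
# unit: s
MaxLifetime = 7200
MaxOpenConns = 32
MaxIdleConns = 8
```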
Redis
[Redis]
# standalone cluster sentinel miniredis
RedisType = "miniredis"
# address, ip:port or ip1:port,ip2:port for cluster and sentinel(SentinelAddrs)
Address = "127.0.0.1:6379"
# Username = ""
# Password = ""
# DB = 0
# UseTLS = false
# TLSMinVersion = "1.2"
# Mastername for sentinel type
# MasterName = "mymaster"
# SentinelUsername = ""
# SentinelPassword = ""
Redis is used not only to store JWT-related login authentication information but also to store metadata reported by machine heartbeats. The machine offline alert rules supported in Nightingale judge based on the heartbeat time of machines in Redis. If there is no heartbeat for a long time, the machine is considered offline.
If Redis responds slowly, it may cause false offline alerts. That is, the machine is actually alive, but its heartbeat information in Redis is not updated in time, so Nightingale mistakenly judges the machine as offline. Starting from version V8.beta11, monitoring metrics for Redis operations have been added. Watch these metrics to detect slow Redis responses in time.
RedisType supports four types: standalone, cluster, sentinel, and miniredis. Starting from Nightingale v8, Nightingale uses miniredis by default so that users can try it quickly without installing Redis. However, in production environments, please use one of the other modes.
Address is the Redis connection address, and the configuration method varies according to RedisType:
- standalone: When RedisType is standalone, Address is the address of the Redis instance in the format ip:port.
- cluster: When RedisType is cluster, Address is the address of the Redis cluster in the format ip1:port,ip2:port.
- sentinel: When RedisType is sentinel, Address is the address of Redis Sentinel in the format ip1:port,ip2:port. In sentinel mode, MasterName, SentinelUsername, and SentinelPassword also need to be configured.
- UseTLS: Whether to use TLS.
- TLSMinVersion: Minimum TLS version, which takes effect only when UseTLS is true.
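A sketch of the sentinel mode described above, using only keys that appear in the [Redis] section. The three sentinel addresses and the password are hypothetical; MasterName must match the master name configured in your Sentinel deployment.

```toml
[Redis]
RedisType = "sentinel"
# comma-separated sentinel addresses (hypothetical hosts)
Address = "10.0.0.1:26379,10.0.0.2:26379,10.0.0.3:26379"
# credentials of the Redis data nodes (placeholder value)
Password = "replace-me"
DB = 0
# required in sentinel mode
MasterName = "mymaster"
SentinelUsername = ""
SentinelPassword = ""
```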
Alert
Starting from a certain version, Nightingale merged the webapi and alert engine modules to reduce deployment complexity. The configuration items here under Alert are for the alert engine.
Alert.Heartbeat
[Alert.Heartbeat]
# auto detect if blank
IP = ""
# unit ms
Interval = 1000
EngineName = "default"
- IP: The IP address of the alert engine. If empty, Nightingale detects it automatically. Each alert engine writes heartbeat information to MySQL, so every engine knows the list of all live alert engines and the alert rules can be sharded among them. For example, with 100 alert rules and a cluster of two n9e instances, each n9e processes approximately 50 rules. When one alert engine goes down, the other takes over all 100 rules.
- Interval: Heartbeat interval in milliseconds.
- EngineName: The name of the alert engine. Generally, the central n9e keeps default; for the edge alert engine n9e-edge, you can customize the EngineName, such as edge1, edge2, etc. Alert engines with the same EngineName are considered a cluster.
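As a sketch, an edge engine in a cluster named edge1 might carry the fragment below. It assumes the section has the same name and keys in etc/edge/edge.toml as in the central file, which the source suggests but does not show. Every n9e-edge instance that should share the rules of this cluster would use the same EngineName.

```toml
[Alert.Heartbeat]
# leave blank to auto-detect
IP = ""
# unit: ms
Interval = 1000
# engines with the same EngineName form one cluster
EngineName = "edge1"
```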
Center
Unique configurations for the central n9e, not present in the edge alert engine n9e-edge. These are the unique configurations related to the old version of n9e-webapi.
[Center]
MetricsYamlFile = "./etc/metrics.yaml"
I18NHeaderKey = "X-Language"
[Center.AnonymousAccess]
PromQuerier = true
AlertDetail = true
- MetricsYamlFile: Path to the metric configuration file. The explanations of metrics you see in the quick view come from this configuration file. The metric view was launched later, making this configuration file less important, and the quick view function itself is planned to be removed.
- I18NHeaderKey: This is a configuration item for developers; ordinary users don’t need to care about it.
- Center.AnonymousAccess: Configuration items related to anonymous access. PromQuerier indicates whether to allow anonymous queries against the interfaces of the various data sources; AlertDetail indicates whether to allow anonymous viewing of alert details. These can be enabled in internal network environments but must be disabled in public network environments.
The dashboard has a public access function, which can even be set to be accessible without login, but this requires PromQuerier to be set to true. That is, if PromQuerier = false, login is still required even if the dashboard is set to public access.
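For a deployment exposed to the public network, the guidance above translates into the following fragment. Note the trade-off described in the text: with PromQuerier disabled, publicly shared dashboards require login again.

```toml
[Center.AnonymousAccess]
# no anonymous data source queries (public dashboards will require login)
PromQuerier = false
# no anonymous viewing of alert details
AlertDetail = false
```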
Pushgw
Although Nightingale does not directly store monitoring data, it provides multiple interfaces for receiving monitoring data, such as interfaces for the Prometheus remote write protocol, OpenTSDB protocol, etc. After receiving the data, Nightingale forwards the monitoring data to the backend time-series database, so Nightingale acts as a Pushgateway here, and the configuration items related to Pushgateway are under Pushgw.
[Pushgw]
# use target labels in database instead of in series
LabelRewrite = true
ForceUseServerTS = true
- LabelRewrite: Nightingale has a machine management menu where you can tag machines, and these tags are attached to the time-series data related to those machines. If a tag in the reported data conflicts with a tag in machine management, which one takes precedence? If LabelRewrite is true, the tag in machine management takes precedence; otherwise, the reported tag takes precedence.
- ForceUseServerTS: Whether to force the use of the server’s timestamp to overwrite the timestamp of the received monitoring data. This configuration item did not exist previously. Nightingale added it because uncalibrated machine clocks caused confusion in many companies. It is recommended to enable it so that the server’s timestamp is used uniformly.
Pushgw.DebugSample
[Pushgw.DebugSample]
ident = "xx"
__name__ = "cpu_usage_active"
This configuration is for debugging and troubleshooting. It is actually a filter condition for monitoring metrics. If the metrics reported to Nightingale meet this filter condition, they will be printed to the log. Generally, no configuration is needed; it can be commented out.
Pushgw.WriterOpt
[Pushgw.WriterOpt]
QueueMaxSize = 1000000
QueuePopSize = 1000
QueueNumber = 0
This part of the configuration is commented out by default because normally users don’t need to pay attention to it. If Nightingale receives too much data, which gets congested in memory and eventually leads to metric loss, you need to consider adjusting the configuration here.
Nightingale creates QueueNumber queues in memory. After receiving monitoring data, it puts the data into these queues. The default configuration of QueueNumber is 0, indicating that the specific number is not specified, and queues are created according to the number of CPU cores. The maximum capacity of each queue is QueueMaxSize, which defaults to 1000000, meaning each queue can store up to 1 million data entries.
Each queue corresponds to a goroutine, which fetches QueuePopSize metrics from the queue each time. The default is 1000, meaning 1000 data entries are fetched from the queue each time and written to the backend time-series database as a batch. This takes full advantage of multi-core CPU performance. Therefore, the value of QueueNumber is essentially the concurrency of writes to the backend time-series database.
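A sketch of pinning the write concurrency explicitly instead of deriving it from the CPU core count. The value 8 is an illustrative assumption. By the description above, it would give 8 queues and 8 writer goroutines, with at most 8 × 1,000,000 = 8,000,000 entries buffered in memory and 1000 entries sent per batch.

```toml
[Pushgw.WriterOpt]
# capacity of each queue, in data entries
QueueMaxSize = 1000000
# entries taken from a queue and written as one batch
QueuePopSize = 1000
# 0 = one queue per CPU core; 8 = eight queues (illustrative value)
QueueNumber = 8
```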
Pushgw.Writers
Configure the remote write addresses of the backend time-series databases here. Any time-series database that supports the remote write protocol can be configured. Generally, only one needs to be configured; if you want to write to multiple time-series databases simultaneously, you can configure multiple.
[[Pushgw.Writers]]
Url = "http://127.0.0.1:9090/api/v1/write"
BasicAuthUser = "xx"
BasicAuthPass = "xx"
[[Pushgw.Writers]]
Url = "http://127.0.0.1:8482/api/v1/write"
BasicAuthUser = "xx"
BasicAuthPass = "xx"
- Url: Remote write address of the time-series database.
- BasicAuthUser, BasicAuthPass: If the time-series database requires BasicAuth authentication, configure the BasicAuth username and password here.
- Headers: If the time-series database requires additional headers, configure them here.
- Timeout: Write timeout in milliseconds.
- DialTimeout: Connection timeout in milliseconds.
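The sample above omits the timeout items from this list. A sketch of a single writer that sets them might look like this; the timeout values are illustrative assumptions, and Headers is left out because its value format is not shown in this document.

```toml
[[Pushgw.Writers]]
Url = "http://127.0.0.1:9090/api/v1/write"
BasicAuthUser = "xx"
BasicAuthPass = "xx"
# write timeout, unit: ms (illustrative value)
Timeout = 10000
# connection timeout, unit: ms (illustrative value)
DialTimeout = 3000
```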
Pushgw.Writers.WriteRelabels
Data written to the time-series database can undergo relabeling before writing. This configuration item is for relabeling, similar to Prometheus’s relabel configuration items, except that Prometheus uses yaml format while Nightingale uses toml format.
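The source gives no sample for this section, so the fragment below is only a hedged sketch. The field names (Action, SourceLabels, Regex, Replacement, TargetLabel) are assumptions modeled on Prometheus relabel_config keys; verify them against the config.toml shipped with your Nightingale version before use.

```toml
[[Pushgw.Writers]]
Url = "http://127.0.0.1:9090/api/v1/write"

# assumed field names, mirroring a Prometheus "replace" relabel rule:
# rewrite the port of the __address__ label to 80
[[Pushgw.Writers.WriteRelabels]]
Action = "replace"
SourceLabels = ["__address__"]
Regex = "([^:]+)(?::\\d+)?"
Replacement = "$1:80"
TargetLabel = "__address__"
```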
Ibex
Configuration items for the fault self-healing engine Ibex, i.e., the function for remote script execution. Originally, this function was a separate module called ibex, which was later merged into n9e, so this configuration item is also in n9e.
[Ibex]
Enable = true
RPCListen = "0.0.0.0:20090"
- Enable: Whether to enable the Ibex server function.
- RPCListen: RPC service listening address of Ibex.
n9e-edge configuration
The configuration file for the edge alert engine n9e-edge is etc/edge/edge.toml, and most configurations are the same as those of the central n9e. For more information, you can refer to this article: 《Nightingale Monitoring - Edge Alert Engine Architecture Detailed Explanation》.