Add tracing support using otel to the following components:
- api: extract trace and span IDs from request context (see the sketch below)
- provider: mem put
- dispatch: split logic and use better naming
- inhibit: source and target traces, mutes, etc.; drop metrics
- silence: query, expire, mutes
- notify: add distributed tracing support to stages and all http requests
Note: inhibitor metrics are dropped since we have tracing now and they
are not needed. We have not released any version with these metrics, so
we can drop them safely; this is not a breaking change.
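A minimal sketch of the api-side extraction, assuming the handler sits
behind otelhttp (or any middleware that puts the span into the request
context); the handler and the header use are illustrative:

```go
package api

import (
	"net/http"

	"go.opentelemetry.io/otel/trace"
)

// handleAlerts shows how trace and span IDs can be pulled from the
// request context once tracing middleware has populated it.
func handleAlerts(w http.ResponseWriter, r *http.Request) {
	sc := trace.SpanFromContext(r.Context()).SpanContext()
	if sc.IsValid() {
		traceID, spanID := sc.TraceID().String(), sc.SpanID().String()
		// Attach the IDs to logs or response headers for correlation.
		w.Header().Set("X-Trace-Id", traceID) // illustrative use
		_ = spanID
	}
}
```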
This change borrows part of the implementation from #3673
Fixes #3670
Signed-off-by: Dave Henderson <dhenderson@gmail.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Co-authored-by: Dave Henderson <dhenderson@gmail.com>
* Add new behavior to avoid races on config reload
* Add context to Groups to allow timeouts
---------
Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>
This change adds a new cmd flag `--dispatch.start-delay` which
corresponds to the `--rules.alert.resend-delay` flag in Prometheus.
This flag controls the minimum amount of time that Prometheus waits
before resending an alert to Alertmanager.
By adding this value to the start time of Alertmanager, we delay
the aggregation groups' first flush until we are confident all alerts
have been resent by the Prometheus instances.
This should help avoid race conditions in inhibitions after a (re)start.
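A minimal sketch of the delayed first flush, with simplified stand-in
types rather than Alertmanager's actual dispatcher code:

```go
package dispatch

import (
	"context"
	"time"
)

// aggrGroup is a simplified stand-in for the real aggregation group;
// only the fields needed to illustrate the start delay are kept.
type aggrGroup struct {
	startTime  time.Time     // when this Alertmanager process started
	startDelay time.Duration // value of --dispatch.start-delay
	groupWait  time.Duration // the usual group_wait
	flush      func()
}

// run holds back the first flush until the start delay has elapsed
// (but never less than group_wait), so all Prometheus instances have
// had time to resend their firing alerts after a (re)start.
func (ag *aggrGroup) run(ctx context.Context) {
	wait := time.Until(ag.startTime.Add(ag.startDelay))
	if wait < ag.groupWait {
		wait = ag.groupWait
	}
	select {
	case <-ctx.Done():
		return
	case <-time.After(wait):
		ag.flush()
	}
}
```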
Other improvements:
- remove hasFlushed flag from aggrGroup
- remove mutex locking from aggrGroup
Signed-off-by: Alexander Rickardsson <alxric@aiven.io>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Co-authored-by: Alexander Rickardsson <alxric@aiven.io>
Add `alertmanager_alerts_subscriber_channel_writes_total` metric to
track the number of alerts written to subscriber channels.
A drop in the rate of this metric may indicate a problem with the
ingestion of alerts by subscribers (inhibitor and dispatcher).
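A minimal sketch of the counter using prometheus/client_golang; the
`subscriber` label is an illustrative assumption:

```go
package mem

import "github.com/prometheus/client_golang/prometheus"

// subscriberChannelWrites counts alerts written to subscriber channels.
// The "subscriber" label is an assumption for illustration; the real
// metric may be labeled differently.
var subscriberChannelWrites = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "alertmanager_alerts_subscriber_channel_writes_total",
		Help: "Number of alerts written to subscriber channels.",
	},
	[]string{"subscriber"},
)

func init() {
	prometheus.MustRegister(subscriberChannelWrites)
}

// recordWrite is called after an alert is written to a subscriber's
// channel, e.g. the inhibitor's or the dispatcher's.
func recordWrite(subscriber string) {
	subscriberChannelWrites.WithLabelValues(subscriber).Inc()
}
```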
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
This change makes aggregation groups delete resolved alerts from the
marker, avoiding the leakage of ghost states mentioned in #4402.
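A minimal sketch of the cleanup; the marker interface is narrowed to
the one method this sketch needs, loosely following the types package:

```go
package dispatch

import (
	"github.com/prometheus/alertmanager/types"
	"github.com/prometheus/common/model"
)

// marker is the slice of the marker API this sketch assumes.
type marker interface {
	Delete(model.Fingerprint)
}

// deleteResolved drops resolved alerts from the marker alongside their
// removal from the aggregation group, so no ghost state lingers after
// the alert itself is gone.
func deleteResolved(m marker, alerts []*types.Alert) {
	for _, a := range alerts {
		if a.Resolved() {
			m.Delete(a.Fingerprint())
		}
	}
}
```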
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
* chore!: adopt log/slog, drop go-kit/log
The bulk of this change set was automated by the following script which
is being used to aid in converting the various exporters/projects to use
slog:
https://gist.github.com/tjhop/49f96fb7ebbe55b12deee0b0312d8434
This commit includes several changes:
- bump exporter-toolkit to v0.13.1 for log/slog support
- update deprecated golangci-lint configs
- enable the sloglint linter
- remove old go-kit/log linter configs
- introduce some `if logger == nil { $newLogger }` guards to prevent
  nil references
- convert the cluster membership config to use a stdlib-compatible slog
  adapter, rather than creating a custom io.Writer for use as the
  membership `logOutput` config (see the sketch below)
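A minimal sketch of the last two items, assuming memberlist's config
with its *log.Logger field; stdlib slog provides the bridge directly:

```go
package cluster

import (
	"log/slog"
	"os"

	"github.com/hashicorp/memberlist"
)

// ensureLogger guards against nil loggers to prevent nil references.
func ensureLogger(l *slog.Logger) *slog.Logger {
	if l == nil {
		l = slog.New(slog.NewTextHandler(os.Stderr, nil))
	}
	return l
}

// configureMembership hands memberlist a stdlib *log.Logger built from
// the slog handler, instead of a custom io.Writer for LogOutput.
func configureMembership(cfg *memberlist.Config, l *slog.Logger) {
	l = ensureLogger(l)
	cfg.Logger = slog.NewLogLogger(l.Handler(), slog.LevelDebug)
}
```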
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* chore: address PR feedback
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
---------
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
This commit updates /api/v2/alerts/groups to show if an alert is
suppressed by one or more active or mute time intervals. While the
muted-by field can be found in /api/v2/alerts, it is not used here
because /api/v2/alerts does not take aggregation or routing into
consideration.
It also updates the UI to support filtering muted alerts via the
Muted checkbox.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Mark muted groups
This commit updates TimeMuteStage and TimeActiveStage to mark groups
as muted when their alerts are muted by an active or mute time interval,
and to remove any existing markers when outside all active and mute
time intervals.
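A minimal sketch of the marking logic; the group-marker shape, method
names, and the mutedBy helper are assumptions, not the exact
Alertmanager API:

```go
package notify

import (
	"time"

	"github.com/prometheus/alertmanager/types"
)

// groupMarker is the subset of marker behavior this sketch assumes.
// An empty timeIntervalNames clears any existing muted marker.
type groupMarker interface {
	SetMuted(routeID, groupKey string, timeIntervalNames []string)
}

// markMuted records which time intervals currently mute the group and
// suppresses its alerts, or clears the marker and passes the alerts
// through when the group is outside all active and mute intervals.
func markMuted(m groupMarker, routeID, groupKey string, now time.Time,
	mutedBy func(time.Time) []string, alerts []*types.Alert) []*types.Alert {
	names := mutedBy(now)
	m.SetMuted(routeID, groupKey, names)
	if len(names) > 0 {
		return nil
	}
	return alerts
}
```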
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Move unlock to defer
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Fix race condition in dispatch.go
This commit fixes a race condition in dispatch.go that would cause
a firing alert to be deleted from the aggregation group when instead
it should have been flushed.
The root cause is a race condition that can occur when dispatch.go
deletes resolved alerts from the aggregation group following a
successful notification. If a firing alert with the same
fingerprint is added back to the aggregation group at the same time,
the firing alert can be deleted.
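A minimal sketch of the guarded delete: an alert is only removed if
the stored copy is still the resolved snapshot that was notified for.
The store shape is simplified for illustration:

```go
package dispatch

import (
	"github.com/prometheus/alertmanager/types"
	"github.com/prometheus/common/model"
)

// alertStore is the subset of the store API this sketch assumes.
type alertStore interface {
	Get(model.Fingerprint) (*types.Alert, error)
	Delete(model.Fingerprint) error
}

// deleteNotified removes resolved alerts after a successful notify,
// but only if the stored alert is the exact snapshot that was sent.
// If a firing alert with the same fingerprint arrived in the meantime,
// its UpdatedAt differs and it is kept.
func deleteNotified(store alertStore, notified []*types.Alert) {
	for _, a := range notified {
		if !a.Resolved() {
			continue
		}
		fp := a.Fingerprint()
		got, err := store.Get(fp)
		if err != nil {
			continue // already gone
		}
		if got.UpdatedAt.Equal(a.UpdatedAt) {
			_ = store.Delete(fp)
		}
	}
}
```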
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Alert metric reports different results to what the user sees via API
Fixes #1439 and #2619.
The previous metric is not _technically_ reporting incorrect results, as the alerts _are_ still around and will be re-used if that same alert (equal fingerprint) is received before it is GCed. Therefore, I have kept the old metric under a new name `alertmanager_marked_alerts` and repurposed the current metric to match what the user sees in the UI.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* fix dispatcher race condition
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
* add test to check for race condition in dispatcher
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
* return when dispatcher Stop has nil receiver
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
* remove unneeded check
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
If the original EndsAt is left in place, then as time moves forward
past the EndsAt, firing alerts will be rendered and treated as
resolved alerts, which can cause confusion and races. This is most
likely to happen on retries for a notification.
Mitigate race and fix data races in TestAggrGroup.
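A minimal sketch of the mitigation: copy the alerts at flush time and
zero EndsAt on still-firing alerts, so a delayed retry cannot render
them as resolved. Illustrative, not the exact diff:

```go
package dispatch

import (
	"time"

	"github.com/prometheus/alertmanager/types"
)

// snapshotForNotify copies alerts before notification. Firing alerts
// get a zero EndsAt (meaning "still open"), so time passing during
// notification retries cannot flip them to resolved in templates.
func snapshotForNotify(alerts []*types.Alert, now time.Time) []*types.Alert {
	out := make([]*types.Alert, 0, len(alerts))
	for _, a := range alerts {
		c := *a // copy; the stored alert keeps its real EndsAt
		if c.EndsAt.After(now) {
			c.EndsAt = time.Time{}
		}
		out = append(out, &c)
	}
	return out
}
```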
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
To aggregate by all possible labels, use '...' as the sole label name.
This effectively disables aggregation entirely, passing through all
alerts as-is. This is unlikely to be what you want, unless you have
a very low alert volume or your upstream notification system performs
its own grouping. Example: `group_by: ['...']`
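A minimal sketch of how the special '...' value can be handled when
computing an alert's group labels, loosely following the dispatcher;
names are illustrative:

```go
package dispatch

import (
	"github.com/prometheus/alertmanager/types"
	"github.com/prometheus/common/model"
)

// groupLabels returns the labels an alert is grouped by. With
// group_by: ['...'] every label participates, so each distinct label
// set forms its own aggregation group.
func groupLabels(a *types.Alert, groupByAll bool, groupBy map[model.LabelName]struct{}) model.LabelSet {
	ls := model.LabelSet{}
	for name, value := range a.Labels {
		if groupByAll {
			ls[name] = value
			continue
		}
		if _, ok := groupBy[name]; ok {
			ls[name] = value
		}
	}
	return ls
}
```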
Signed-off-by: Kyryl Sablin <kyryl.sablin@schibsted.com>
Turn the GroupKey into a string that is composed of the matchers of the
path in the routing tree and the grouping labels.
Only hash it at the very end to ensure we don't exceed size limits of
integration APIs.
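A minimal sketch of composing the key and hashing only at the last
moment; the 1024-byte threshold and the helper name are assumptions:

```go
package dispatch

import (
	"crypto/sha256"
	"fmt"

	"github.com/prometheus/common/model"
)

// groupKey builds a human-readable key from the route's matcher path
// and the grouping labels, e.g. {region="eu"}:{alertname="HighLoad"}.
// It is only hashed when it would exceed an integration API's size
// limit, keeping the readable form everywhere else.
func groupKey(routeMatchers string, labels model.LabelSet) string {
	key := fmt.Sprintf("%s:%s", routeMatchers, labels)
	if len(key) > 1024 {
		return fmt.Sprintf("%x", sha256.Sum256([]byte(key)))
	}
	return key
}
```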
* Vendor dependencies.
This updates several old dependencies, removes
some that are no longer needed, and adds
`pkg/labels` from prometheus `dev-2.0` branch.
* Add metrics selector parsing code
This is a temporary simplified re-implementation
of promQL's metric selector parsing.
* Add alerts filtering
Filter alerts through the `?filter=` query string (see the sketch after this list).
* Add silences filtering
Filter silences through `?filter=` query string.
* Move `parse` to `pkg/parse`
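A minimal sketch of the `?filter=` handling; the matcher shape and the
parse callback stand in for pkg/labels and pkg/parse, whose exact
signatures are not shown here:

```go
package api

import "net/http"

// matcher is the minimal matcher shape this sketch assumes; in the
// real code it comes from pkg/labels, built by pkg/parse.
type matcher interface {
	Name() string
	Matches(value string) bool
}

// parseFunc stands in for the pkg/parse entry point that turns one
// filter expression (e.g. severity="page") into a matcher.
type parseFunc func(expr string) (matcher, error)

// matchersFromRequest collects matchers from every ?filter= parameter.
func matchersFromRequest(r *http.Request, parse parseFunc) ([]matcher, error) {
	var ms []matcher
	for _, expr := range r.URL.Query()["filter"] {
		m, err := parse(expr)
		if err != nil {
			return nil, err
		}
		ms = append(ms, m)
	}
	return ms, nil
}

// matches reports whether a label set satisfies all matchers; the same
// check filters both alerts and silences.
func matches(labels map[string]string, ms []matcher) bool {
	for _, m := range ms {
		if !m.Matches(labels[m.Name()]) {
			return false
		}
	}
	return true
}
```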
This string value is initially used to store a receiver name. It is
later overloaded with a unique string identifier of <name, integration,
index>.
This renaming is in preparation for separating the two and using the
Receiver object of the nflogpb package.