1
0
mirror of https://github.com/prometheus/alertmanager.git synced 2026-02-06 00:46:17 +01:00

46 Commits

Author SHA1 Message Date
Siavash Safi
c90d8707d5 feat(provider): implement per-alert limits (#4819)
* feat(limit): add new limit package with bucket

Add a new limit package with generic bucket implementation.
This can be used for example to limit the number of alerts in memory.

Benchmarks:
```go
goos: darwin
goarch: arm64
pkg: github.com/prometheus/alertmanager/limit
cpu: Apple M3 Pro
BenchmarkBucketUpsert/EmptyBucket-12  	 8816954	       122.4 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsert/AddToFullBucketWithExpiredItems-12         	 9861010	       123.0 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsert/AddToFullBucketWithActiveItems-12          	 8343778	       143.6 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsert/UpdateExistingAlert-12                     	10107787	       118.9 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsert/MixedWorkload-12                           	 9436174	       126.0 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsertScaling/BucketSize_10-12                    	10255278	       115.4 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsertScaling/BucketSize_50-12                    	10166518	       117.1 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsertScaling/BucketSize_100-12                   	10457394	       115.0 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsertScaling/BucketSize_500-12                   	 9644079	       115.2 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsertScaling/BucketSize_1000-12                  	10426184	       116.6 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsertConcurrent-12                               	 5796210	       216.3 ns/op	     406 B/op	       5 allocs/op
PASS
ok  	github.com/prometheus/alertmanager/limit	15.497s
```

Signed-off-by: Siavash Safi <siavash@cloudflare.com>

* feat(provider): implement per-alert limits

Use the new limit module to add optional per alert-name limits.
The metrics for limited alerts can be enabled using
`alerts-limited-metric` feature flag.

Signed-off-by: Siavash Safi <siavash@cloudflare.com>

---------

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
2026-02-01 13:21:20 +01:00
Siavash Safi
18939cee8f feat: add distributed tracing support (#4745)
Add tracing support using otel to the the following components:
- api: extract trace and span IDs from request context
- provider: mem put
- dispatch: split logic and use better naming
- inhibit: source and target traces, mutes, etc. drop metrics
- silence: query, expire, mutes
- notify: add distributed tracing support to stages and all http requests

Note: inhibitor metrics are dropped since we have tracing now and they
are not needed. We have not released any version with these metrics so
we can drop them safely, this is not a breaking change.

This change borrows part of the implementation from #3673
Fixes #3670

Signed-off-by: Dave Henderson <dhenderson@gmail.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Co-authored-by: Dave Henderson <dhenderson@gmail.com>
2025-12-05 22:58:44 +01:00
Ben Kochie
f656273159 Enable modernize linter (#4750)
Enable the golangci-lint modernize linter.
* Add exception for the omitempty struct issue.
* Apply modernize fixes.

Signed-off-by: SuperQ <superq@gmail.com>
2025-11-18 11:46:15 +01:00
Solomon Jacobs
023885c52c chore: modernize with range over int (#4746)
A purely mechanical change with no effect on behaviour. Stops `gopls`
from complaining.

Corresponding spec, see here:
https://github.com/golang/go/issues/61405

Introduced in `Go 1.22`, see here:
https://go.dev/ref/spec#For_range

Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
2025-11-18 09:43:43 +01:00
Ethan Hunter
ab315ea134 Add new behavior to avoid races on config reload (#4705)
* Add new behavior to avoid races on config reload
* Add context to Groups to allow timeouts

---------

Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>
2025-11-18 09:42:18 +01:00
Siavash Safi
2e0970e8d8 feat(dispatch): add start delay (#4704)
This change adds a new cmd flag `--dispatch.start-delay` which
corresponds to the `--rules.alert.resend-delay` flag in Prometheus.
This flag controls the minimum amount of time that Prometheus waits
before resending an alert to Alertmanager.

By adding this value to the start time of Alertmanager, we delay
the aggregation groups' first flush, until we are confident all alerts
are resent by Prometheus instances.

This should help avoid race conditions in inhibitions after a (re)start.

Other improvements:
- remove hasFlushed flag from aggrGroup
- remove mutex locking from aggrGroup

Signed-off-by: Alexander Rickardsson <alxric@aiven.io>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Co-authored-by: Alexander Rickardsson <alxric@aiven.io>
2025-11-15 15:35:59 +01:00
Siavash Safi
f7ff687529 feat(provider): add subscriber channel metrics (#4630)
Add `alertmanager_alerts_subscriber_channel_writes_total` metric to
track the number of alerts written to subscriber channels.
A drop in the rate of this metric may indicate a problem with the
ingestion of alerts by subscribers (inhibitor and dispatcher).

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
2025-10-28 12:01:43 +01:00
Siavash Safi
d7f2c924a9 fix(marker): stop state leakage from aggregation groups
This change makes aggregation groups to delete resolved alerts from marker,
therefore avoiding the leakage of ghost states mentioned in #4402.

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
2025-10-19 00:27:06 +02:00
Siavash Safi
1b221e38af feat(dispatcher): add maintenance interval config
- make dispatcher maintenance interval configurable

Related to #4540

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
2025-09-08 10:02:17 +02:00
TJ Hoplock
f6b942cf9b chore!: adopt log/slog, drop go-kit/log (#4089)
* chore!: adopt log/slog, drop go-kit/log

The bulk of this change set was automated by the following script which
is being used to aid in converting the various exporters/projects to use
slog:

https://gist.github.com/tjhop/49f96fb7ebbe55b12deee0b0312d8434

This commit includes several changes:
- bump exporter-tookit to v0.13.1 for log/slog support
- updates golangci-lint deprecated configs
- enables sloglint linter
- removes old go-kit/log linter configs
- introduce some `if logger == nil { $newLogger }` additions to prevent
  nil references
- converts cluster membership config to use a stdlib compatible slog
  adapter, rather than creating a custom io.Writer for use as the
membership `logOutput` config

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>

* chore: address PR feedback

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>

---------

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2024-11-06 09:09:57 +00:00
George Robinson
5979dff9dc Show muted alerts in Alert Groups API (#3797)
This commit updates /api/v2/alerts/groups to show if an alert is
suppressed from one or more active or mute time intervals. While
the muted by field can be found in /api/v2/alerts, it is not
used here because /api/v2/alerts does not take aggregation
or routing into consideration.

It also updates the UI to support filtering muted alerts via the
Muted checkbox.

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-10-23 16:42:21 +01:00
George Robinson
c4a763c401 #3513: Mark muted alerts (#3793)
* Mark muted groups

This commit updates TimeMuteStage and TimeActiveStage to mark groups
as muted when its alerts are muted by an active or mute time interval,
and remove any existing markers when outside all active and mute
time intervals.

Signed-off-by: George Robinson <george.robinson@grafana.com>

* Remove unlock to defer

Signed-off-by: George Robinson <george.robinson@grafana.com>

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-05-13 11:16:26 +01:00
George Robinson
ca4c90eb4e Fix race condition in dispatch.go (#3826)
* Fix race condition in dispatch.go

This commit fixes a race condition in dispatch.go that would cause
a firing alert to be deleted from the aggregation group when instead
it should have been flushed.

The root cause is a race condition that can occur when dispatch.go
deletes resolved alerts from the aggregation group following a
successful notification. If a firing alert with the same
fingerprint is added back to the aggregation group at the same time
then the firing alert can be deleted.

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-05-07 10:34:03 +01:00
Matthieu MOREL
b9e347b9d1 golangci-lint: enable testifylint linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-10 08:50:03 +00:00
gotjosh
805e505288 Alert metric reports different results to what the user sees via API (#2943)
* Alert metric reports different results to what the user sees via API

Fixes #1439 and #2619.

The previous metric is not _technically_ reporting incorrect results as the alerts _are_ still around and will be re-used if that same alert (equal fingerprint) is received before it is GCed. Therefore, I have kept the old metric under a new name `alertmanager_marked_alerts` and repurpose the current metric to match what the user sees in the UI.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2022-06-16 12:16:06 +02:00
Matthias Loibl
a6d10bd5bc Update golangci-lint and fix complaints (#2853)
* Copy latest golangci-lint files from Prometheus

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Use grafana/regexp over stdlib regexp

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Fix typos in comments

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Fix goimports complains in import sorting

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* gofumpt all Go files

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Update naming to comply with revive linter

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* config: Fix error messages to be lower case

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* test/cli: Fix error messages to be lower case

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* .golangci.yaml: Remove obsolete space

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* config: Fix expected victorOps error

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Use stdlib regexp

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Clean up Go modules

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2022-03-25 17:59:51 +01:00
Julien Pivotto
b2a4cacb95 Update go dependencies & switch to go-kit/log
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-08-02 12:43:23 +02:00
Peter Štibraný
0bb65d1e4b Reduce number of dispatched alerts to avoid hitting the limit on number of alive goroutines.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
2021-06-02 15:28:00 +02:00
Peter Štibraný
b3ea60e9bb Fix compilation errors after rebase on master.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
2021-06-02 15:14:55 +02:00
Peter Štibraný
358645cfe2 Extract TestGroupsWithLimits, and remove limit test from TestGroups.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
2021-06-02 12:00:31 +02:00
Peter Štibraný
0f86edcf5c Extract TestGroupsWithLimits, and remove limit test from TestGroups.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
2021-06-02 12:00:31 +02:00
Peter Štibraný
d5ed7bfb15 Only register limit metrics when they are used.
Limits are not used in standalone alertmanager.

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2021-06-02 12:00:31 +02:00
Peter Štibraný
390474ffbe Added group limit to dispatcher.
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2021-06-02 12:00:31 +02:00
Peter Štibraný
cc0b08fd7c Added possibility to pass callback to *mem.NewAlerts, useful for implementing limits on alerts.
Update provider/mem/mem.go

Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2021-05-31 09:56:57 +02:00
Marco Pracucci
f84af78693 Lowered number of alert groups
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-05-11 16:15:46 +02:00
Marco Pracucci
1ad22c808f Added unit test
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-05-11 15:48:02 +02:00
Jacob Lisi
0c0c6bdb01 Fix race condition in dispatcher (#2208)
* fix dispatcher race condition

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* add test to check for race condition in dispatcher

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* return when dispatcher Stop has nil receiver

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* remove unneeded chec

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
2020-03-19 15:32:37 +01:00
Simon Pasquier
4f45457b9c dispatch: add metrics (#2113)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-11-26 09:04:56 +01:00
johncming
bad2e792ca dispatch: route group labels should contain group common label. (#2055)
Signed-off-by: johncming <johncming@yahoo.com>
2019-10-02 14:54:34 +02:00
Simon Pasquier
ab537b5b2f dispatch: fix missing receivers in Groups() (#1964)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-24 17:12:37 +02:00
Simon Pasquier
2ccb4707f1 dispatch: fix flaky test
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-23 11:16:48 +02:00
stuart nelson
2fa210d0e3 add groups endpoint to v2 api
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2019-04-17 11:32:21 +02:00
Brian Brazil
7078333202 Make a copy of firing alerts with EndsAt=0 when flushing. (#1686)
If the original EndsAt is left in place, then as time moves forwards
past the EndsAt then firing alerts will be rendered and treated as
resolved alerts which can cause confusion and races. This is most
likely to happen on retries for a notification.

Mitigate race and fix data races in TestAggrGroup.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-04 16:52:20 +01:00
kirillsablin
32bb289906 dispatch: Add group_by_all support (#1588)
To aggregate by all possible labels use '...' as the sole label name. 
This effectively disables aggregation entirely, passing through all 
alerts as-is. This is unlikely to be what you want, unless you have 
a very low alert volume or your upstream notification system performs 
its own grouping. Example: group_by: [...]

Signed-off-by: Kyryl Sablin <kyryl.sablin@schibsted.com>
2018-11-29 12:31:14 +01:00
Simon Pasquier
306fd73e32 *: remove use of golang.org/x/net/context
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-11-09 10:00:23 +01:00
Simon Pasquier
899226f3ac *: remove v1/alerts/groups API endpoint (#1525)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-08-23 16:03:49 +02:00
Simon Pasquier
0ebaeccd4b *: add missing license headers
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-05-14 17:37:13 +02:00
pasquier-s
c39a913f8a test: enable race detection (#1262)
This change enables race detection when running the tests. It also fixes
a couple of existing race conditions.
2018-02-27 18:18:53 +01:00
Julius Volz
947970af44 Convert Alertmanager to use non-global go-kit loggers
Fixes https://github.com/prometheus/alertmanager/issues/1040
2017-10-22 00:20:40 -07:00
Frederic Branczyk
5328885fe9 dispatch: fix race condition in dispatch test (#1025) 2017-10-04 18:01:23 +02:00
Corentin Chary
9b2afbf18b Make sure Matchers are always ordered
This fixes https://github.com/prometheus/alertmanager/issues/881
Also add some unit tests
2017-06-23 15:30:34 +02:00
Fabian Reinartz
3269bc39e1 *: switch group key to matcher serialization
Turn the GroupKey into a string that is composed of the matchers if the
path in the routing tree and the grouping labels.
Only hash it at the very end to ensure we don't exceed size limits of
integration APIs.
2017-04-21 12:06:23 +02:00
stuart nelson
1e34f29532 Filter alerts (#633)
* Vendor dependencies.

This updates several old dependencies, removes
some that are no longer needed, and adds
`pkg/labels` from prometheus `dev-2.0` branch.

* Add metrics selector parsing code

This is a temporary simplified re-implementation
of promQL's metric selector parsing.

* Add alerts filtering

Filter alerts through `?filter=` query string.

* Add silences filtering

Filter silences through `?filter=` query string.

* Move `parse` to `pkg/parse`
2017-03-16 11:16:10 +01:00
Fabian Reinartz
e9fbe62e0f *: consider mesh wait in notification timeouts
This adds the peer wait duration to the standard timeout to avoid
terminating a notification prematurely while being in failover
wait status.
2016-09-05 13:21:28 +02:00
Fabian Reinartz
998a9ce38e notify: rename Receiver to ReceiverName
This string value is initially used to store a receiver name. It is
later overloaded with a unique string identifier of <name, integration,
index>.
This renaming is in preparation to separate the two and use the Receiver
object of the nflogpb package.
2016-08-16 16:33:17 +02:00
Fabian Reinartz
3931d4e64b *: restructure package tree
This commit packages up individual modules and removes the top-level
main package.
2016-08-09 14:24:52 +02:00