mirror of https://github.com/prometheus/alertmanager.git synced 2026-02-05 15:45:34 +01:00

47 Commits

Author SHA1 Message Date
Siavash Safi
c90d8707d5 feat(provider): implement per-alert limits (#4819)
* feat(limit): add new limit package with bucket

Add a new limit package with generic bucket implementation.
This can be used for example to limit the number of alerts in memory.

Benchmarks:
```
goos: darwin
goarch: arm64
pkg: github.com/prometheus/alertmanager/limit
cpu: Apple M3 Pro
BenchmarkBucketUpsert/EmptyBucket-12  	 8816954	       122.4 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsert/AddToFullBucketWithExpiredItems-12         	 9861010	       123.0 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsert/AddToFullBucketWithActiveItems-12          	 8343778	       143.6 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsert/UpdateExistingAlert-12                     	10107787	       118.9 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsert/MixedWorkload-12                           	 9436174	       126.0 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsertScaling/BucketSize_10-12                    	10255278	       115.4 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsertScaling/BucketSize_50-12                    	10166518	       117.1 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsertScaling/BucketSize_100-12                   	10457394	       115.0 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsertScaling/BucketSize_500-12                   	 9644079	       115.2 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsertScaling/BucketSize_1000-12                  	10426184	       116.6 ns/op	      56 B/op	       2 allocs/op
BenchmarkBucketUpsertConcurrent-12                               	 5796210	       216.3 ns/op	     406 B/op	       5 allocs/op
PASS
ok  	github.com/prometheus/alertmanager/limit	15.497s
```

Signed-off-by: Siavash Safi <siavash@cloudflare.com>

* feat(provider): implement per-alert limits

Use the new limit module to add optional per alert-name limits.
The metrics for limited alerts can be enabled using
`alerts-limited-metric` feature flag.

Signed-off-by: Siavash Safi <siavash@cloudflare.com>

---------

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
2026-02-01 13:21:20 +01:00
Siavash Safi
5d9cce4599 fix(provider): reduce lock contention (#4809)
The provider loops over all alerts per state, which results in 3 loops
over all store alerts and one call per alert to the marker per loop.

Add a custom collector for counting alerts by state.
This reduces the number of calls to store and marker to 1/3.
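
The single-pass idea can be sketched as follows. The `State` type, `countByState` and the `stateOf` lookup are hypothetical names standing in for the store and the marker; this is not the collector added by the commit.

```go
package main

import "fmt"

// State is a hypothetical alert state used only for this sketch.
type State string

const (
	Active      State = "active"
	Suppressed  State = "suppressed"
	Unprocessed State = "unprocessed"
)

// countByState walks the alerts once and calls stateOf once per alert,
// instead of looping over all alerts separately for each state.
func countByState(fingerprints []string, stateOf func(string) State) map[State]int {
	counts := map[State]int{Active: 0, Suppressed: 0, Unprocessed: 0}
	for _, fp := range fingerprints {
		counts[stateOf(fp)]++
	}
	return counts
}

func main() {
	states := map[string]State{"a": Active, "b": Suppressed, "c": Active}
	calls := 0
	stateOf := func(fp string) State {
		calls++ // stands in for one call to the marker
		return states[fp]
	}
	counts := countByState([]string{"a", "b", "c"}, stateOf)
	fmt.Println(counts[Active], counts[Suppressed], counts[Unprocessed], calls)
}
```

With three alerts the lookup runs three times in total, where one loop per state would run it nine times.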

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
2025-12-18 08:14:59 +01:00
Siavash Safi
18939cee8f feat: add distributed tracing support (#4745)
Add tracing support using otel to the following components:
- api: extract trace and span IDs from request context
- provider: mem put
- dispatch: split logic and use better naming
- inhibit: source and target traces, mutes, etc. drop metrics
- silence: query, expire, mutes
- notify: add distributed tracing support to stages and all http requests

Note: inhibitor metrics are dropped since we have tracing now and they
are not needed. We have not released any version with these metrics, so
we can drop them safely; this is not a breaking change.

This change borrows part of the implementation from #3673
Fixes #3670

Signed-off-by: Dave Henderson <dhenderson@gmail.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Co-authored-by: Dave Henderson <dhenderson@gmail.com>
2025-12-05 22:58:44 +01:00
Solomon Jacobs
e36a127f42 chore: don't depend on godebug (#4764)
We don't need to depend on this, so we shouldn't. The new error looks
like this:
```
Error Trace:	/home/solomonjacobs/get/f-alertmanager/provider/mem/mem_test.go:224
Error:      	Received unexpected error:
                field `Labels` mismatch.
                 Expected: {bar="foo"}
                 Got: {bar="boo"}
                field `Annotations` mismatch.
                 Expected: {foo="bar"}
                 Got: {boo="bar"}
                field `UpdatedAt` mismatch.
                 Expected: 2025-11-22 20:48:30.75618754 +0100 CET m=+0.000942595
                 Got: 2025-11-22 20:48:30.85618754 +0100 CET m=+0.100942595
                field `Timeout` mismatch.
                 Expected: false
                 Got: true
```

Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
2025-12-05 16:14:39 +01:00
Ben Kochie
f656273159 Enable modernize linter (#4750)
Enable the golangci-lint modernize linter.
* Add exception for the omitempty struct issue.
* Apply modernize fixes.

Signed-off-by: SuperQ <superq@gmail.com>
2025-11-18 11:46:15 +01:00
Solomon Jacobs
023885c52c chore: modernize with range over int (#4746)
A purely mechanical change with no effect on behaviour. Stops `gopls`
from complaining.

Corresponding spec, see here:
https://github.com/golang/go/issues/61405

Introduced in `Go 1.22`, see here:
https://go.dev/ref/spec#For_range

Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
2025-11-18 09:43:43 +01:00
Ethan Hunter
ab315ea134 Add new behavior to avoid races on config reload (#4705)
* Add new behavior to avoid races on config reload
* Add context to Groups to allow timeouts

---------

Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>
2025-11-18 09:42:18 +01:00
Ben Kochie
616f03407e Use stdlib sync/atomic (#4744)
Go 1.19 and later provide enough functionality that we can use the
standard library `sync/atomic`.
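
A small sketch of the typed atomics that the standard library has offered since Go 1.19 follows. The `countConcurrently` function is an assumption for illustration and does not come from the Alertmanager code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countConcurrently increments a shared counter from n goroutines.
func countConcurrently(n int) int64 {
	var counter atomic.Int64 // typed atomic from the standard library
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter.Add(1)
		}()
	}
	wg.Wait()
	return counter.Load()
}

func main() {
	fmt.Println(countConcurrently(100))
}
```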

Related to: https://github.com/prometheus/prometheus/issues/14866

Signed-off-by: SuperQ <superq@gmail.com>
2025-11-15 10:15:58 +01:00
Anand Rajagopal
ce8ce3ba9e Remove unnecessary lock from count() method (#4352)
Signed-off-by: Anand Rajagopal <anrajag@amazon.com>
2025-11-05 22:05:34 +01:00
Siavash Safi
f7ff687529 feat(provider): add subscriber channel metrics (#4630)
Add `alertmanager_alerts_subscriber_channel_writes_total` metric to
track the number of alerts written to subscriber channels.
A drop in the rate of this metric may indicate a problem with the
ingestion of alerts by subscribers (inhibitor and dispatcher).

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
2025-10-28 12:01:43 +01:00
SuperQ
e3725c1290 Update Go
* Update Go build to 1.24.
* Update minimum Go version to 1.23.
* Fixup various testifylint issues.

Signed-off-by: SuperQ <superq@gmail.com>
2025-03-27 09:56:40 +01:00
TJ Hoplock
f6b942cf9b chore!: adopt log/slog, drop go-kit/log (#4089)
* chore!: adopt log/slog, drop go-kit/log

The bulk of this change set was automated by the following script which
is being used to aid in converting the various exporters/projects to use
slog:

https://gist.github.com/tjhop/49f96fb7ebbe55b12deee0b0312d8434

This commit includes several changes:
- bump exporter-toolkit to v0.13.1 for log/slog support
- updates golangci-lint deprecated configs
- enables sloglint linter
- removes old go-kit/log linter configs
- introduce some `if logger == nil { $newLogger }` additions to prevent
  nil references
- converts cluster membership config to use a stdlib compatible slog
  adapter, rather than creating a custom io.Writer for use as the
  membership `logOutput` config
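
The nil-logger guard mentioned in the list above might look like the following sketch. `Component` and `NewComponent` are hypothetical names, and falling back to a discarding text handler is one possible choice of default, not necessarily the one used in the change.

```go
package main

import (
	"io"
	"log/slog"
)

// Component is a hypothetical type that needs a logger.
type Component struct {
	logger *slog.Logger
}

// NewComponent falls back to a logger that discards its output when the
// caller passes nil, so later log calls cannot dereference a nil pointer.
func NewComponent(logger *slog.Logger) *Component {
	if logger == nil {
		logger = slog.New(slog.NewTextHandler(io.Discard, nil))
	}
	return &Component{logger: logger}
}

func main() {
	c := NewComponent(nil)
	c.logger.Info("component started", "name", "example") // safe, output is discarded
}
```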

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>

* chore: address PR feedback

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>

---------

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2024-11-06 09:09:57 +00:00
SuperQ
e2c4e1e5cf Fix up linting issue
Fix `govet` linting issue: `printf: non-constant format string`.
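
The class of problem behind this vet warning can be illustrated as follows. The `wrap` helper is hypothetical and the snippet is not the code touched by the commit.

```go
package main

import (
	"errors"
	"fmt"
)

// wrap builds an error from a message that is only known at run time.
func wrap(msg string) error {
	// fmt.Errorf(msg) would be flagged by vet: msg is not a constant and
	// any '%' in it would be interpreted as a formatting verb.
	return errors.New(msg)
}

func main() {
	msg := "disk 100% full"
	fmt.Println(wrap(msg))
	// When a printf-style call is required, pass the message as an argument.
	fmt.Printf("%s\n", msg)
}
```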

Signed-off-by: SuperQ <superq@gmail.com>
2024-08-21 14:08:59 +02:00
Xiaochao Dong
91a94f00f9 Fix race conditions in the memory alerts store (#3648)
* Fix race conditions in the memory alerts store

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>

* Expose the GC method from store.Alerts

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>

* Use RLock/Unlock on read path

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>

* Resolve conflicts

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>

* release locks by using the defer

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>

* Revert the RWMutex back to Mutex

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>

---------

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
2024-05-16 11:25:21 +01:00
George Robinson
d31a249ffc #3513: Add GroupMarker interface (#3792)
* Add GroupMarker interface

This commit adds a new GroupMarker interface that marks the status
of groups. For example, whether an alert is muted because of one
or more active or mute time intervals.

It renames the existing Marker interface to AlertMarker to avoid
confusion.

Signed-off-by: George Robinson <george.robinson@grafana.com>

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-04-30 15:26:04 +01:00
Anand Rajagopal
680568b518 Send a slice of values to callback function instead of references (#3745) 2024-03-10 17:40:58 +00:00
Anand Rajagopal
1eb83c21eb A small fix to avoid deadlock that can happen as mentioned in issue #3682 (#3715)
Signed-off-by: Anand Rajagopal <anrajag@amazon.com>
2024-03-01 09:39:01 +00:00
gotjosh
f66bbab421 Fix tests after rebase
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2022-06-17 13:20:21 +01:00
gotjosh
cfb909f419 Marker: Rename SetSilenced to SetActiveOrSilenced
This accurately reflects what the function _actually_ does. If no active silence IDs are provided and the list of inhibitions is already empty, the alert is set to Active. It took me a while to realise this while working out how we populate the alert list.
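
The rule described above can be sketched as a small decision function. `AlertState` and `stateAfterSilenceUpdate` are names invented for this example and do not reproduce the Marker implementation.

```go
package main

import "fmt"

// AlertState is a hypothetical state type for this sketch.
type AlertState string

const (
	StateActive     AlertState = "active"
	StateSuppressed AlertState = "suppressed"
)

// stateAfterSilenceUpdate returns active only when the alert has neither
// active silences nor inhibitions; otherwise it stays suppressed.
func stateAfterSilenceUpdate(activeSilenceIDs, inhibitedBy []string) AlertState {
	if len(activeSilenceIDs) == 0 && len(inhibitedBy) == 0 {
		return StateActive
	}
	return StateSuppressed
}

func main() {
	fmt.Println(stateAfterSilenceUpdate(nil, nil))                   // active
	fmt.Println(stateAfterSilenceUpdate([]string{"silence-1"}, nil)) // suppressed
	fmt.Println(stateAfterSilenceUpdate(nil, []string{"alert-2"}))   // suppressed
}
```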

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2022-06-17 12:51:23 +01:00
gotjosh
805e505288 Alert metric reports different results to what the user sees via API (#2943)
* Alert metric reports different results to what the user sees via API

Fixes #1439 and #2619.

The previous metric is not _technically_ reporting incorrect results as the alerts _are_ still around and will be re-used if that same alert (equal fingerprint) is received before it is GCed. Therefore, I have kept the old metric under a new name `alertmanager_marked_alerts` and repurpose the current metric to match what the user sees in the UI.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2022-06-16 12:16:06 +02:00
Matthias Loibl
a6d10bd5bc Update golangci-lint and fix complaints (#2853)
* Copy latest golangci-lint files from Prometheus

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Use grafana/regexp over stdlib regexp

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Fix typos in comments

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Fix goimports complaints in import sorting

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* gofumpt all Go files

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Update naming to comply with revive linter

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* config: Fix error messages to be lower case

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* test/cli: Fix error messages to be lower case

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* .golangci.yaml: Remove obsolete space

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* config: Fix expected victorOps error

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Use stdlib regexp

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Clean up Go modules

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2022-03-25 17:59:51 +01:00
Julien Pivotto
b2a4cacb95 Update go dependencies & switch to go-kit/log
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-08-02 12:43:23 +02:00
Björn Rabenstein
fd0929ba9f Merge pull request #2627 from prometheus/release-0.22
Merge release branch back into master
2021-06-23 13:41:56 +02:00
Peter Štibraný
15ea220f45 Don't return error from mem.Alerts.Put.
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2021-05-31 10:05:04 +02:00
Peter Štibraný
cc0b08fd7c Added possibility to pass callback to *mem.NewAlerts, useful for implementing limits on alerts.
Update provider/mem/mem.go
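
How a callback can enforce a limit is sketched below under stated assumptions: the `StoreCallback` interface, the `maxAlerts` limiter and the `Store` type are all hypothetical and simplified, and they are not the signatures in `provider/mem`.

```go
package main

import (
	"errors"
	"fmt"
)

// StoreCallback is a hypothetical hook invoked before an alert is stored.
type StoreCallback interface {
	PreStore(fingerprint string, existing bool) error
}

// maxAlerts rejects new alerts once limit distinct alerts have been stored.
type maxAlerts struct {
	limit  int
	stored int
}

func (m *maxAlerts) PreStore(_ string, existing bool) error {
	if existing {
		return nil // updates never count against the limit
	}
	if m.stored >= m.limit {
		return errors.New("too many alerts")
	}
	m.stored++
	return nil
}

// Store is a hypothetical in-memory store that consults the callback.
type Store struct {
	cb     StoreCallback
	alerts map[string]struct{}
}

func NewStore(cb StoreCallback) *Store {
	return &Store{cb: cb, alerts: make(map[string]struct{})}
}

// Put asks the callback first and stores the alert only if it agrees.
func (s *Store) Put(fingerprint string) error {
	_, existing := s.alerts[fingerprint]
	if err := s.cb.PreStore(fingerprint, existing); err != nil {
		return err
	}
	s.alerts[fingerprint] = struct{}{}
	return nil
}

func main() {
	s := NewStore(&maxAlerts{limit: 1})
	fmt.Println(s.Put("a")) // accepted
	fmt.Println(s.Put("a")) // update of an existing alert, accepted
	fmt.Println(s.Put("b")) // rejected, limit reached
}
```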

Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2021-05-31 09:56:57 +02:00
beorn7
e84c265196 Include pending silences for future muting decisions
Previously, if a pending silence existed for an alert, and it later
became active without any silences getting added in the meantime, we
would miss the existence of that newly active silence.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-05-27 22:15:57 +02:00
Simon Pasquier
25b32434a6 store: fix potential flaky test (#2077)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-10-22 09:25:31 +02:00
Simon Pasquier
4535311c34 dispatch: don't garbage-collect alerts from store
The aggregation group is already responsible for removing the resolved
alerts. Running the garbage collection in parallel introduces a race and
eventually resolved notifications may be dropped.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-09-18 11:42:14 +02:00
Simon Pasquier
c78b449f4a provider/mem: fix dropped alerts
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-19 15:35:21 +02:00
Simon Pasquier
510cb2936f provider/mem: add test detecting dropped alerts
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-19 14:00:58 +02:00
Max Leonard Inden
09a7370572 main.go: Move marker metric registering into types/types.go
Instead of registering marker metrics inside of
cmd/alertmanager/main.go, register them in types/types.go, encapsulating
marker specific logic in its module, not in main.go. In addition it
paves the path for removing the usage of the global metric registry in
the future, by taking a local metric registerer.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-02-05 14:59:22 +01:00
Steve Winslow
8ca1f66a2d Fixed typo in license statement URL
Signed-off-by: Steve Winslow <swinslow@gmail.com>
2018-12-02 08:12:09 -05:00
stuart nelson
e883ccb9de pull out shared code for storing alerts (#1507)
Move the code for storing and GC'ing alerts into its own package
instead of re-implementing it in several packages.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-09-03 14:52:53 +02:00
Max Leonard Inden
1219541184 *.go: Introduce errcheck enforcing error handling
Errcheck [1] enforces error handling across all go files. Functions can
be excluded via `scripts/errcheck_excludes.txt`.

This patch adds errcheck to the `test` Make target.

[1] https://github.com/kisielk/errcheck

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-08-30 15:47:13 +02:00
Sergiusz Urbaniak
f9896e0162 provider/mem: cleanup closed listener in GC
... rather than in the Subscribe method. Currently the cleanup for a
given Alert subscription is done in a blocking goroutine, started in
the Subscribe method.

This simplifies it by moving the cleanup to the GC.

Additionally it simplifies the subscribe method by setting up the
buffered channel big enough to fill it up with all pending alerts
preventing the necessity to start a goroutine in Subscribe at all.
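
The buffer-sizing idea can be sketched like this. The `subscribe` function, its string alerts and the `extra` headroom parameter are assumptions for illustration, not the actual `Subscribe` code.

```go
package main

import "fmt"

// subscribe returns a channel pre-filled with all pending alerts.
// Because the buffer holds len(pending)+extra items, every send below
// completes immediately and no helper goroutine is needed.
func subscribe(pending []string, extra int) chan string {
	ch := make(chan string, len(pending)+extra)
	for _, a := range pending {
		ch <- a // cannot block: capacity is at least len(pending)
	}
	return ch
}

func main() {
	ch := subscribe([]string{"a", "b", "c"}, 2)
	fmt.Println(len(ch), cap(ch)) // 3 buffered alerts, capacity 5
	fmt.Println(<-ch)             // alerts come out in insertion order
}
```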

Signed-off-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com>
2018-08-13 09:35:11 +02:00
Max Inden
d4788ed195 provider/mem: Add Put Subscribe starvation test (#1503)
TestAlertsSubscribePutStarvation tests starvation of `iterator.Close` and
`alerts.Put`. Both `Subscribe` and `Put` use the Alerts.mtx lock. `Subscribe`
needs it to subscribe and more importantly unsubscribe `Alerts.listeners`.
`Put` uses the lock to add additional alerts and iterate the `Alerts.listeners`
map. If the channel of a listener is at its limit, `alerts.Lock` is blocked,
whereby a listener cannot unsubscribe as the lock is held by `alerts.Lock`.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-08-06 16:00:17 +02:00
wangYue
0fc0ff8e71 Avoid listener blocking (#1482)
Signed-off-by: wangyue <wangyue@actiontech.com>
2018-08-06 13:24:21 +02:00
pasquier-s
7b80919b36 Remove unused code (#1272) 2018-03-03 11:07:47 +01:00
pasquier-s
e8a92f65ef Run staticcheck as part of the build process (#1264)
This change also fixes potential issues highlighted by running
staticcheck.
2018-02-28 17:42:32 +01:00
pasquier-s
29e441f88f Fix miscellaneous issues revealed by Go 1.10 (#1256)
* provider/mem: fix format verbs in tests

* api: fix format verb
2018-02-22 14:57:45 +00:00
Stuart Nelson
b45c11b561 Fix tests 2018-01-21 15:38:19 +01:00
Jose Donizetti
2fe013bcaa Add tests to memory provider (#1104) 2018-01-21 15:27:21 +01:00
stuart nelson
69b97058f6 Fix tests 2017-12-19 15:43:23 +01:00
stuart nelson
481eab7b83 Make alertGC interval configurable 2017-12-19 15:36:38 +01:00
Fabian Reinartz
8170206070 Fix alert status handling in UI 2017-05-08 12:56:03 +02:00
stuart nelson
6a909abf17 Add processing status field to alert 2017-04-27 14:18:52 +02:00
Fabian Reinartz
6a20296af4 *: fixup, remove bolt provider 2016-08-09 14:17:50 +02:00