mirror of https://github.com/openshift/openshift-docs.git synced 2026-02-05 12:46:18 +01:00
openshift-docs/modules/log6x-enabling-loki-alerts.adoc
2025-04-11 10:06:24 +01:00


// Module included in the following assemblies:
//
// observability/logging/logging-6.0/log6x-loki.adoc
// observability/logging/logging-6.2/log6x-loki-6.2.adoc
:_mod-docs-content-type: PROCEDURE
[id="logging-enabling-loki-alerts_{context}"]
= Creating a log-based alerting rule with Loki
The `AlertingRule` custom resource (CR) contains a set of specifications and webhook validation definitions that declare groups of alerting rules for a single `LokiStack` instance. The webhook validation definition checks each `AlertingRule` CR against the following conditions:

* If an `AlertingRule` CR includes an invalid `interval` period, it is an invalid alerting rule.
* If an `AlertingRule` CR includes an invalid `for` period, it is an invalid alerting rule.
* If an `AlertingRule` CR includes an invalid LogQL `expr`, it is an invalid alerting rule.
* If an `AlertingRule` CR includes two groups with the same name, it is an invalid alerting rule.
* If none of these conditions applies, the alerting rule is considered valid.
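
As an illustration of the conditions above, the following sketch shows an `AlertingRule` CR that would be expected to fail validation because two groups share a name. The resource name, group name, alert names, and annotation text are hypothetical, and the exact rejection message is not shown because it depends on the installed Operator version.

[source,yaml]
----
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: invalid-duplicate-groups   # hypothetical name
  namespace: app-ns
spec:
  tenantID: "application"
  groups:
    - name: AppErrors              # first group
      interval: 1m                 # must be a valid duration
      rules:
        - alert: AppErrorsA        # hypothetical alert name
          expr: |
            sum(rate({kubernetes_namespace_name="app-ns"} |= "error" [1m])) by (job) > 0
          for: 10s                 # must be a valid duration
          labels:
            severity: warning
          annotations:
            summary: Example summary
            description: Example description
    - name: AppErrors              # same name as the first group: invalid
      rules:
        - alert: AppErrorsB        # hypothetical alert name
          expr: |
            sum(rate({kubernetes_namespace_name="app-ns"} |= "fatal" [1m])) by (job) > 0
          for: 10s
          labels:
            severity: critical
          annotations:
            summary: Example summary
            description: Example description
----

Renaming the second group to a unique value removes the only condition in this sketch that makes the rule invalid.
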
.AlertingRule definitions
[options="header"]
|===
| Tenant type | Valid namespaces for `AlertingRule` CRs
| application a| `<your_application_namespace>`
| audit a| `openshift-logging`
| infrastructure a| `openshift-\*`, `kube-\*`, `default`
|===
.Procedure
. Create an `AlertingRule` custom resource (CR):
+
.Example infrastructure `AlertingRule` CR
[source,yaml]
----
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: loki-operator-alerts
  namespace: openshift-operators-redhat <1>
  labels: <2>
    openshift.io/<label_name>: "true"
spec:
  tenantID: "infrastructure" <3>
  groups:
    - name: LokiOperatorHighReconciliationError
      rules:
        - alert: HighPercentageError
          expr: | <4>
            sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"} |= "error" [1m])) by (job)
              /
            sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"}[1m])) by (job)
              > 0.01
          for: 10s
          labels:
            severity: critical <5>
          annotations:
            summary: High Loki Operator Reconciliation Errors <6>
            description: High Loki Operator Reconciliation Errors <7>
----
<1> The namespace where this `AlertingRule` CR is created must have a label matching the LokiStack `spec.rules.namespaceSelector` definition.
<2> The `labels` block must match the LokiStack `spec.rules.selector` definition.
<3> `AlertingRule` CRs for `infrastructure` tenants are only supported in the `openshift-\*`, `kube-\*`, or `default` namespaces.
<4> The value of `kubernetes_namespace_name` in the LogQL expression must match the value of `metadata.namespace`.
<5> The value of this mandatory field must be `critical`, `warning`, or `info`.
<6> This field is mandatory.
<7> This field is mandatory.
+
.Example application `AlertingRule` CR
[source,yaml]
----
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: app-user-workload
  namespace: app-ns <1>
  labels: <2>
    openshift.io/<label_name>: "true"
spec:
  tenantID: "application"
  groups:
    - name: AppUserWorkloadHighError
      rules:
        - alert:
          expr: | <3>
            sum(rate({kubernetes_namespace_name="app-ns", kubernetes_pod_name=~"podName.*"} |= "error" [1m])) by (job)
          for: 10s
          labels:
            severity: critical <4>
          annotations:
            summary: <5>
            description: <6>
----
<1> The namespace where this `AlertingRule` CR is created must have a label matching the LokiStack `spec.rules.namespaceSelector` definition.
<2> The `labels` block must match the LokiStack `spec.rules.selector` definition.
<3> The value of `kubernetes_namespace_name` in the LogQL expression must match the value of `metadata.namespace`.
<4> The value of this mandatory field must be `critical`, `warning`, or `info`.
<5> The value of this mandatory field is a summary of the rule.
<6> The value of this mandatory field is a detailed description of the rule.
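+
The `spec.rules.selector` and `spec.rules.namespaceSelector` definitions referenced in the callouts are set on the `LokiStack` CR. The following fragment is a minimal sketch of that section, assuming the same `openshift.io/<label_name>` label placeholder that the `AlertingRule` examples use. The `logging-loki` name and `openshift-logging` namespace are illustrative assumptions, and the rest of the `LokiStack` spec is omitted.
+
.Example `LokiStack` rules selector fragment (illustrative)
[source,yaml]
----
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki               # assumed name
  namespace: openshift-logging     # assumed namespace
spec:
  rules:
    enabled: true
    selector:                      # matched against AlertingRule metadata.labels
      matchLabels:
        openshift.io/<label_name>: "true"
    namespaceSelector:             # matched against labels on the AlertingRule namespace
      matchLabels:
        openshift.io/<label_name>: "true"
----
+
With selectors like these, an `AlertingRule` CR is picked up only if both its own labels and the labels of its namespace match.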
. Apply the `AlertingRule` CR:
+
[source,terminal]
----
$ oc apply -f <filename>.yaml
----