Merge pull request #100456 from openshift/revert-100033-OBSDOCS-2350
Revert "OBSDOCS-2350 Remove content in OCP and point at standalone docs"
@@ -2937,8 +2937,34 @@ Topics:
  Dir: cluster_observability_operator
  Distros: openshift-enterprise,openshift-origin
  Topics:
  - Name: Cluster Observability Operator release notes
    File: cluster-observability-operator-release-notes
  - Name: Cluster Observability Operator overview
    File: cluster-observability-operator-overview
  - Name: Installing the Cluster Observability Operator
    File: installing-the-cluster-observability-operator
  - Name: Configuring the Cluster Observability Operator to monitor a service
    File: configuring-the-cluster-observability-operator-to-monitor-a-service
  - Name: Observability UI plugins
    Dir: ui_plugins
    Distros: openshift-enterprise,openshift-origin
    Topics:
    - Name: Observability UI plugins overview
      File: observability-ui-plugins-overview
    - Name: Monitoring UI plugin
      File: monitoring-ui-plugin
    - Name: Logging UI plugin
      File: logging-ui-plugin
    - Name: Distributed tracing UI plugin
      File: distributed-tracing-ui-plugin
    - Name: Troubleshooting UI plugin
      File: troubleshooting-ui-plugin
    # - Name: Dashboard UI plugin
    #   File: dashboard-ui-plugin
  - Name: Monitoring API reference
    File: api-monitoring-package
  # - Name: Observability API reference
  #   File: api-observability-package
- Name: Monitoring
  Dir: monitoring
  Distros: openshift-enterprise,openshift-origin
modules/coo-advantages.adoc (new file, 36 lines)
@@ -0,0 +1,36 @@
// Module included in the following assemblies:
// * observability/cluster_observability_operator/cluster-observability-operator-overview.adoc

:_mod-docs-content-type: CONCEPT
[id="coo-advantages_{context}"]
= Key advantages of using {coo-short}

Deploying {coo-short} helps you address monitoring requirements that are hard to achieve using the default monitoring stack.

[id="coo-advantages-extensibility_{context}"]
== Extensibility

- You can add more metrics to a {coo-short}-deployed monitoring stack, which is not possible with core platform monitoring without losing support.
- You can receive cluster-specific metrics from core platform monitoring through federation.
- {coo-short} supports advanced monitoring scenarios like trend forecasting and anomaly detection.

[id="coo-advantages-multi-tenancy_{context}"]
== Multi-tenancy support

- You can create monitoring stacks per user namespace.
- You can deploy multiple stacks per namespace or a single stack for multiple namespaces, as shown in the sketch after this list.
- {coo-short} enables independent configuration of alerts and receivers for different teams.
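
For example, a single stack can monitor several namespaces by selecting them with a label. The following is a minimal sketch, not taken from the product documentation: it assumes a hypothetical `monitoring.example.com/team: backend` label on the target namespaces and uses the `namespaceSelector` field to control where the stack discovers monitoring resources:

[source,yaml]
----
apiVersion: monitoring.rhobs/v1alpha1
kind: MonitoringStack
metadata:
  name: team-backend-stack    # hypothetical name
  namespace: team-backend-ops # hypothetical namespace
spec:
  # Assumption: namespaceSelector selects the namespaces in which the
  # stack discovers resources; the team label is invented for this
  # illustration.
  namespaceSelector:
    matchLabels:
      monitoring.example.com/team: backend
  resourceSelector:
    matchLabels:
      app: backend
  retention: 7d
----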

[id="coo-advantages-scalability_{context}"]
== Scalability

- Supports multiple monitoring stacks on a single cluster.
- Enables monitoring of large clusters through manual sharding.
- Addresses cases where metrics exceed the capabilities of a single Prometheus instance.

[id="coo-advantages-scalabilityflexibility_{context}"]
== Flexibility

- Decoupled from {product-title} release cycles.
- Faster release iterations and rapid response to changing requirements.
- Independent management of alerting rules.

modules/coo-dashboard-ui-plugin-configure.adoc (new file, 200 lines)
@@ -0,0 +1,200 @@
// Module included in the following assemblies:

// * observability/cluster_observability_operator/ui_plugins/dashboard-ui-plugin.adoc

:_mod-docs-content-type: PROCEDURE
[id="coo-dashboard-ui-plugin-configure-_{context}"]
= Configuring a dashboard

The dashboard UI plugin searches for datasources from `ConfigMap` resources in the `openshift-config-managed` namespace that have the label `console.openshift.io/dashboard-datasource: 'true'`. The `ConfigMap` resource must define a datasource type and an in-cluster service where the data can be fetched.

The examples in the following section are taken from link:https://github.com/openshift/console-dashboards-plugin[https://github.com/openshift/console-dashboards-plugin].

.Prerequisites

* You have access to the cluster as a user with the `cluster-admin` cluster role.
* You have logged in to the {product-title} web console.
* You have installed the {coo-full}.
* You have installed the dashboard UI plugin.

.Procedure

. Create a `ConfigMap` resource in the `openshift-config-managed` namespace, with the label `console.openshift.io/dashboard-datasource: 'true'`. The following example is from link:https://github.com/openshift/console-dashboards-plugin/blob/main/docs/prometheus-datasource-example.yaml[prometheus-datasource-example.yaml]:
+
[source,yaml]
----
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-prometheus-proxy
  namespace: openshift-config-managed
  labels:
    console.openshift.io/dashboard-datasource: "true"
data:
  "dashboard-datasource.yaml": |-
    kind: "Datasource"
    metadata:
      name: "cluster-prometheus-proxy"
      project: "openshift-config-managed"
    spec:
      plugin:
        kind: "prometheus"
        spec:
          direct_url: "https://prometheus-k8s.openshift-monitoring.svc.cluster.local:9091"
----

. Configure a custom dashboard that connects to the datasource. The YAML for a sample dashboard is available at link:https://github.com/openshift/console-dashboards-plugin/blob/main/docs/prometheus-dashboard-example.yaml[prometheus-dashboard-example.yaml]. An excerpt from that file is shown below for demonstration purposes:
+
.Extract from example dashboard, taken from prometheus-dashboard-example.yaml
[%collapsible]
====
[source,yaml]
----
apiVersion: v1
kind: ConfigMap
metadata:
  name: dashboard-example
  namespace: openshift-config-managed
  labels:
    console.openshift.io/dashboard: "true"
data:
  k8s-resources-workloads-namespace.json: |-
    {
      "annotations": {
        "list": []
      },
      "editable": true,
      "gnetId": null,
      "graphTooltip": 0,
      "hideControls": false,
      "links": [],
      "refresh": "10s",
      "rows": [
        {
          "collapse": false,
          "height": "250px",
          "panels": [
            {
              "aliasColors": {},
              "bars": false,
              "dashLength": 10,
              "dashes": false,
              "datasource": {
                "name": "cluster-prometheus-proxy",
                "type": "prometheus"
              },
              "fill": 10,
              "id": 1,
              "interval": "1m",
              "legend": {
                "alignAsTable": true,
                "avg": false,
                "current": false,
                "max": false,
                "min": false,
                "rightSide": true,
                "show": true,
                "total": false,
                "values": false
              },
              "lines": true,
              "linewidth": 0,
              "links": [],
              "nullPointMode": "null as zero",
              "percentage": false,
              "pointradius": 5,
              "points": false,
              "renderer": "flot",
              "seriesOverrides": [
                {
                  "alias": "quota - requests",
                  "color": "#F2495C",
                  "dashes": true,
                  "fill": 0,
                  "hiddenSeries": true,
                  "hideTooltip": true,
                  "legend": true,
                  "linewidth": 2,
                  "stack": false
                },
                {
                  "alias": "quota - limits",
                  "color": "#FF9830",
                  "dashes": true,
                  "fill": 0,
                  "hiddenSeries": true,
                  "hideTooltip": true,
                  "legend": true,
                  "linewidth": 2,
                  "stack": false
                }
              ],
              "spaceLength": 10,
              "span": 12,
              "stack": false,
              "steppedLine": false,
              "targets": [
                {
                  "expr": "sum( node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=\"$cluster\", namespace=\"$namespace\"}* on(namespace,pod) group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload, workload_type)",
                  "format": "time_series",
                  "intervalFactor": 2,
                  "legendFormat": "{{workload}} - {{workload_type}}",
                  "legendLink": null,
                  "step": 10
                },
                {
                  "expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"requests.cpu\"})",
                  "format": "time_series",
                  "intervalFactor": 2,
                  "legendFormat": "quota - requests",
                  "legendLink": null,
                  "step": 10
                },
                {
                  "expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"limits.cpu\"})",
                  "format": "time_series",
                  "intervalFactor": 2,
                  "legendFormat": "quota - limits",
                  "legendLink": null,
                  "step": 10
                }
              ],
              "thresholds": [],
              "timeFrom": null,
              "timeShift": null,
              "title": "CPU Usage",
              "tooltip": {
                "shared": false,
                "sort": 2,
                "value_type": "individual"
              },
              "type": "graph",
              "xaxis": {
                "buckets": null,
                "mode": "time",
                "name": null,
                "show": true,
                "values": []
              },
...
----
====

. Click *Observe* -> *Dashboards*. The custom dashboard with the title *DASHBOARD EXAMPLE* is available, based on the configuration in `prometheus-dashboard-example.yaml`.
+
image::coo-custom-dashboard.png[]
+
You can set the namespace, time range, and refresh interval for the dashboard in the UI.

modules/coo-dashboard-ui-plugin-install.adoc (new file, 30 lines)
@@ -0,0 +1,30 @@
// Module included in the following assemblies:

// * observability/cluster_observability_operator/ui_plugins/dashboard-ui-plugin.adoc

:_mod-docs-content-type: PROCEDURE
[id="coo-dashboard-ui-plugin-install-_{context}"]
= Installing the {coo-full} dashboard UI plugin

.Prerequisites

* You have access to the cluster as a user with the `cluster-admin` cluster role.
* You have logged in to the {product-title} web console.
* You have installed the {coo-full}.

.Procedure

. In the {product-title} web console, click *Ecosystem* -> *Installed Operators* and select {coo-full}.
. Choose the *UI Plugin* tab (at the far right of the tab list) and click *Create UIPlugin*.
. Select *YAML view*, enter the following content, and then click *Create*:
+
[source,yaml]
----
apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
  name: dashboards
spec:
  type: Dashboards
----

modules/coo-distributed-tracing-ui-plugin-install.adoc (new file, 30 lines)
@@ -0,0 +1,30 @@
// Module included in the following assemblies:

// * observability/cluster_observability_operator/ui_plugins/distributed-tracing-ui-plugin.adoc

:_mod-docs-content-type: PROCEDURE
[id="coo-distributed-tracing-ui-plugin-install_{context}"]
= Installing the {coo-full} distributed tracing UI plugin

.Prerequisites

* You have access to the cluster as a user with the `cluster-admin` cluster role.
* You have logged in to the {product-title} web console.
* You have installed the {coo-full}.

.Procedure

. In the {product-title} web console, click *Ecosystem* -> *Installed Operators* and select {coo-full}.
. Choose the *UI Plugin* tab (at the far right of the tab list) and click *Create UIPlugin*.
. Select *YAML view*, enter the following content, and then click *Create*:
+
[source,yaml]
----
apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
  name: distributed-tracing
spec:
  type: DistributedTracing
----

modules/coo-distributed-tracing-ui-plugin-using.adoc (new file, 27 lines)
@@ -0,0 +1,27 @@
// Module included in the following assemblies:

// * observability/cluster_observability_operator/ui_plugins/distributed-tracing-ui-plugin.adoc

:_mod-docs-content-type: PROCEDURE
[id="coo-distributed-tracing-ui-plugin-using_{context}"]
= Using the {coo-full} distributed tracing UI plugin

.Prerequisites

* You have access to the cluster as a user with the `cluster-admin` cluster role.
* You have logged in to the {product-title} web console.
* You have installed the {coo-full}.
* You have installed the {coo-full} distributed tracing UI plugin.
* You have a `TempoStack` or `TempoMonolithic` multi-tenant instance in the cluster.

.Procedure

. In the {product-title} web console, click *Observe* -> *Traces*.
. Select a `TempoStack` or `TempoMonolithic` multi-tenant instance and set a time range and query for the traces to be loaded.
+
The traces are displayed on a scatter plot showing the trace start time, duration, and number of spans. Underneath the scatter plot, there is a list of traces showing information such as the `Trace Name`, number of `Spans`, and `Duration`.
. Click a trace name link.
+
The trace detail page for the selected trace contains a Gantt chart of all of the spans within the trace. Select a span to show a breakdown of the configured attributes.

modules/coo-incident-detection-overview.adoc (new file, 16 lines)
@@ -0,0 +1,16 @@
// Module included in the following assemblies:

// * observability/cluster_observability_operator/ui_plugins/incident-detection-ui-plugin.adoc

:_mod-docs-content-type: CONCEPT
[id="coo-incident-detection-overview_{context}"]
= {coo-full} incident detection overview

Clusters can generate significant volumes of monitoring data, making it hard for you to distinguish critical signals from noise.
A single incident can trigger a cascade of alerts, which extends the time needed to detect and resolve issues.

The {coo-full} incident detection feature groups related alerts into *incidents*. These incidents are then visualized as timelines that are color-coded by severity.
Alerts are mapped to specific components and grouped by severity, helping you to identify root causes by focusing on high-impact components first.
You can then drill down from the incident timelines to individual alerts to determine how to fix the underlying issue.

{coo-full} incident detection transforms the alert storm into clear steps for faster understanding and resolution of the incidents that occur on your clusters.

modules/coo-incident-detection-using.adoc (new file, 58 lines)
@@ -0,0 +1,58 @@
// Module included in the following assemblies:

// * observability/cluster_observability_operator/ui_plugins/incident-detection-ui-plugin.adoc

:_mod-docs-content-type: PROCEDURE
[id="coo-incident-detection-using_{context}"]
= Using {coo-full} incident detection

.Prerequisites

* You have access to the cluster as a user with the `cluster-admin` cluster role.
* You have logged in to the {product-title} web console.
* You have installed the {coo-full}.
* You have installed the {coo-full} monitoring UI plugin with incident detection enabled.

.Procedure

. In the Administrator perspective of the web console, click *Observe* -> *Incidents*.

. The Incidents Timeline UI shows the grouping of alerts into *incidents*. The color coding of the lines in the graph corresponds to the severity of the incident. By default, a seven-day timeline is presented.
+
image::coo-incidents-timeline-weekly.png[Weekly incidents timeline]
+
[NOTE]
====
After you enable incident detection, it takes at least 10 minutes for the correlations to be processed and the timeline to appear.

Alerts are analyzed and grouped into incidents only if they are firing after you enable this feature. Alerts that were resolved before you enabled the feature are not included.
====

. Zoom in to a 1-day view by clicking the drop-down to specify the duration.
+
image::coo-incidents-timeline-daily.png[Daily incidents timeline]

. Click an incident to see the timeline of alerts that are part of that incident in the Alerts Timeline UI.
+
image::coo-incident-alerts-timeline.png[Incidents alerts timeline]

. In the list of alerts that follows, alerts are mapped to specific components, which are grouped by severity.
+
image::coo-incident-alerts-components.png[Incidents alerts components]

. Click to expand a compute component in the list. The underlying alerts related to that component are displayed.
+
image::coo-incident-alerts-components-expanded.png[Incidents expanded components]

. Click the link for a firing alert to see detailed information about that alert.

[NOTE]
====
**Known issues**

* Depending on the order of the timeline bars, the tooltip might overlap and hide the underlying bar. You can still click the bar and select the incident or alert.
====

modules/coo-logging-ui-plugin-install.adoc (new file, 48 lines)
@@ -0,0 +1,48 @@
// Module included in the following assemblies:

// * observability/cluster_observability_operator/ui_plugins/logging-ui-plugin.adoc

:_mod-docs-content-type: PROCEDURE
[id="coo-logging-ui-plugin-install_{context}"]
= Installing the {coo-full} logging UI plugin

.Prerequisites
* You have access to the cluster as a user with the `cluster-admin` role.
* You have logged in to the {product-title} web console.
* You have installed the {coo-full}.
* You have a `LokiStack` instance in your cluster.

.Procedure
. In the {product-title} web console, click *Ecosystem* -> *Installed Operators* and select {coo-full}.
. Choose the *UI Plugin* tab (at the far right of the tab list) and click *Create UIPlugin*.
. Select *YAML view*, enter the following content, and then click *Create*:
+
[source,yaml]
----
apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
  name: logging
spec:
  type: Logging
  logging:
    lokiStack:
      name: logging-loki
    logsLimit: 50
    timeout: 30s
    schema: otel <1>
----
<1> `schema` is one of `otel`, `viaq`, or `select`. The default is `viaq` if no value is specified. When you choose `select`, you can select the mode in the UI when you run a query.
+
[NOTE]
====
The following are known issues for the logging UI plugin. For more information, see link:https://issues.redhat.com/browse/OU-587[OU-587].

* The `schema` feature is only supported in {product-title} 4.15 and later. In earlier versions of {product-title}, the logging UI plugin only uses the `viaq` attribute, ignoring any other values that might be set.

* Non-administrator users cannot query logs using the `otel` attribute with {logging} {for} versions 5.8 to 6.2. This issue will be fixed in a future {logging} release. (link:https://issues.redhat.com/browse/LOG-6589[LOG-6589])

* In {logging} {for} version 5.9, the `severity_text` OTel attribute is not set.
====

modules/coo-monitoring-ui-plugin-install.adoc (new file, 42 lines)
@@ -0,0 +1,42 @@
// Module included in the following assemblies:

// * observability/cluster_observability_operator/ui_plugins/monitoring-ui-plugin.adoc

:_mod-docs-content-type: PROCEDURE
[id="coo-monitoring-ui-plugin-install_{context}"]
= Installing the {coo-full} monitoring UI plugin

The monitoring UI plugin adds monitoring-related UI features to the OpenShift web console, for the Advanced Cluster Management (ACM) perspective and for incident detection.

.Prerequisites

* You have access to the cluster as a user with the `cluster-admin` cluster role.
* You have logged in to the {product-title} web console.
* You have installed the {coo-full}.

.Procedure

. In the {product-title} web console, click *Ecosystem* -> *Installed Operators* and select {coo-full}.
. Choose the *UI Plugin* tab (at the far right of the tab list) and click *Create UIPlugin*.
. Select *YAML view*, enter the following content, and then click *Create*:
+
[source,yaml]
----
apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
  name: monitoring
spec:
  type: Monitoring
  monitoring:
    acm: # <1>
      enabled: true
      alertmanager:
        url: 'https://alertmanager.open-cluster-management-observability.svc:9095'
      thanosQuerier:
        url: 'https://rbac-query-proxy.open-cluster-management-observability.svc:8443'
    incidents: # <2>
      enabled: true
----
<1> Enable {rh-rhacm} features. You must configure the Alertmanager and ThanosQuerier Service endpoints.
<2> Enable incident detection features.

modules/coo-server-side-apply.adoc (new file, 302 lines)
@@ -0,0 +1,302 @@
//Module included in the following assemblies:
//
// * observability/cluster_observability_operator/cluster-observability-operator-overview.adoc

:_mod-docs-content-type: PROCEDURE
[id="server-side-apply_{context}"]
= Using Server-Side Apply to customize Prometheus resources

Server-Side Apply is a feature that enables collaborative management of Kubernetes resources. The control plane tracks how different users and controllers manage fields within a Kubernetes object. It introduces the concept of field managers and tracks ownership of fields. This centralized control provides conflict detection and resolution, and reduces the risk of unintended overwrites.

Compared to Client-Side Apply, Server-Side Apply is more declarative, and it tracks field management instead of the last applied state.

Server-Side Apply:: Declarative configuration management by updating a resource's state without needing to delete and recreate it.

Field management:: Users can specify which fields of a resource they want to update, without affecting the other fields.

Managed fields:: Kubernetes stores metadata about who manages each field of an object in the `managedFields` field within metadata.

Conflicts:: If multiple managers try to modify the same field, a conflict occurs. The applier can choose to overwrite, relinquish control, or share management, as shown in the sketch after these definitions.

Merge strategy:: Server-Side Apply merges fields based on the actor who manages them.
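
As a minimal sketch of how field managers work in practice, you can apply a manifest with Server-Side Apply under an explicit manager name. The `--field-manager` flag is standard `kubectl` and `oc` apply behavior; `my-team` and `manifest.yaml` are placeholders:

[source,terminal]
----
$ oc apply --server-side --field-manager=my-team -f manifest.yaml
----

The API server records `my-team` as the manager of every field set in `manifest.yaml`. A later Server-Side Apply by a different manager to any of those fields is then reported as a conflict.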

.Procedure

. Add a `MonitoringStack` resource using the following configuration:
+
.Example `MonitoringStack` object
+
[source,yaml]
----
apiVersion: monitoring.rhobs/v1alpha1
kind: MonitoringStack
metadata:
  labels:
    coo: example
  name: sample-monitoring-stack
  namespace: coo-demo
spec:
  logLevel: debug
  retention: 1d
  resourceSelector:
    matchLabels:
      app: demo
----

. A Prometheus resource named `sample-monitoring-stack` is generated in the `coo-demo` namespace. Retrieve the managed fields of the generated Prometheus resource by running the following command:
+
[source,terminal]
----
$ oc -n coo-demo get Prometheus.monitoring.rhobs -oyaml --show-managed-fields
----
+
.Example output
[source,yaml]
----
managedFields:
- apiVersion: monitoring.rhobs/v1
  fieldsType: FieldsV1
  fieldsV1:
    f:metadata:
      f:labels:
        f:app.kubernetes.io/managed-by: {}
        f:app.kubernetes.io/name: {}
        f:app.kubernetes.io/part-of: {}
      f:ownerReferences:
        k:{"uid":"81da0d9a-61aa-4df3-affc-71015bcbde5a"}: {}
    f:spec:
      f:additionalScrapeConfigs: {}
      f:affinity:
        f:podAntiAffinity:
          f:requiredDuringSchedulingIgnoredDuringExecution: {}
      f:alerting:
        f:alertmanagers: {}
      f:arbitraryFSAccessThroughSMs: {}
      f:logLevel: {}
      f:podMetadata:
        f:labels:
          f:app.kubernetes.io/component: {}
          f:app.kubernetes.io/part-of: {}
      f:podMonitorSelector: {}
      f:replicas: {}
      f:resources:
        f:limits:
          f:cpu: {}
          f:memory: {}
        f:requests:
          f:cpu: {}
          f:memory: {}
      f:retention: {}
      f:ruleSelector: {}
      f:rules:
        f:alert: {}
      f:securityContext:
        f:fsGroup: {}
        f:runAsNonRoot: {}
        f:runAsUser: {}
      f:serviceAccountName: {}
      f:serviceMonitorSelector: {}
      f:thanos:
        f:baseImage: {}
        f:resources: {}
        f:version: {}
      f:tsdb: {}
  manager: observability-operator
  operation: Apply
- apiVersion: monitoring.rhobs/v1
  fieldsType: FieldsV1
  fieldsV1:
    f:status:
      .: {}
      f:availableReplicas: {}
      f:conditions:
        .: {}
        k:{"type":"Available"}:
          .: {}
          f:lastTransitionTime: {}
          f:observedGeneration: {}
          f:status: {}
          f:type: {}
        k:{"type":"Reconciled"}:
          .: {}
          f:lastTransitionTime: {}
          f:observedGeneration: {}
          f:status: {}
          f:type: {}
      f:paused: {}
      f:replicas: {}
      f:shardStatuses:
        .: {}
        k:{"shardID":"0"}:
          .: {}
          f:availableReplicas: {}
          f:replicas: {}
          f:shardID: {}
          f:unavailableReplicas: {}
          f:updatedReplicas: {}
      f:unavailableReplicas: {}
      f:updatedReplicas: {}
  manager: PrometheusOperator
  operation: Update
  subresource: status
----

. Check the `metadata.managedFields` values, and observe that some fields in `metadata` and `spec` are managed by the `MonitoringStack` resource.

. Modify a field that is not controlled by the `MonitoringStack` resource:

.. Change `spec.enforcedSampleLimit`, which is a field not set by the `MonitoringStack` resource. Create the file `prom-spec-edited.yaml`:
+
.`prom-spec-edited.yaml`
+
[source,yaml]
----
apiVersion: monitoring.rhobs/v1
kind: Prometheus
metadata:
  name: sample-monitoring-stack
  namespace: coo-demo
spec:
  enforcedSampleLimit: 1000
----

.. Apply the YAML by running the following command:
+
[source,terminal]
----
$ oc apply -f ./prom-spec-edited.yaml --server-side
----
+
[NOTE]
====
You must use the `--server-side` flag.
====

.. Get the changed Prometheus object and note that there is one more section in `managedFields` which has `spec.enforcedSampleLimit`:
+
[source,terminal]
----
$ oc get prometheus -n coo-demo
----
+
.Example output
[source,yaml]
----
managedFields: <1>
- apiVersion: monitoring.rhobs/v1
  fieldsType: FieldsV1
  fieldsV1:
    f:metadata:
      f:labels:
        f:app.kubernetes.io/managed-by: {}
        f:app.kubernetes.io/name: {}
        f:app.kubernetes.io/part-of: {}
    f:spec:
      f:enforcedSampleLimit: {} <2>
  manager: kubectl
  operation: Apply
----
<1> `managedFields`
<2> `spec.enforcedSampleLimit`

. Modify a field that is managed by the `MonitoringStack` resource:
.. Change `spec.logLevel`, which is a field managed by the `MonitoringStack` resource, using the following YAML configuration:
+
[source,yaml]
----
# changing the logLevel from debug to info
apiVersion: monitoring.rhobs/v1
kind: Prometheus
metadata:
  name: sample-monitoring-stack
  namespace: coo-demo
spec:
  logLevel: info <1>
----
<1> `spec.logLevel` has been added

.. Apply the YAML by running the following command:
+
[source,terminal]
----
$ oc apply -f ./prom-spec-edited.yaml --server-side
----
+
.Example output
+
[source,terminal]
----
error: Apply failed with 1 conflict: conflict with "observability-operator": .spec.logLevel
Please review the fields above--they currently have other managers. Here
are the ways you can resolve this warning:
* If you intend to manage all of these fields, please re-run the apply
  command with the `--force-conflicts` flag.
* If you do not intend to manage all of the fields, please edit your
  manifest to remove references to the fields that should keep their
  current managers.
* You may co-own fields by updating your manifest to match the existing
  value; in this case, you'll become the manager if the other manager(s)
  stop managing the field (remove it from their configuration).
See https://kubernetes.io/docs/reference/using-api/server-side-apply/#conflicts
----

.. Notice that the field `spec.logLevel` cannot be changed using Server-Side Apply, because it is already managed by `observability-operator`.

.. Use the `--force-conflicts` flag to force the change:
+
[source,terminal]
----
$ oc apply -f ./prom-spec-edited.yaml --server-side --force-conflicts
----
+
.Example output
+
[source,terminal]
----
prometheus.monitoring.rhobs/sample-monitoring-stack serverside-applied
----
+
With the `--force-conflicts` flag, the field can be forced to change, but because the same field is also managed by the `MonitoringStack` resource, the Observability Operator detects the change and reverts it to the value set by the `MonitoringStack` resource.
+
[NOTE]
====
Some Prometheus fields generated by the `MonitoringStack` resource are influenced by the fields in the `MonitoringStack` `spec` stanza, for example, `logLevel`. These can be changed by changing the `MonitoringStack` `spec`.
====

.. To change the `logLevel` in the Prometheus object, apply the following YAML to change the `MonitoringStack` resource:
+
[source,yaml]
----
apiVersion: monitoring.rhobs/v1alpha1
kind: MonitoringStack
metadata:
  name: sample-monitoring-stack
  labels:
    coo: example
spec:
  logLevel: info
----

.. To confirm that the change has taken place, query for the log level by running the following command:
+
[source,terminal]
----
$ oc -n coo-demo get Prometheus.monitoring.rhobs -o=jsonpath='{.items[0].spec.logLevel}'
----
+
.Example output
+
[source,terminal]
----
info
----

[NOTE]
====
. If a new version of an Operator generates a field that was previously generated and controlled by an actor, the value set by the actor is overridden.
+
For example, suppose you are managing a field `enforcedSampleLimit` that is not generated by the `MonitoringStack` resource. If the Observability Operator is upgraded, and the new version of the Operator generates a value for `enforcedSampleLimit`, this overrides the value you previously set.

. The `Prometheus` object generated by the `MonitoringStack` resource might contain some fields that are not explicitly set by the monitoring stack. These fields appear because they have default values.
====

modules/coo-target-users.adoc (new file, 23 lines)
@@ -0,0 +1,23 @@
// Module included in the following assemblies:
// * observability/cluster_observability_operator/cluster-observability-operator-overview.adoc

:_mod-docs-content-type: CONCEPT
[id="coo-target-users_{context}"]
= Target users for {coo-short}

{coo-short} is ideal for users who need high customizability, scalability, and long-term data retention, especially in complex, multi-tenant enterprise environments.

[id="coo-target-users-enterprise_{context}"]
== Enterprise-level users and administrators

Enterprise users require in-depth monitoring capabilities for {product-title} clusters, including advanced performance analysis, long-term data retention, trend forecasting, and historical analysis. These features help enterprises better understand resource usage, prevent performance issues, and optimize resource allocation.

[id="coo-target-users-multi-tenant_{context}"]
== Operations teams in multi-tenant environments

With multi-tenancy support, {coo-short} allows different teams to configure monitoring views for their projects and applications, making it suitable for teams with flexible monitoring needs.

[id="coo-target-users-devops_{context}"]
== Development and operations teams

{coo-short} provides fine-grained monitoring and customizable observability views for in-depth troubleshooting, anomaly detection, and performance tuning during development and operations.

modules/coo-troubleshooting-ui-plugin-creating-alert.adoc (new file, 46 lines)
@@ -0,0 +1,46 @@
// Module included in the following assemblies:

// * observability/cluster_observability_operator/ui_plugins/troubleshooting-ui-plugin.adoc

:_mod-docs-content-type: PROCEDURE
[id="coo-troubleshooting-ui-plugin-creating-alert_{context}"]
= Creating the example alert

To trigger an alert as a starting point to use in the troubleshooting UI panel, you can deploy a container that is deliberately misconfigured.

.Procedure

. Use the following YAML, either from the command line or in the web console, to create a broken deployment in a system namespace:
+
[source,yaml]
----
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bad-deployment
  namespace: default <1>
spec:
  selector:
    matchLabels:
      app: bad-deployment
  template:
    metadata:
      labels:
        app: bad-deployment
    spec:
      containers: <2>
      - name: bad-deployment
        image: quay.io/openshift-logging/vector:5.8
----
<1> The deployment must be in a system namespace (such as `default`) to cause the desired alerts.
<2> This container deliberately tries to start a `vector` server with no configuration file. The server logs a few messages, and then exits with an error. Alternatively, you can deploy any container you like that is badly configured, causing it to trigger an alert.

. View the alerts:
.. Go to *Observe* -> *Alerting* and click *clear all filters*. View the `Pending` alerts.
+
[IMPORTANT]
====
Alerts first appear in the `Pending` state. They do not start `Firing` until the container has been crashing for some time. By viewing `Pending` alerts, you do not have to wait as long to see them occur.
====
.. Choose one of the `KubeContainerWaiting`, `KubePodCrashLooping`, or `KubePodNotReady` alerts and open the troubleshooting panel by clicking on the link. Alternatively, if the panel is already open, click the *Focus* button to update the graph.

modules/coo-troubleshooting-ui-plugin-install.adoc (new file, 27 lines)
@@ -0,0 +1,27 @@
// Module included in the following assemblies:

// * observability/cluster_observability_operator/ui_plugins/troubleshooting-ui-plugin.adoc

:_mod-docs-content-type: PROCEDURE
[id="coo-troubleshooting-ui-plugin-install_{context}"]
= Installing the {coo-full} troubleshooting UI plugin

.Prerequisites
* You have access to the cluster as a user with the `cluster-admin` cluster role.
* You have logged in to the {product-title} web console.
* You have installed the {coo-full}.

.Procedure
. In the {product-title} web console, click *Ecosystem* -> *Installed Operators* and select {coo-full}.
. Choose the *UI Plugin* tab (at the far right of the tab list) and click *Create UIPlugin*.
. Select *YAML view*, enter the following content, and then click *Create*:
+
[source,yaml]
----
apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
  name: troubleshooting-panel
spec:
  type: TroubleshootingPanel
----

modules/coo-troubleshooting-ui-plugin-using.adoc (new file, 85 lines)
@@ -0,0 +1,85 @@
// Module included in the following assemblies:

// * observability/cluster_observability_operator/ui_plugins/troubleshooting-ui-plugin.adoc

:_mod-docs-content-type: PROCEDURE
[id="coo-troubleshooting-ui-plugin-using_{context}"]
= Using the {coo-full} troubleshooting UI plugin

include::snippets/unified-perspective-web-console.adoc[]

.Prerequisites
* You have access to the {product-title} cluster as a user with the `cluster-admin` cluster role. If your cluster version is 4.17+, you can access the troubleshooting UI panel from the Application Launcher {launch}.
* You have logged in to the {product-title} web console.
* You have installed {product-title} Logging, if you want to visualize correlated logs.
* You have installed {product-title} Network Observability, if you want to visualize correlated netflows.
* You have installed the {coo-full}.
* You have installed the {coo-full} troubleshooting UI plugin.
+
[NOTE]
====
The troubleshooting panel relies on the observability signal stores installed in your cluster.
Kubernetes resources, alerts, and metrics are always available by default in an {product-title} cluster.
Other signal types require optional components to be installed:

* **Logs:** Red Hat OpenShift Logging (collection) and Loki Operator provided by Red Hat (store)
* **Network events:** Network observability provided by Red Hat (collection) and Loki Operator provided by Red Hat (store)
====
.Procedure

. In the web console, go to *Observe* -> *Alerting* and then select an alert. If the alert has correlated items, a **Troubleshooting Panel** link appears above the chart on the alert detail page.
+
image::coo-troubleshooting-panel-link.png[Troubleshooting Panel link]
+
Click the **Troubleshooting Panel** link to display the panel.
. The panel consists of query details and a topology graph of the query results. The selected alert is converted into a Korrel8r query string and sent to the `korrel8r` service.
The results are displayed as a graph network connecting the returned signals and resources. This is a _neighbourhood_ graph, starting at the current resource and including related objects up to 3 steps away from the starting point.
Clicking on nodes in the graph takes you to the corresponding web console pages for those resources.
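+
As a purely illustrative sketch, not an exact string generated by the console, a Korrel8r query combines a domain, a class, and a selector. Using the `alert:alert` class notation described later in this procedure, a query for a starting alert might look like the following, where the JSON selector is hypothetical:
+
[source,text]
----
alert:alert:{"alertname":"KubeContainerWaiting"}
----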
. You can use the troubleshooting panel to find resources relating to the chosen alert.
+
[NOTE]
====
Clicking on a node may sometimes show fewer results than indicated on the graph. This is a known issue that will be addressed in a future release.
====
+
image::coo-troubleshooting-panel-graph.png[Troubleshooting panel]
[arabic]
.. **Alert (1):** This node is the starting point in the graph and represents the `KubeContainerWaiting` alert displayed in the web console.

.. **Pod (1):** This node indicates that there is a single `Pod` resource associated with this alert. Clicking on this node opens a console search showing the related pod directly.

.. **Event (2):** There are two Kubernetes events associated with the pod. Click this node to see the events.

.. **Logs (74):** This pod has 74 lines of logs, which you can access by clicking on this node.

.. **Metrics (105):** There are many metrics associated with the pod.

.. **Network (6):** There are network events, meaning the pod has communicated over the network. The remaining nodes in the graph represent the `Service`, `Deployment`, and `DaemonSet` resources that the pod has communicated with.

.. **Focus:** Clicking this button updates the graph. By default, the graph itself does not change when you click on nodes in the graph. Instead, the main web console page changes, and you can then navigate to other resources using links on the page, while the troubleshooting panel itself stays open and unchanged. To force an update to the graph in the troubleshooting panel, click **Focus**. This draws a new graph, using the current resource in the web console as the starting point.

.. **Show Query:** Clicking this button enables some experimental features:
+
image::coo-troubleshooting-experimental.png[Experimental features]
[arabic]
... **Hide Query** hides the experimental features.

... The query that identifies the starting point for the graph.
The query language, part of the link:https://korrel8r.github.io/korrel8r[Korrel8r] correlation engine used to create the graphs, is experimental and may change in the future.
The query is updated by the **Focus** button to correspond to the resources in the main web console window.

... **Neighbourhood depth** is used to display a smaller or larger neighbourhood.
+
[NOTE]
====
Setting a large value in a large cluster might cause the query to fail if the number of results is too big.
====
... **Goal class** results in a goal-directed search instead of a neighbourhood search. A goal-directed search shows all paths from the starting point to the goal class, which indicates a type of resource or signal. The format of the goal class is experimental and may change. Currently, the following goals are valid:
**** `k8s:__RESOURCE[VERSION.[GROUP]]__` identifying a kind of Kubernetes resource. For example, `k8s:Pod` or `k8s:Deployment.apps.v1`.
**** `alert:alert` representing any alert.

**** `metric:metric` representing any metric.

**** `netflow:network` representing any network observability network event.

**** `log:__LOG_TYPE__` representing stored logs, where `__LOG_TYPE__` must be one of `application`, `infrastructure`, or `audit`.

modules/coo-versus-default-ocp-monitoring.adoc (new file, 36 lines)
@@ -0,0 +1,36 @@
// Module included in the following assemblies:

// * observability/cluster_observability_operator/cluster-observability-operator-overview.adoc

:_mod-docs-content-type: CONCEPT
[id="coo-versus-default-ocp-monitoring_{context}"]
= {coo-short} compared to the default monitoring stack

The {coo-short} components function independently of the default in-cluster monitoring stack, which is deployed and managed by the {cmo-first}.
Monitoring stacks deployed by the two Operators do not conflict. You can use a {coo-short} monitoring stack in addition to the default platform monitoring components deployed by the {cmo-short}.

The key differences between {coo-short} and the default in-cluster monitoring stack are shown in the following table:

[cols="1,3,3", options="header"]
|===
| Feature | {coo-short} | Default monitoring stack

| **Scope and integration**
| Offers comprehensive monitoring and analytics for enterprise-level needs, covering cluster and workload performance.

However, it lacks direct integration with {product-title} and typically requires an external Grafana instance for dashboards.
| Limited to core components within the cluster, for example, API server and etcd, and to OpenShift-specific namespaces.

There is deep integration into {product-title}, including console dashboards and alert management in the console.

| **Configuration and customization**
| Broader configuration options, including data retention periods, storage methods, and collected data types.

The {coo-short} can delegate ownership of single configurable fields in custom resources to users by using Server-Side Apply (SSA), which enhances customization.
| Built-in configurations with limited customization options.

| **Data retention and storage**
| Long-term data retention, supporting historical analysis and capacity planning.
| Shorter data retention times, focusing on short-term monitoring and real-time detection.

|===
@@ -57,7 +57,7 @@ In order to continue to use Elasticsearch and Kibana managed by the elasticsearc

=== Log Storage

* With this release, the responsibility for deploying the {logging} view plugin shifts from the {clo} to the {coo-first}. For new log storage installations that need visualization, the {coo-full} and the associated UIPlugin resource must be deployed. Refer to the link:https://docs.redhat.com/en/documentation/red_hat_openshift_cluster_observability_operator/1-latest/html/about_red_hat_openshift_cluster_observability_operator/index[Cluster Observability Operator Overview] product documentation for more details. (link:https://issues.redhat.com/browse/LOG-5461[LOG-5461])
* With this release, the responsibility for deploying the {logging} view plugin shifts from the {clo} to the {coo-first}. For new log storage installations that need visualization, the {coo-full} and the associated UIPlugin resource must be deployed. Refer to the xref:../../observability/cluster_observability_operator/cluster-observability-operator-overview.adoc[Cluster Observability Operator Overview] product documentation for more details. (link:https://issues.redhat.com/browse/LOG-5461[LOG-5461])

* This enhancement improves Azure storage secret validation by providing early warnings for specific error conditions. (link:https://issues.redhat.com/browse/LOG-4571[LOG-4571])
[id="log6x-release-notes-6-0-0-technology-preview-features"]
@@ -0,0 +1,122 @@
// Module included in the following assemblies:
//
// * observability/cluster-observability-operator/configuring-the-cluster-observability-operator-to-monitor-a-service.adoc

:_mod-docs-content-type: PROCEDURE
[id="creating-a-monitoringstack-object-for-cluster-observability-operator_{context}"]
= Creating a MonitoringStack object for the {coo-full}

To scrape the metrics data exposed by the target `prometheus-coo-example-app` service, create a `MonitoringStack` object that references the `ServiceMonitor` object you created in the "Specifying how a service is monitored for {coo-full}" section.
This `MonitoringStack` object can then discover the service and scrape the exposed metrics data from it.

.Prerequisites

* You have access to the cluster as a user with the `cluster-admin` cluster role or as a user with administrative permissions for the namespace.
* You have installed the {coo-full}.
* You have deployed the `prometheus-coo-example-app` sample service in the `ns1-coo` namespace.
* You have created a `ServiceMonitor` object named `prometheus-coo-example-monitor` in the `ns1-coo` namespace.

.Procedure

. Create a YAML file for the `MonitoringStack` object configuration. For this example, name the file `example-coo-monitoring-stack.yaml`.

. Add the following `MonitoringStack` object configuration details:
+
.Example `MonitoringStack` object
+
[source,yaml]
----
apiVersion: monitoring.rhobs/v1alpha1
kind: MonitoringStack
metadata:
  name: example-coo-monitoring-stack
  namespace: ns1-coo
spec:
  logLevel: debug
  retention: 1d
  resourceSelector:
    matchLabels:
      k8s-app: prometheus-coo-example-monitor
----

. Apply the `MonitoringStack` object by running the following command:
+
[source,terminal]
----
$ oc apply -f example-coo-monitoring-stack.yaml
----

. Verify that the `MonitoringStack` object is available by running the following command and inspecting the output:
+
[source,terminal]
----
$ oc -n ns1-coo get monitoringstack
----
+
.Example output
[source,terminal]
----
NAME                           AGE
example-coo-monitoring-stack   81m
----

. Run the following command to retrieve information about the active targets from Prometheus and filter the output to list only targets labeled with `app=prometheus-coo-example-app`. This verifies which targets are discovered and actively monitored by Prometheus with this specific label.
+
[source,terminal]
----
$ oc -n ns1-coo exec -c prometheus prometheus-example-coo-monitoring-stack-0 -- curl -s 'http://localhost:9090/api/v1/targets' | jq '.data.activeTargets[].discoveredLabels | select(.__meta_kubernetes_endpoints_label_app=="prometheus-coo-example-app")'
----
+
.Example output
[source,json]
----
{
  "__address__": "10.129.2.25:8080",
  "__meta_kubernetes_endpoint_address_target_kind": "Pod",
  "__meta_kubernetes_endpoint_address_target_name": "prometheus-coo-example-app-5d8cd498c7-9j2gj",
  "__meta_kubernetes_endpoint_node_name": "ci-ln-8tt8vxb-72292-6cxjr-worker-a-wdfnz",
  "__meta_kubernetes_endpoint_port_name": "web",
  "__meta_kubernetes_endpoint_port_protocol": "TCP",
  "__meta_kubernetes_endpoint_ready": "true",
  "__meta_kubernetes_endpoints_annotation_endpoints_kubernetes_io_last_change_trigger_time": "2024-11-05T11:24:09Z",
  "__meta_kubernetes_endpoints_annotationpresent_endpoints_kubernetes_io_last_change_trigger_time": "true",
  "__meta_kubernetes_endpoints_label_app": "prometheus-coo-example-app",
  "__meta_kubernetes_endpoints_labelpresent_app": "true",
  "__meta_kubernetes_endpoints_name": "prometheus-coo-example-app",
  "__meta_kubernetes_namespace": "ns1-coo",
  "__meta_kubernetes_pod_annotation_k8s_ovn_org_pod_networks": "{\"default\":{\"ip_addresses\":[\"10.129.2.25/23\"],\"mac_address\":\"0a:58:0a:81:02:19\",\"gateway_ips\":[\"10.129.2.1\"],\"routes\":[{\"dest\":\"10.128.0.0/14\",\"nextHop\":\"10.129.2.1\"},{\"dest\":\"172.30.0.0/16\",\"nextHop\":\"10.129.2.1\"},{\"dest\":\"100.64.0.0/16\",\"nextHop\":\"10.129.2.1\"}],\"ip_address\":\"10.129.2.25/23\",\"gateway_ip\":\"10.129.2.1\",\"role\":\"primary\"}}",
  "__meta_kubernetes_pod_annotation_k8s_v1_cni_cncf_io_network_status": "[{\n \"name\": \"ovn-kubernetes\",\n \"interface\": \"eth0\",\n \"ips\": [\n \"10.129.2.25\"\n ],\n \"mac\": \"0a:58:0a:81:02:19\",\n \"default\": true,\n \"dns\": {}\n}]",
  "__meta_kubernetes_pod_annotation_openshift_io_scc": "restricted-v2",
  "__meta_kubernetes_pod_annotation_seccomp_security_alpha_kubernetes_io_pod": "runtime/default",
  "__meta_kubernetes_pod_annotationpresent_k8s_ovn_org_pod_networks": "true",
  "__meta_kubernetes_pod_annotationpresent_k8s_v1_cni_cncf_io_network_status": "true",
  "__meta_kubernetes_pod_annotationpresent_openshift_io_scc": "true",
  "__meta_kubernetes_pod_annotationpresent_seccomp_security_alpha_kubernetes_io_pod": "true",
  "__meta_kubernetes_pod_controller_kind": "ReplicaSet",
  "__meta_kubernetes_pod_controller_name": "prometheus-coo-example-app-5d8cd498c7",
  "__meta_kubernetes_pod_host_ip": "10.0.128.2",
  "__meta_kubernetes_pod_ip": "10.129.2.25",
  "__meta_kubernetes_pod_label_app": "prometheus-coo-example-app",
  "__meta_kubernetes_pod_label_pod_template_hash": "5d8cd498c7",
  "__meta_kubernetes_pod_labelpresent_app": "true",
  "__meta_kubernetes_pod_labelpresent_pod_template_hash": "true",
  "__meta_kubernetes_pod_name": "prometheus-coo-example-app-5d8cd498c7-9j2gj",
  "__meta_kubernetes_pod_node_name": "ci-ln-8tt8vxb-72292-6cxjr-worker-a-wdfnz",
  "__meta_kubernetes_pod_phase": "Running",
  "__meta_kubernetes_pod_ready": "true",
  "__meta_kubernetes_pod_uid": "054c11b6-9a76-4827-a860-47f3a4596871",
  "__meta_kubernetes_service_label_app": "prometheus-coo-example-app",
  "__meta_kubernetes_service_labelpresent_app": "true",
  "__meta_kubernetes_service_name": "prometheus-coo-example-app",
  "__metrics_path__": "/metrics",
  "__scheme__": "http",
  "__scrape_interval__": "30s",
  "__scrape_timeout__": "10s",
  "job": "serviceMonitor/ns1-coo/prometheus-coo-example-monitor/0"
}
----
+
[NOTE]
====
The above example uses the link:https://jqlang.github.io/jq/[`jq` command-line JSON processor] to format the output for convenience.
====
@@ -0,0 +1,88 @@
|
||||
// Module included in the following assemblies:
|
||||
//
|
||||
// * observability/cluster-observability-operator/configuring-the-cluster-observability-operator-to-monitor-a-service.adoc
|
||||
|
||||
:_mod-docs-content-type: PROCEDURE
|
||||
[id="deploying-a-sample-service-for-cluster-observability-operator_{context}"]
|
||||
= Deploying a sample service for {coo-full}
|
||||
|
||||
This configuration deploys a sample service named `prometheus-coo-example-app` in the user-defined `ns1-coo` project.
|
||||
The service exposes the custom `version` metric.
|
||||
|
||||
.Prerequisites
|
||||
|
||||
* You have access to the cluster as a user with the `cluster-admin` cluster role or as a user with administrative permissions for the namespace.
|
||||
|
||||
.Procedure
|
||||
|
||||
. Create a YAML file named `prometheus-coo-example-app.yaml` that contains the following configuration details for a namespace, deployment, and service:
|
||||
+
|
||||
[source,yaml]
|
||||
----
|
||||
apiVersion: v1
|
||||
kind: Namespace
|
||||
metadata:
|
||||
name: ns1-coo
|
||||
---
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
labels:
|
||||
app: prometheus-coo-example-app
|
||||
name: prometheus-coo-example-app
|
||||
namespace: ns1-coo
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app: prometheus-coo-example-app
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app: prometheus-coo-example-app
|
||||
spec:
|
||||
containers:
|
||||
- image: ghcr.io/rhobs/prometheus-example-app:0.4.2
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: prometheus-coo-example-app
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
labels:
|
||||
app: prometheus-coo-example-app
|
||||
name: prometheus-coo-example-app
|
||||
namespace: ns1-coo
|
||||
spec:
|
||||
ports:
|
||||
- port: 8080
|
||||
protocol: TCP
|
||||
targetPort: 8080
|
||||
name: web
|
||||
selector:
|
||||
app: prometheus-coo-example-app
|
||||
type: ClusterIP
|
||||
----
|
||||
|
||||
. Save the file.
|
||||
|
||||
. Apply the configuration to the cluster by running the following command:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc apply -f prometheus-coo-example-app.yaml
|
||||
----
|
||||
|
||||
. Verify that the pod is running by running the following command and observing the output:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc -n ns1-coo get pod
|
||||
----
|
||||
+
|
||||
.Example output
|
||||
[source,terminal]
|
||||
----
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
prometheus-coo-example-app-0927545cb7-anskj 1/1 Running 0 81m
|
||||
----
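
As an optional check, you can confirm that the service serves the `version` metric. The following sketch assumes you run it from a host with cluster access, and uses a temporary port-forward:

[source,terminal]
----
$ oc -n ns1-coo port-forward svc/prometheus-coo-example-app 8080:8080 &
$ curl -s http://localhost:8080/metrics | grep '^version'
----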

@@ -0,0 +1,35 @@
// Module included in the following assemblies:

// * observability/cluster_observability_operator/installing-the-cluster-observability-operator.adoc

:_mod-docs-content-type: PROCEDURE
[id="installing-the-cluster-observability-operator-in-the-web-console_{context}"]
= Installing the {coo-full} in the web console

Install the {coo-first} from the software catalog by using the {product-title} web console.

.Prerequisites

* You have access to the cluster as a user with the `cluster-admin` cluster role.
* You have logged in to the {product-title} web console.

.Procedure

. In the {product-title} web console, click *Ecosystem* -> *Software Catalog*.
. Type `cluster observability operator` in the *Filter by keyword* box.
. Click *{coo-full}* in the list of results.
. Read the information about the Operator, and configure the following installation settings:
+
* *Update channel* -> *stable*
* *Version* -> *1.0.0* or later
* *Installation mode* -> *All namespaces on the cluster (default)*
* *Installed Namespace* -> *Operator recommended Namespace: openshift-cluster-observability-operator*
* Select *Enable Operator recommended cluster monitoring on this Namespace*
* *Update approval* -> *Automatic*

. Optional: You can change the installation settings to suit your requirements.
For example, you can choose to subscribe to a different update channel, to install an older released version of the Operator, or to require manual approval for updates to new versions of the Operator.
. Click *Install*.

.Verification

* Go to *Ecosystem* -> *Installed Operators*, and verify that the *{coo-full}* entry appears in the list.
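
Alternatively, you can express the same installation as OLM resources from the CLI. The following sketch makes assumptions that you should verify for your cluster: the package name `cluster-observability-operator` and the `redhat-operators` catalog source in the `openshift-marketplace` namespace:

[source,yaml]
----
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-cluster-observability-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster-observability-operator
  namespace: openshift-cluster-observability-operator
spec: {} # an empty spec selects all namespaces
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-observability-operator
  namespace: openshift-cluster-observability-operator
spec:
  channel: stable
  installPlanApproval: Automatic
  name: cluster-observability-operator # assumed package name
  source: redhat-operators # assumed catalog source
  sourceNamespace: openshift-marketplace
----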
206
modules/monitoring-scrape-targets-in-multiple-namespaces.adoc
Normal file
@@ -0,0 +1,206 @@
// Module included in the following assemblies:
//
// * observability/cluster-observability-operator/configuring-the-cluster-observability-operator-to-monitor-a-service.adoc

:_mod-docs-content-type: PROCEDURE
[id="monitoring-scrape-targets-in-multiple-namespaces_{context}"]
= Scrape targets in multiple namespaces

To scrape targets in multiple namespaces, set the namespace and resource selector in the `MonitoringStack` object.

.Prerequisites

* You have access to the cluster as a user with the `cluster-admin` cluster role or as a user with administrative permissions for the namespace.
* You have installed the {coo-full}.

.Procedure

. Deploy the following namespace object and `MonitoringStack` YAML file:
+
.Example `MonitoringStack`
[source,yaml]
----
apiVersion: v1
kind: Namespace
metadata:
  name: ns1-coo
  labels:
    monitoring.rhobs/stack: multi-ns
---
apiVersion: monitoring.rhobs/v1alpha1
kind: MonitoringStack
metadata:
  name: example-coo-monitoring-stack
  namespace: ns1-coo
spec:
  logLevel: debug
  retention: 1d
  resourceSelector:
    matchLabels:
      k8s-app: prometheus-coo-example-monitor
  namespaceSelector:
    matchLabels:
      monitoring.rhobs/stack: multi-ns
----

. Deploy a sample application in the namespace `ns1-coo`, with an alert that is always firing:
+
[source,yaml]
----
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus-coo-example-app
  name: prometheus-coo-example-app
  namespace: ns1-coo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-coo-example-app
  template:
    metadata:
      labels:
        app: prometheus-coo-example-app
    spec:
      containers:
      - image: ghcr.io/rhobs/prometheus-example-app:0.4.2
        imagePullPolicy: IfNotPresent
        name: prometheus-coo-example-app
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus-coo-example-app
  name: prometheus-coo-example-app
  namespace: ns1-coo
spec:
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 8080
    name: web
  selector:
    app: prometheus-coo-example-app
  type: ClusterIP
---
apiVersion: monitoring.rhobs/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: prometheus-coo-example-monitor
  name: prometheus-coo-example-monitor
  namespace: ns1-coo
spec:
  endpoints:
  - interval: 30s
    port: web
    scheme: http
  selector:
    matchLabels:
      app: prometheus-coo-example-app
---
apiVersion: monitoring.rhobs/v1
kind: PrometheusRule
metadata:
  name: example-alert
  namespace: ns1-coo
  labels:
    k8s-app: prometheus-coo-example-monitor
spec:
  groups:
  - name: example
    rules:
    - alert: VersionAlert
      for: 1m
      expr: version{job="prometheus-coo-example-app"} > 0
      labels:
        severity: warning
----

. Deploy the same example application in another namespace labeled with `monitoring.rhobs/stack: multi-ns`:
+
[source,yaml]
----
apiVersion: v1
kind: Namespace
metadata:
  name: ns2-coo
  labels:
    monitoring.rhobs/stack: multi-ns
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus-coo-example-app
  name: prometheus-coo-example-app
  namespace: ns2-coo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-coo-example-app
  template:
    metadata:
      labels:
        app: prometheus-coo-example-app
    spec:
      containers:
      - image: ghcr.io/rhobs/prometheus-example-app:0.4.2
        imagePullPolicy: IfNotPresent
        name: prometheus-coo-example-app
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus-coo-example-app
  name: prometheus-coo-example-app
  namespace: ns2-coo
spec:
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 8080
    name: web
  selector:
    app: prometheus-coo-example-app
  type: ClusterIP
---
apiVersion: monitoring.rhobs/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: prometheus-coo-example-monitor
  name: prometheus-coo-example-monitor
  namespace: ns2-coo
spec:
  endpoints:
  - interval: 30s
    port: web
    scheme: http
  selector:
    matchLabels:
      app: prometheus-coo-example-app
----

.Verification

. Verify that the Prometheus instance adds the new targets and that the alert is firing. Use a port-forward command to expose the Prometheus or the Alertmanager user interface that has been deployed by the `MonitoringStack` instance.
+
.Prometheus
[source,terminal]
----
$ oc port-forward -n ns1-coo pod/prometheus-example-coo-monitoring-stack-0 9090
----
+
.Alertmanager
[source,terminal]
----
$ oc port-forward -n ns1-coo pod/alertmanager-example-coo-monitoring-stack-0 9093
----

. Verify that the targets are being scraped and that the alerts are firing by browsing to `http://localhost:9090/targets` or `http://localhost:9093/#/alerts`.
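
You can also inspect the scraped targets from the command line while the port-forward to the Prometheus pod is running. The `jq` filter is illustrative:

[source,terminal]
----
$ curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[].labels.namespace'
----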

@@ -0,0 +1,71 @@
// Module included in the following assemblies:
//
// * observability/cluster-observability-operator/configuring-the-cluster-observability-operator-to-monitor-a-service.adoc

:_mod-docs-content-type: PROCEDURE
[id="specifying-how-a-service-is-monitored-by-cluster-observability-operator_{context}"]
= Specifying how a service is monitored by {coo-full}

To use the metrics exposed by the sample service you created in the "Deploying a sample service for {coo-full}" section, you must configure monitoring components to scrape metrics from the `/metrics` endpoint.

You can create this configuration by using a `ServiceMonitor` object that specifies how the service is to be monitored, or a `PodMonitor` object that specifies how a pod is to be monitored.
The `ServiceMonitor` object requires a `Service` object. The `PodMonitor` object does not, which enables the `MonitoringStack` object to scrape metrics directly from the metrics endpoint exposed by a pod.
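
For illustration only, a minimal `PodMonitor` equivalent might look like the following sketch. It is not part of this procedure, and it assumes that the pod template names its container port `web`, which the sample deployment in this document does not do:

[source,yaml]
----
apiVersion: monitoring.rhobs/v1
kind: PodMonitor
metadata:
  labels:
    k8s-app: prometheus-coo-example-monitor
  name: prometheus-coo-example-podmonitor # illustrative name
  namespace: ns1-coo
spec:
  podMetricsEndpoints:
  - interval: 30s
    port: web # assumes a named container port "web" on the pod
    scheme: http
  selector:
    matchLabels:
      app: prometheus-coo-example-app
----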

This procedure shows how to create a `ServiceMonitor` object for a sample service named `prometheus-coo-example-app` in the `ns1-coo` namespace.

.Prerequisites

* You have access to the cluster as a user with the `cluster-admin` cluster role or as a user with administrative permissions for the namespace.
* You have installed the {coo-full}.
* You have deployed the `prometheus-coo-example-app` sample service in the `ns1-coo` namespace.
+
[NOTE]
====
The `prometheus-coo-example-app` sample service does not support TLS authentication.
====

.Procedure

. Create a YAML file named `example-coo-app-service-monitor.yaml` that contains the following `ServiceMonitor` object configuration details:
+
[source,yaml]
----
apiVersion: monitoring.rhobs/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: prometheus-coo-example-monitor
  name: prometheus-coo-example-monitor
  namespace: ns1-coo
spec:
  endpoints:
  - interval: 30s
    port: web
    scheme: http
  selector:
    matchLabels:
      app: prometheus-coo-example-app
----
+
This configuration defines a `ServiceMonitor` object that the `MonitoringStack` object will reference to scrape the metrics data exposed by the `prometheus-coo-example-app` sample service.

. Apply the configuration to the cluster by running the following command:
+
[source,terminal]
----
$ oc apply -f example-coo-app-service-monitor.yaml
----

. Verify that the `ServiceMonitor` resource is created by running the following command and observing the output:
+
[source,terminal]
----
$ oc -n ns1-coo get servicemonitors.monitoring.rhobs
----
+
.Example output
[source,terminal]
----
NAME                             AGE
prometheus-coo-example-monitor   81m
----

@@ -0,0 +1,32 @@
// Module included in the following assemblies:
//
// * observability/cluster_observability_operator/cluster-observability-operator-overview.adoc

:_mod-docs-content-type: CONCEPT
[id="understanding-the-cluster-observability-operator_{context}"]
= Understanding the {coo-full}

A default monitoring stack created by the {coo-first} includes a highly available Prometheus instance capable of sending metrics to an external endpoint by using remote write.

Each {coo-short} stack also includes an optional Thanos Querier component, which you can use to query a highly available Prometheus instance from a central location, and an optional Alertmanager component, which you can use to set up alert configurations for different services.

[id="advantages-of-using-cluster-observability-operator_{context}"]
== Advantages of using the {coo-full}

The `MonitoringStack` CRD used by the {coo-short} offers an opinionated default monitoring configuration for {coo-short}-deployed monitoring components, but you can customize it to suit more complex requirements.

Deploying a {coo-short}-managed monitoring stack can help meet monitoring needs that are difficult or impossible to address by using the core platform monitoring stack deployed by the {cmo-first}.
A monitoring stack deployed using {coo-short} has the following advantages over core platform and user workload monitoring:

Extensibility:: Users can add more metrics to a {coo-short}-deployed monitoring stack, which is not possible with core platform monitoring without losing support.
In addition, {coo-short}-managed stacks can receive certain cluster-specific metrics from core platform monitoring by using federation.
Multi-tenancy support:: The {coo-short} can create a monitoring stack per user namespace.
You can also deploy multiple stacks per namespace or a single stack for multiple namespaces.
For example, cluster administrators, SRE teams, and development teams can all deploy their own monitoring stacks on a single cluster, rather than having to use a single shared stack of monitoring components.
Users on different teams can then independently configure features such as separate alerts, alert routing, and alert receivers for their applications and services.
Scalability:: You can create {coo-short}-managed monitoring stacks as needed.
Multiple monitoring stacks can run on a single cluster, which can facilitate the monitoring of very large clusters by using manual sharding. This ability addresses cases where the number of metrics exceeds the monitoring capabilities of a single Prometheus instance.
Flexibility:: Deploying the {coo-short} with Operator Lifecycle Manager (OLM) decouples {coo-short} releases from {product-title} release cycles.
This method of deployment enables faster release iterations and the ability to respond rapidly to changing requirements and issues.
Additionally, by deploying a {coo-short}-managed monitoring stack, users can manage alerting rules independently of {product-title} release cycles.
Highly customizable:: The {coo-short} can delegate ownership of single configurable fields in custom resources to users by using Server-Side Apply (SSA), which enhances customization.
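
For example, a team that owns only one field of a `MonitoringStack` resource could manage that field with Server-Side Apply. The following sketch is illustrative: the field manager name is arbitrary, and `prometheusConfig` with `replicas` is used as a representative field that you should verify against the `MonitoringStack` schema:

[source,terminal]
----
$ cat <<'EOF' > stack-replicas.yaml
apiVersion: monitoring.rhobs/v1alpha1
kind: MonitoringStack
metadata:
  name: example-coo-monitoring-stack
  namespace: ns1-coo
spec:
  prometheusConfig: # representative field; verify against the schema
    replicas: 2
EOF
$ oc apply --server-side --field-manager=team-a -f stack-replicas.yaml
----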

@@ -0,0 +1,25 @@
// Module included in the following assemblies:

// * observability/cluster_observability_operator/installing-the-cluster-observability-operator.adoc

:_mod-docs-content-type: PROCEDURE
[id="uninstalling-the-cluster-observability-operator-using-the-web-console_{context}"]
= Uninstalling the {coo-full} using the web console

If you have installed the {coo-first} by using the software catalog, you can uninstall it in the {product-title} web console.

.Prerequisites

* You have access to the cluster as a user with the `cluster-admin` cluster role.
* You have logged in to the {product-title} web console.

.Procedure

. Go to *Ecosystem* -> *Installed Operators*.

. Locate the *{coo-full}* entry in the list.

. Click {kebab} for this entry and select *Uninstall Operator*.

.Verification

* Go to *Ecosystem* -> *Installed Operators*, and verify that the *{coo-full}* entry no longer appears in the list.
@@ -0,0 +1,81 @@
// Module included in the following assemblies:
//
// * observability/cluster-observability-operator/configuring-the-cluster-observability-operator-to-monitor-a-service.adoc

:_mod-docs-content-type: PROCEDURE
[id="monitoring-validating-a-monitoringstack-for-cluster-observability-operator_{context}"]
= Validating the monitoring stack

To validate that the monitoring stack is working correctly, access the example service and then view the gathered metrics.

.Prerequisites

* You have access to the cluster as a user with the `cluster-admin` cluster role or as a user with administrative permissions for the namespace.
* You have installed the {coo-full}.
* You have deployed the `prometheus-coo-example-app` sample service in the `ns1-coo` namespace.
* You have created a `ServiceMonitor` object named `prometheus-coo-example-monitor` in the `ns1-coo` namespace.
* You have created a `MonitoringStack` object named `example-coo-monitoring-stack` in the `ns1-coo` namespace.

.Procedure

. Create a route to expose the example `prometheus-coo-example-app` service by running the following command:
+
[source,terminal]
----
$ oc expose svc prometheus-coo-example-app -n ns1-coo
----
. Access the route from your browser or the command line to generate metrics.
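+
For example, the following sketch fetches the route host and sends a request to it. It assumes the route is reachable from your workstation:
+
[source,terminal]
----
$ curl "http://$(oc get route prometheus-coo-example-app -n ns1-coo -o jsonpath='{.spec.host}')"
----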

. Execute a query on the Prometheus pod to return the total HTTP requests metric:
+
[source,terminal]
----
$ oc -n ns1-coo exec -c prometheus prometheus-example-coo-monitoring-stack-0 -- curl -s 'http://localhost:9090/api/v1/query?query=http_requests_total'
----
+
.Example output (formatted using `jq` for convenience)
[source,json]
----
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "http_requests_total",
          "code": "200",
          "endpoint": "web",
          "instance": "10.129.2.25:8080",
          "job": "prometheus-coo-example-app",
          "method": "get",
          "namespace": "ns1-coo",
          "pod": "prometheus-coo-example-app-5d8cd498c7-9j2gj",
          "service": "prometheus-coo-example-app"
        },
        "value": [
          1730807483.632,
          "3"
        ]
      },
      {
        "metric": {
          "__name__": "http_requests_total",
          "code": "404",
          "endpoint": "web",
          "instance": "10.129.2.25:8080",
          "job": "prometheus-coo-example-app",
          "method": "get",
          "namespace": "ns1-coo",
          "pod": "prometheus-coo-example-app-5d8cd498c7-9j2gj",
          "service": "prometheus-coo-example-app"
        },
        "value": [
          1730807483.632,
          "0"
        ]
      }
    ]
  }
}
----
File diff suppressed because it is too large
@@ -0,0 +1,637 @@
:_mod-docs-content-type: ASSEMBLY
[id="api-observability-package"]
= observability.openshift.io/v1alpha1
include::_attributes/common-attributes.adoc[]
:context: api-observability-package

toc::[]

The resource types are xref:#clusterobservability[`ClusterObservability`] and xref:#uiplugin[`UIPlugin`].

[[clusterobservability]]
== ClusterObservability

ClusterObservability defines the desired state of the observability stack.

[cols="2,1,4,1"]
|===
|Name |Type |Description |Required

|`apiVersion`
|string
|`observability.openshift.io/v1alpha1`
|true

|`kind`
|string
|`ClusterObservability`
|true

|link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.32/#objectmeta-v1-meta[`metadata`]
|object
|Refer to the Kubernetes API documentation for the fields of the `metadata` field.
|true

|xref:#clusterobservabilityspec[`spec`]
|object
|`Spec` defines the desired state of the cluster observability.
|false

|`status`
|object
|Status of the signal manager.
|false
|===

[[clusterobservabilityspec]]
== ClusterObservability.spec

`Spec` defines the desired state of the cluster observability.

[cols="2,1,4,1"]
|===
|Name |Type |Description |Required

|xref:#clusterobservabilityspeccapabilities[`capabilities`]
|object
|`Capabilities` defines the observability capabilities. Each capability has to be enabled explicitly.
|false

|xref:#clusterobservabilityspecstorage[`storage`]
|object
|`Storage` defines the storage for the capabilities that require storage.
|false
|===

[[clusterobservabilityspeccapabilities]]
== ClusterObservability.spec.capabilities

`Capabilities` defines the observability capabilities. Each capability has to be enabled explicitly.

[cols="2,1,4,1"]
|===
|Name |Type |Description |Required

|xref:#clusterobservabilityspeccapabilitiesopentelemetry[`opentelemetry`]
|object
|`OpenTelemetry` defines the OpenTelemetry capabilities.
|false

|xref:#clusterobservabilityspeccapabilitiestracing[`tracing`]
|object
|`Tracing` defines the tracing capabilities.
|false
|===

[[clusterobservabilityspeccapabilitiesopentelemetry]]
== ClusterObservability.spec.capabilities.opentelemetry

`OpenTelemetry` defines the OpenTelemetry capabilities.

[cols="2,1,4,1"]
|===
|Name |Type |Description |Required

|`enabled`
|boolean
|`Enabled` indicates whether the capability is enabled and whether the Operator should deploy an instance. By default, it is set to false.

_Default_: false
|false

|xref:#clusterobservabilityspeccapabilitiesopentelemetryexporter[`exporter`]
|object
|`Exporter` defines the OpenTelemetry exporter configuration. When defined, the collector exports telemetry data to the specified endpoint.
|false

|xref:#clusterobservabilityspeccapabilitiesopentelemetryoperators[`operators`]
|object
|`Operators` defines the installation of the Operators for the capability.
|false
|===

[[clusterobservabilityspeccapabilitiesopentelemetryexporter]]
== ClusterObservability.spec.capabilities.opentelemetry.exporter

`Exporter` defines the OpenTelemetry exporter configuration. When defined, the collector exports telemetry data to the specified endpoint.

[cols="1,1,3,1"]
|===
|Name |Type |Description |Required

|`endpoint`
|string
|`Endpoint` is the OTLP endpoint.
|false
|===

[[clusterobservabilityspeccapabilitiesopentelemetryoperators]]
== ClusterObservability.spec.capabilities.opentelemetry.operators

`Operators` defines the installation of the Operators for the capability.

[cols="1,1,3,1"]
|===
|Name |Type |Description |Required

|`install`
|boolean
|`Install` indicates whether the Operators used by the capability should be installed through OLM. When the capability is enabled, `install` is set to true; otherwise, it is set to false.
|false
|===

[[clusterobservabilityspeccapabilitiestracing]]
== ClusterObservability.spec.capabilities.tracing

`Tracing` defines the tracing capabilities.

[cols="2,1,4,1"]
|===
|Name |Type |Description |Required

|`enabled`
|boolean
|`Enabled` indicates whether the capability is enabled and whether the Operator should deploy an instance. By default, it is set to false.

_Default_: false
|false

|xref:#clusterobservabilityspeccapabilitiestracingoperators[`operators`]
|object
|`Operators` defines the installation of the Operators for the capability.
|false
|===

[[clusterobservabilityspeccapabilitiestracingoperators]]
== ClusterObservability.spec.capabilities.tracing.operators

`Operators` defines the installation of the Operators for the capability.

[cols="1,1,3,1"]
|===
|Name |Type |Description |Required

|`install`
|boolean
|`Install` indicates whether the Operators used by the capability should be installed through OLM. When the capability is enabled, `install` is set to true; otherwise, it is set to false.
|false
|===

[[clusterobservabilityspecstorage]]
== ClusterObservability.spec.storage

`Storage` defines the storage for the capabilities that require storage.

[cols="2,1,4,1"]
|===
|Name |Type |Description |Required

|xref:#clusterobservabilityspecstoragesecret[`secret`]
|object
|`SecretSpec` defines the secret for the storage.
|false
|===

[[clusterobservabilityspecstoragesecret]]
== ClusterObservability.spec.storage.secret

`SecretSpec` defines the secret for the storage.

[cols="1,1,3,1"]
|===
|Name |Type |Description |Required

|`name`
|string
|`Name` is the name of the secret for the storage.
|false
|===
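
For orientation, a minimal `ClusterObservability` resource that enables the tracing capability might look like the following sketch. The resource name and the storage secret name are illustrative:

[source,yaml]
----
apiVersion: observability.openshift.io/v1alpha1
kind: ClusterObservability
metadata:
  name: example # illustrative name
spec:
  capabilities:
    tracing:
      enabled: true # defaults to false
  storage:
    secret:
      name: observability-storage # illustrative secret name
----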

[[uiplugin]]
== UIPlugin

UIPlugin defines an observability console plugin.

[cols="2,1,4,1"]
|===
|Name |Type |Description |Required

|`apiVersion`
|string
|`observability.openshift.io/v1alpha1`
|true

|`kind`
|string
|`UIPlugin`
|true

|link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.32/#objectmeta-v1-meta[`metadata`]
|object
|Refer to the Kubernetes API documentation for the fields of the `metadata` field.
|true

|xref:#uipluginspec[`spec`]
|object
|`UIPluginSpec` is the specification for the desired state of `UIPlugin`.
|false

|xref:#uipluginstatus[`status`]
|object
|`UIPluginStatus` defines the observed state of `UIPlugin`. It should always be reconstructable from the state of the cluster and/or the outside world.
|false
|===

[[uipluginspec]]
== UIPlugin.spec

`UIPluginSpec` is the specification for the desired state of `UIPlugin`.

[cols="2,1,4,1"]
|===
|Name |Type |Description |Required

|`type`
|enum
|`Type` defines the UI plugin.

_Enum_: `Dashboards`, `TroubleshootingPanel`, `DistributedTracing`, `Logging`, `Monitoring`
|true

|xref:#uipluginspecdeployment[`deployment`]
|object
|`Deployment` allows customizing aspects of the generated deployment hosting the UI plugin.
|false

|xref:#uipluginspecdistributedtracing[`distributedTracing`]
|object
|`DistributedTracing` contains configuration for the distributed tracing console plugin.
|false

|xref:#uipluginspeclogging[`logging`]
|object
|`Logging` contains configuration for the logging console plugin.

It only applies to UIPlugin Type: `Logging`.
|false

|xref:#uipluginspecmonitoring[`monitoring`]
|object
|`Monitoring` contains configuration for the monitoring console plugin.
|false

|xref:#uipluginspectroubleshootingpanel[`troubleshootingPanel`]
|object
|`TroubleshootingPanel` contains configuration for the troubleshooting console plugin.
|false
|===
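
For orientation, a minimal `UIPlugin` resource of type `Monitoring` might look like the following sketch. The resource name is illustrative:

[source,yaml]
----
apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
  name: monitoring # illustrative name
spec:
  type: Monitoring
----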

[[uipluginspecdeployment]]
== UIPlugin.spec.deployment

`Deployment` allows customizing aspects of the generated deployment hosting the UI plugin.

[cols="2,2,4,1"]
|===
|Name |Type |Description |Required

|`nodeSelector`
|`map[string]string`
|Define a label-selector for nodes on which the pods should be scheduled.

When no selector is specified, it defaults to a value that selects only Linux nodes (`"kubernetes.io/os=linux"`).
|false

|xref:#uipluginspecdeploymenttolerationsindex[`tolerations`]
|`[]object`
|Define the tolerations used for the deployment.
|false
|===

[[uipluginspecdeploymenttolerationsindex]]
== UIPlugin.spec.deployment.tolerations[index]

The pod this `Toleration` is attached to tolerates any taint that matches the triple `<key,value,effect>` using the matching operator `<operator>`.

[cols="2,1,4,1"]
|===
|Name |Type |Description |Required

|`effect`
|string
|`Effect` indicates the taint effect to match. Empty means match all taint effects. When specified, allowed values are `NoSchedule`, `PreferNoSchedule`, and `NoExecute`.
|false

|`key`
|string
|`Key` is the taint key that the toleration applies to. Empty means match all taint keys. If the key is empty, the operator must be `Exists`; this combination means to match all values and all keys.
|false

|`operator`
|string
|`Operator` represents a key's relationship to the value. Valid operators are `Exists` and `Equal`. Defaults to `Equal`. `Exists` is equivalent to a wildcard for the value, so that a pod can tolerate all taints of a particular category.
|false

|`tolerationSeconds`
|integer
|`TolerationSeconds` represents the period of time the toleration (which must be of effect `NoExecute`, otherwise this field is ignored) tolerates the taint. By default, it is not set, which means tolerate the taint forever (do not evict). Zero and negative values will be treated as 0 (evict immediately) by the system.

_Format_: int64
|false

|`value`
|string
|`Value` is the taint value the toleration matches to. If the operator is `Exists`, the value should be empty; otherwise, it is just a regular string.
|false
|===

[[uipluginspecdistributedtracing]]
== UIPlugin.spec.distributedTracing

`DistributedTracing` contains configuration for the distributed tracing console plugin.

[cols="1,1,4,1"]
|===
|Name |Type |Description |Required

|`timeout`
|string
|`Timeout` is the maximum duration before a query timeout.

The value is expected to be a sequence of digits followed by a unit suffix, which can be 's' (seconds) or 'm' (minutes).
|false
|===

[[uipluginspeclogging]]
== UIPlugin.spec.logging

`Logging` contains configuration for the logging console plugin.

* It only applies to UIPlugin Type: `Logging`.

[cols="2,1,4,1"]
|===
|Name |Type |Description |Required

|`logsLimit`
|integer
|`LogsLimit` is the maximum number of entries returned for a query.

_Format_: int32

_Minimum_: 0
|false

|xref:#uipluginspeclogginglokistack[`lokiStack`]
|object
|`LokiStack` points to the `LokiStack` instance whose logs should be displayed. It always references a `LokiStack` in the "openshift-logging" namespace.
|false

|`schema`
|enum
|`Schema` is the schema to use for logs querying and display.

Defaults to "viaq" if not specified.

_Enum_: `viaq`, `otel`, `select`

_Default_: `viaq`
|false

|`timeout`
|string
|`Timeout` is the maximum duration before a query timeout.

The value is expected to be a sequence of digits followed by an optional unit suffix, which can be 's' (seconds) or 'm' (minutes). If the unit is omitted, it defaults to seconds.
|false
|===

[[uipluginspeclogginglokistack]]
== UIPlugin.spec.logging.lokiStack

`LokiStack` points to the `LokiStack` instance whose logs should be displayed. It always references a `LokiStack` in the "openshift-logging" namespace.

[cols="1,1,4,1"]
|===
|Name |Type |Description |Required

|`name`
|string
|Name of the `LokiStack` resource.
|false

|`namespace`
|string
|
|false
|===

[[uipluginspecmonitoring]]
== UIPlugin.spec.monitoring

`Monitoring` contains configuration for the monitoring console plugin.

[cols="2,1,4,1"]
|===
|Name |Type |Description |Required

|xref:#uipluginspecmonitoringacm[`acm`]
|object
|`ACM` points to the Alertmanager and ThanosQuerier instance services to which it should create a proxy.
|false

|xref:#uipluginspecmonitoringincidents[`incidents`]
|object
|`Incidents` feature flag enablement.
|false

|xref:#uipluginspecmonitoringperses[`perses`]
|object
|`Perses` points to the Perses instance service to which it should create a proxy.
|false
|===

[[uipluginspecmonitoringacm]]
== UIPlugin.spec.monitoring.acm

`ACM` points to the Alertmanager and ThanosQuerier instance services to which it should create a proxy.

[cols="2,1,4,1"]
|===
|Name |Type |Description |Required

|xref:#uipluginspecmonitoringacmalertmanager[`alertmanager`]
|object
|`Alertmanager` points to the Alertmanager instance to which it should create a proxy.
|true

|`enabled`
|boolean
|Indicates if ACM-related features should be enabled.
|true

|xref:#uipluginspecmonitoringacmthanosquerier[`thanosQuerier`]
|object
|`ThanosQuerier` points to the thanos-querier service to which it should create a proxy.
|true
|===

[[uipluginspecmonitoringacmalertmanager]]
== UIPlugin.spec.monitoring.acm.alertmanager

`Alertmanager` points to the Alertmanager instance to which it should create a proxy.

[cols="1,1,3,1"]
|===
|Name |Type |Description |Required

|`url`
|string
|URL of the Alertmanager to proxy to.
|true
|===

[[uipluginspecmonitoringacmthanosquerier]]
== UIPlugin.spec.monitoring.acm.thanosQuerier

`ThanosQuerier` points to the thanos-querier service to which it should create a proxy.

[cols="1,1,3,1"]
|===
|Name |Type |Description |Required

|`url`
|string
|URL of the ThanosQuerier to proxy to.
|true
|===

[[uipluginspecmonitoringincidents]]
== UIPlugin.spec.monitoring.incidents

`Incidents` feature flag enablement.

[cols="1,1,3,1"]
|===
|Name |Type |Description |Required

|`enabled`
|boolean
|Indicates if incidents-related features should be enabled.
|true
|===

[[uipluginspecmonitoringperses]]
== UIPlugin.spec.monitoring.perses

`Perses` points to the Perses instance service to which it should create a proxy.

[cols="1,1,3,1"]
|===
|Name |Type |Description |Required

|`enabled`
|boolean
|Indicates if Perses-related features should be enabled.
|true
|===

[[uipluginspectroubleshootingpanel]]
== UIPlugin.spec.troubleshootingPanel

`TroubleshootingPanel` contains configuration for the troubleshooting console plugin.

[cols="1,1,4,1"]
|===
|Name |Type |Description |Required

|`timeout`
|string
|`Timeout` is the maximum duration before a query timeout.

The value is expected to be a sequence of digits followed by a unit suffix, which can be 's' (seconds) or 'm' (minutes).
|false
|===

[[uipluginstatus]]
== UIPlugin.status

`UIPluginStatus` defines the observed state of `UIPlugin`. It should always be reconstructable from the state of the cluster and/or the outside world.

[cols="2,1,4,1"]
|===
|Name |Type |Description |Required

|xref:#uipluginstatusconditionsindex[`conditions`]
|`[]object`
|`Conditions` provide status information about the plugin.
|true
|===

[[uipluginstatusconditionsindex]]
== UIPlugin.status.conditions[index]

[cols="2,1,4,1"]
|===
|Name |Type |Description |Required

|`lastTransitionTime`
|string
|`lastTransitionTime` is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.

_Format_: date-time
|true

|`message`
|string
|`message` is a human readable message indicating details about the transition. This may be an empty string.
|true

|`reason`
|string
|`reason` contains a programmatic identifier indicating the reason for the condition's last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty.
|true

|`status`
|enum
|Status of the condition.

_Enum_: `True`, `False`, `Unknown`, `Degraded`
|true

|`type`
|string
|`type` of condition in CamelCase or in `foo.example.com/CamelCase`. The regex it matches is `(dns1123SubdomainFmt/)?(qualifiedNameFmt)`.
|true

|`observedGeneration`
|integer
|`observedGeneration` represents the `.metadata.generation` that the condition was set based upon. For instance, if `.metadata.generation` is currently 12, but the `.status.conditions[x].observedGeneration` is 9, the condition is out of date with respect to the current state of the instance.

_Format_: int64

_Minimum_: 0
|false
|===

@@ -1,11 +1,34 @@
:_mod-docs-content-type: ASSEMBLY
[id="cluster-observability-operator-overview"]
= {coo-full} overview
include::_attributes/common-attributes.adoc[]
:context: cluster_observability_operator_overview

toc::[]

The {coo-first} is an optional component of {product-title} designed for creating and managing highly customizable monitoring stacks. It enables cluster administrators to automate configuration and management of monitoring needs extensively, offering a more tailored and detailed view of each namespace compared to the default {product-title} monitoring system.

The standalone {coo-short} documentation is available at link:https://docs.redhat.com/en/documentation/red_hat_openshift_cluster_observability_operator/1-latest/html/about_red_hat_openshift_cluster_observability_operator/index[].

The {coo-short} deploys the following monitoring components:

* **Prometheus** - A highly available Prometheus instance capable of sending metrics to an external endpoint by using remote write.
* **Thanos Querier** (optional) - Enables querying of Prometheus instances from a central location.
* **Alertmanager** (optional) - Provides alert configuration capabilities for different services.
* **xref:../../observability/cluster_observability_operator/ui_plugins/observability-ui-plugins-overview.adoc#observability-ui-plugins-overview[UI plugins]** (optional) - Enhances the observability capabilities with plugins for monitoring, logging, distributed tracing, and troubleshooting.
* **Korrel8r** (optional) - Provides observability signal correlation, powered by the open source Korrel8r project.
* **xref:../../observability/cluster_observability_operator/ui_plugins/monitoring-ui-plugin.adoc#coo-incident-detection-overview_monitoring-ui-plugin[Incident detection]** (optional) - Groups related alerts into incidents to help you identify the root causes of alert bursts.

include::modules/coo-versus-default-ocp-monitoring.adoc[leveloffset=+1]

include::modules/coo-advantages.adoc[leveloffset=+1]

include::modules/coo-target-users.adoc[leveloffset=+1]

//include::modules/monitoring-understanding-the-cluster-observability-operator.adoc[leveloffset=+1]

include::modules/coo-server-side-apply.adoc[leveloffset=+1]

[role="_additional-resources"]
.Additional resources

* link:https://kubernetes.io/docs/reference/using-api/server-side-apply/[Kubernetes documentation for Server-Side Apply (SSA)]

@@ -0,0 +1,399 @@
// Cluster Observability Operator Release Notes
:_mod-docs-content-type: ASSEMBLY
include::_attributes/common-attributes.adoc[]
[id="cluster-observability-operator-release-notes"]
= {coo-full} release notes
:context: cluster-observability-operator-release-notes

toc::[]

The {coo-first} is an optional {product-title} Operator that enables administrators to create standalone monitoring stacks that are independently configurable for use by different services and users.

The {coo-short} complements the built-in monitoring capabilities of {product-title}. You can deploy it in parallel with the default platform and user workload monitoring stacks managed by the {cmo-first}.

These release notes track the development of the {coo-full} in {product-title}.

The following table provides information about which features are available depending on the version of {coo-full} and {product-title}:

[cols="1,1,1,1,1,1,1", options="header"]
|===
| COO Version | OCP Versions | Distributed tracing | Logging | Troubleshooting panel | ACM alerts | Incident detection
| 1.1+ | 4.12 - 4.14 | ✔ | ✔ | ✘ | ✘ | ✘
| 1.1+ | 4.15 | ✔ | ✔ | ✘ | ✔ | ✘
| 1.1+ | 4.16 - 4.18 | ✔ | ✔ | ✔ | ✔ | ✘
| 1.2+ | 4.19+ | ✔ | ✔ | ✔ | ✔ | ✔
|===

include::snippets/unified-perspective-web-console.adoc[]

[id="cluster-observability-operator-release-notes-1-2-2_{context}"]
== {coo-full} 1.2.2

The following advisory is available for {coo-full} 1.2.2:

* link:https://access.redhat.com/errata/RHBA-2025:11689[RHBA-2025:11689 {coo-full} 1.2.2]

[id="cluster-observability-operator-1-2-2-bug-fixes_{context}"]
=== Bug fixes

* Before this update, the installation of the incident detection feature could fail intermittently. The symptoms include the incident detection UI being visible but not including any data. In addition, the health-analyzer `ServiceMonitor` resource is in a failed state, with the error message `tls: failed to verify certificate: x509`. With this release, the incident detection feature installs correctly. (link:https://issues.redhat.com/browse/COO-1062[COO-1062])
+
If you are upgrading from 1.2.1 where the bug was occurring, you must recreate the monitoring UI plugin to resolve the issue.
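+
For example, a recreation sequence might look like the following sketch. It assumes the `UIPlugin` resource is named `monitoring` and that `monitoring-uiplugin.yaml` is your saved copy of that resource:
+
[source,terminal]
----
$ oc delete uiplugin monitoring
$ oc apply -f monitoring-uiplugin.yaml
----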
|
||||
|
||||
[id="cluster-observability-operator-1-2-2-known-issues_{context}"]
|
||||
=== Known issues
|
||||
|
||||
These are the known issues in {coo-full} 1.2.2:
|
||||
|
||||
* When installing version 1.2.2 or when upgrading from version 1.2.0, the monitoring plugin's `UIPlugin` resource can be corrupted. This occurs when you have also deployed distributed tracing, the troubleshooting panel, and Advance Cluster Management (ACM), together with the monitoring UI plugin. You can resolve this issue by recreating the UI plugin. (link:https://issues.redhat.com/browse/COO-1051[COO-1051])
|
||||
+
|
||||
If you have previously resolved the issue in 1.2.1 and then upgrade to 1.2.2, this issue will not reoccur.
|
||||
|
||||
[id="cluster-observability-operator-release-notes-1-2-1_{context}"]
|
||||
== {coo-full} 1.2.1
|
||||
|
||||
The following advisory is available for {coo-full} 1.2.1:
|
||||
|
||||
* link:https://access.redhat.com/errata/RHBA-2025:10696[RHBA-2025:10696 {coo-full} 1.2.1]
|
||||
|
||||
[id="cluster-observability-operator-1-2-1-bug-fixes_{context}"]
|
||||
=== Bug fixes
|
||||
|
||||
* Before this update, an old version label matcher was retained during the Operator version 1.2 upgrade. This caused Perses dashboards to become unavailable. With this release, the version label is removed and Perses dashboards are correctly reconciled.
|
||||
|
||||
[id="cluster-observability-operator-1-2-1-known-issues_{context}"]
|
||||
=== Known issues
|
||||
|
||||
These are the known issues in {coo-full} 1.2.1:
|
||||
|
||||
* The installation of the incident detection feature could fail intermittently. The symptoms include the incident detection UI being visible but not including any data. In addition, the health-analyzer `ServiceMonitor` resource is in a failed state, with the error message `tls: failed to verify certificate: x509`. You can resolve this issue by upgrading to 1.2.2 and recreating the monitoring UI plugin. (link:https://issues.redhat.com/browse/COO-1062[COO-1062])
|
||||
|
||||
* When installing version 1.2.1 or when upgrading from version 1.2.0, the monitoring plugin's `UIPlugin` resource can be corrupted. This occurs when you have also deployed distributed tracing, the troubleshooting panel, and Advance Cluster Management (ACM), together with the monitoring UI plugin. You can resolve this issue by recreating the UI plugin. (link:https://issues.redhat.com/browse/COO-1051[COO-1051])
|
||||
|
||||
[id="cluster-observability-operator-release-notes-1-2_{context}"]
|
||||
== {coo-full} 1.2
|
||||
|
||||
The following advisory is available for {coo-full} 1.2:
|
||||
|
||||
* link:https://access.redhat.com/errata/RHBA-2025:8940[RHBA-2025:8940 {coo-full} 1.2]
|
||||
|
||||
[id="cluster-observability-operator-1-2-new-features-enhancements_{context}"]
|
||||
=== New features and enhancements
|
||||
|
||||
* The logging UI plugin now supports the OTEL format, in addition to the previously supported ViaQ scheme. (link:https://issues.redhat.com/browse/COO-816[COO-816])
|
||||
|
||||
* Accelerators Perses dashboards are deployed by default when you install the monitoring UI plugin. (link:https://issues.redhat.com/browse/COO-942[COO-942])
|
||||
|
||||
* Multiple results per graph node are now displayed for Korrel8r. (link:https://issues.redhat.com/browse/COO-785[COO-785])
|
||||
|
||||
* Direct navigation to individual incident detail is now supported in the incident detection panel, and this enables the incidents overview functionality in {rh-rhacm-first} 2.14. (link:https://issues.redhat.com/browse/COO-977[COO-977], link:https://issues.redhat.com/browse/ACM-18751[ACM-18751])
|
||||
|
||||
* Advanced filters have been added to the tracing view. (link:https://issues.redhat.com/browse/COO-979[COO-979])
|
||||
|
||||
* The status of the distributed tracing UI plugin is now General Availability (GA), supporting Patternfly 4, 5 and 6. (link:https://issues.redhat.com/browse/COO-873[COO-873])
|
||||
|
||||
[id="cluster-observability-operator-1-2-bug-fixes_{context}"]
|
||||
=== Bug fixes
|
||||
|
||||
* Previously, LokiStack was a prerequisite for installing the logging UI plugin. With this release, you can install the logging UI plugin without LokiStack. (link:https://issues.redhat.com/browse/COO-760[COO-760])
|
||||
|
||||
* Previously, the *Silence Alert* button in the **Incidents** -> **Component** section did not pre-populate the fields and was not usable. This release resolves the issue. (link:https://issues.redhat.com/browse/COO-970[COO-970])
|
||||
|
||||
|
||||
[id="cluster-observability-operator-1-2-known-issues_{context}"]
|
||||
=== Known issues
|
||||
|
||||
These are the known issues in {coo-full} 1.2.0:
|
||||
|
||||
* When upgrading from {coo-short} 1.1.1 to {coo-short} 1.2, the Perses dashboard is not correctly reconciled, and this requires the monitoring UI plugin to be reinstalled. (link:https://issues.redhat.com/browse/COO-978[COO-978])
|
||||
|
||||
|
||||
[id="cluster-observability-operator-release-notes-1-1-1_{context}"]
|
||||
== {coo-full} 1.1.1
|
||||
|
||||
// Back fill advisory when it is created by KOnflux
|
||||
// The following advisory is available for {coo-full} 1.1.1:
|
||||
//
|
||||
// * link:https://access.redhat.com/errata/RHBA-2025:????[RHBA-2025:??? {coo-full} 1.1.1]
|
||||
|
||||
|
||||
[id="cluster-observability-operator-1-1-1-bug-fixes_{context}"]
|
||||
=== Bug fixes
|
||||
|
||||
* Previously, `observability-operator` and `perses-operator` pods on many clusters entered a `CrashLoopBackOff` state due to `OutOfMemory` errors, after upgrading from {coo-full} 1.0. This release resolves the issue. (link:https://issues.redhat.com/browse/COO-784[*COO-784*])
|
||||
|
||||
[id="cluster-observability-operator-release-notes-1-1_{context}"]
|
||||
== {coo-full} 1.1
|
||||
|
||||
|
||||
The following advisory is available for {coo-full} 1.1:
|
||||
|
||||
* link:https://access.redhat.com/errata/RHBA-2025:4360[RHBA-2025:4360 {coo-full} 1.1]
|
||||
|
||||
|
||||
|
||||
[id="cluster-observability-operator-1-1-new-features-enhancements_{context}"]
|
||||
=== New features and enhancements
|
||||
|
||||
* You can now install the monitoring UI plugin using {coo-short}. (link:https://issues.redhat.com/browse/COO-262[*COO-262*])
|
||||
|
||||
* You can enable incident detection in the monitoring UI plugin. (link:https://issues.redhat.com/browse/COO-690[*COO-690*])
|
||||
|
||||
* TLS support for the Thanos web endpoint has been added. (link:https://issues.redhat.com/browse/COO-222[*COO-222*])
|
||||
|
||||
|
||||
|
||||
[id="cluster-observability-operator-1-1-known-issues_{context}"]
|
||||
=== Known issues
|
||||
|
||||
These are the known issues in {coo-full} 1.1.0:
|
||||
|
||||
* `observability-operator` and `perses-operator` pods enter a `CrashLoopBackOff` state due to `OutOfMemory` errors, after upgrading from {coo-full} 1.0.
|
||||
+
|
||||
A workaround is provided in the knowledge base article link:https://access.redhat.com/solutions/7113898[ClusterObservability and perses operator pod in CrashLoopBackOff due to OOMKilled in RHOCP4].
|
||||
+
|
||||
This issue is being tracked in link:https://issues.redhat.com/browse/COO-784[*COO-784*].
|
||||
|
||||
|
||||
[id="cluster-observability-operator-1-1-bug-fixes_{context}"]
|
||||
=== Bug fixes
|
||||
|
||||
* Previously, the logging UI plugin did not support setting a custom LokiStack name or namespace. This release resolves the issue. (link:https://issues.redhat.com/browse/COO-332[*COO-332*])
|
||||
|
||||
|
||||
[id="cluster-observability-operator-release-notes-1-0_{context}"]
|
||||
== {coo-full} 1.0
|
||||
|
||||
// Need to check if there is an advisory generated now that the build system has moved to Konflux
|
||||
// The following advisory is available for {coo-full} 1.0:
|
||||
//
|
||||
// * link:https://access.redhat.com/errata/RHSA-2024:????[RHEA-2024:??? {coo-full} 1.0]
|
||||
|
||||
|
||||
[id="cluster-observability-operator-1-0-new-features-enhancements_{context}"]
|
||||
=== New features and enhancements
|
||||
|
||||
* {coo-short} is now enabled for {product-title} platform monitoring. (link:https://issues.redhat.com/browse/COO-476[*COO-476*])
|
||||
** Implements HTTPS support for {coo-short} web server. (link:https://issues.redhat.com/browse/COO-480[*COO-480*])
|
||||
** Implements authn/authz for {coo-short} web server. (link:https://issues.redhat.com/browse/COO-481[*COO-481*])
|
||||
** Configures ServiceMonitor resource to collect metrics from {coo-short}. (link:https://issues.redhat.com/browse/COO-482[*COO-482*])
|
||||
** Adds `operatorframework.io/cluster-monitoring=true` annotation to the OLM bundle. (link:https://issues.redhat.com/browse/COO-483[*COO-483*])
|
||||
** Defines the alerting strategy for {coo-short} . (link:https://issues.redhat.com/browse/COO-484[*COO-484*])
|
||||
** Configures PrometheusRule for alerting. (link:https://issues.redhat.com/browse/COO-485[*COO-485*])
|
||||
|
||||
* Support level annotations have been added to the `UIPlugin` CR when created. The support level is based on the plugin type, with values of `DevPreview`, `TechPreview`, or `GeneralAvailability`. (link:https://issues.redhat.com/browse/COO-318[*COO-318*])
|
||||
|
||||
// must-gather postponed to 1.1
|
||||
//* You can now gather debugging information about {coo-short} by using the `oc adm must-gather` CLI command. (link:https://issues.redhat.com/browse/COO-194[*COO-194*])
|
||||
|
||||
* You can now configure the Alertmanager `scheme` and `tlsConfig` fields in the Prometheus CR. (link:https://issues.redhat.com/browse/COO-219[*COO-219*])
|
||||
|
||||
// Dev preview so cannot document
|
||||
//* You can now install the Monitoring UI plugin using {coo-short}. (link:https://issues.redhat.com/browse/COO-262[*COO-262*])
|
||||
|
||||
* The extended Technical Preview for the troubleshooting panel adds support for correlating traces with Kubernetes resources and directly with other observable signals including logs, alerts, metrics, and network events. (link:https://issues.redhat.com/browse/COO-450[*COO-450*])
|
||||
** You can select a Tempo instance and tenant when you navigate to the tracing page by clicking *Observe -> Tracing* in the web console. The preview troubleshooting panel only works with the `openshift-tracing / platform` instance and the `platform` tenant.
|
||||
** The troubleshooting panel works best in the *Administrator* perspective. It has limited functionality in the Developer perspective due to authorization issues with some back ends, most notably Prometheus for metrics and alerts. This will be addressed in a future release.
|
||||
|
||||
The following table provides information about which features are available depending on the version of {coo-full} and {product-title}:
|
||||
|
||||
[cols="1,1,1,1,1", options="header"]
|
||||
|===
|
||||
| COO Version | OCP Versions | Distributed Tracing | Logging | Troubleshooting Panel
|
||||
| 1.0 | 4.12 - 4.15 | ✔ | ✔ | ✘
|
||||
| 1.0 | 4.16+ | ✔ | ✔ | ✔
|
||||
|===
|
||||
|
||||
|
||||
[id="cluster-observability-operator-1-0-CVEs"]
|
||||
=== CVEs
|
||||
|
||||
* link:https://access.redhat.com/security/cve/CVE-2023-26159[CVE-2023-26159]
|
||||
* link:https://access.redhat.com/security/cve/CVE-2024-28849[CVE-2024-28849]
|
||||
* link:https://access.redhat.com/security/cve/CVE-2024-45338[CVE-2024-45338]
|
||||
|
||||
[id="cluster-observability-operator-1-0-bug-fixes_{context}"]
|
||||
=== Bug fixes
|
||||
|
||||
* Previously, the default namespace for the {coo-short} installation was `openshift-operators`. With this release, the defaullt namespace changes to `openshift-cluster-observability-operator`. (link:https://issues.redhat.com/browse/COO-32[*COO-32*])
|
||||
|
||||
* Previously, `korrel8r` was only able to parse time series selector expressions. With this release, `korrel8r` can parse any valid PromQL expression to extract the time series selectors that it uses for correlation. (link:https://issues.redhat.com/browse/COO-558[*COO-558*])
|
||||
|
||||
* Previously, when viewing a Tempo instance from the Distributed Tracing UI plugin, the scatter plot graph showing the traces duration was not rendered correctly. The bubble size was too large and overlapped the x and y axis. With this release, the graph is rendered correctly. (link:https://issues.redhat.com/browse/COO-319[*COO-319*])
|
||||
|
||||
== Features available on older, Technology Preview releases

The following table provides information about which features are available depending on older versions of {coo-full} and {product-title}:

[cols="1,1,1,1,1,1", options="header"]
|===
| COO Version | OCP Versions | Dashboards | Distributed Tracing | Logging | Troubleshooting Panel

| 0.2.0 | 4.11 | ✔ | ✘ | ✘ | ✘
| 0.3.0+, 0.4.0+ | 4.11 - 4.15 | ✔ | ✔ | ✔ | ✘
| 0.3.0+, 0.4.0+ | 4.16+ | ✔ | ✔ | ✔ | ✔
|===

[id="cluster-observability-operator-release-notes-0-4-1_{context}"]
|
||||
== {coo-full} 0.4.1
|
||||
|
||||
The following advisory is available for {coo-full} 0.4.1:
|
||||
|
||||
* link:https://access.redhat.com/errata/RHSA-2024:8040[RHEA-2024:8040 {coo-full} 0.4.1]
|
||||
|
||||
[id="cluster-observability-operator-0-4-1-new-features-enhancements_{context}"]
|
||||
=== New features and enhancements
|
||||
|
||||
* You can now configure WebTLS for Prometheus and Alertmanager.
|
||||
|
||||
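+
The following minimal sketch shows one plausible shape for this configuration, based on the upstream Prometheus Operator `web.tlsConfig` API. The secret name is a hypothetical placeholder, and the exact field placement in a {coo-short}-managed stack is an assumption:
+
[source,yaml]
----
apiVersion: monitoring.rhobs/v1     # assumed API group
kind: Prometheus
metadata:
  name: example-prometheus          # hypothetical name
spec:
  web:
    tlsConfig:
      keySecret:                    # secret key holding the TLS private key
        name: example-web-tls       # hypothetical secret
        key: tls.key
      cert:
        secret:                     # secret key holding the TLS certificate
          name: example-web-tls
          key: tls.crt
----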
[id="cluster-observability-operator-0-4-1-CVEs"]
|
||||
=== CVEs
|
||||
|
||||
* link:https://access.redhat.com/security/cve/CVE-2024-6104[CVE-2024-6104]
|
||||
* link:https://access.redhat.com/security/cve/CVE-2024-24786[CVE-2024-24786]
|
||||
|
||||
[id="cluster-observability-operator-0-4-1-bug-fixes_{context}"]
|
||||
=== Bug fixes
|
||||
|
||||
* Previously, when you deleted the dashboard UI plugin, the `consoles.operator.openshift.io` resource still contained `console-dashboards-plugin`. This release resolves the issue. (link:https://issues.redhat.com/browse/COO-152[*COO-152*])
|
||||
|
||||
* Previously, the web console did not display the correct icon for Red Hat {coo-short} . This release resolves the issue. (link:https://issues.redhat.com/browse/COO-353[*COO-353*])
|
||||
|
||||
* Previously, when you installed the {coo-short} from the web console, the support section contained an invalid link. This release resolves the issue. (link:https://issues.redhat.com/browse/COO-354[*COO-354*])
|
||||
|
||||
* Previously, the cluster service version (CSV) for {coo-short} linked to an unofficial version of the documentation. This release resolves the issue. (link:https://issues.redhat.com/browse/COO-356[*COO-356*])
|
||||
|
||||
[id="cluster-observability-operator-release-notes-0-4-0_{context}"]
|
||||
== {coo-full} 0.4.0
|
||||
|
||||
The following advisory is available for {coo-full} 0.4.0:
|
||||
|
||||
// Test the errata link just before publishing
|
||||
* link:https://access.redhat.com/errata/RHEA-2024:6699[RHEA-2024:6699 {coo-full} 0.4.0]
|
||||
|
||||
|
||||
[id="cluster-observability-operator-0-4-0-new-features-enhancements_{context}"]
|
||||
=== New features and enhancements
|
||||
|
||||
// COO-264
|
||||
==== Troubleshooting UI plugin
|
||||
|
||||
* The troubleshooting UI panel has been improved so you can now select and focus on a specific starting signal.
|
||||
* There is more visibility into Korrel8r queries, with the option of selecting the depth.
|
||||
* Users of {product-title} version 4.17+ can access the troubleshooting UI panel from the Application Launcher {launch}. Alternatively, on versions 4.16+, you can access it in the web console by clicking on **Observe** -> **Alerting**.
|
||||
|
||||
For more information, see xref:../../observability/cluster_observability_operator/ui_plugins/troubleshooting-ui-plugin.adoc#troubleshooting-ui-plugin[troubleshooting UI plugin].
|
||||
|
||||
// COO-263
==== Distributed tracing UI plugin

* The distributed tracing UI plugin has been enhanced, with a Gantt chart now available for exploring traces.

For more information, see xref:../../observability/cluster_observability_operator/ui_plugins/distributed-tracing-ui-plugin.adoc#distributed-tracing-ui-plugin[distributed tracing UI plugin].

[id="cluster-observability-operator-0-4-0-bug-fixes_{context}"]
|
||||
=== Bug fixes
|
||||
|
||||
* Previously, metrics were not available to normal users when accessed in the Developer perspective of the web console, by clicking on **Observe** -> **Logs**.
|
||||
This release resolves the issue. (link:https://issues.redhat.com/browse/COO-288[*COO-288*])
|
||||
|
||||
* Previously, the troubleshooting UI plugin used the wrong filter for network observability.
|
||||
This release resolves the issue. (link:https://issues.redhat.com/browse/COO-299[*COO-299*])
|
||||
|
||||
* Previously, the troubleshooting UI plugin generated an incorrect URL for pod label searches.
|
||||
This release resolves the issue. (link:https://issues.redhat.com/browse/COO-298[*COO-298*])
|
||||
|
||||
* Previously, there was an authorization vulnerability in the Distributed tracing UI plugin.
|
||||
This release resolves the issue and the Distributed tracing UI plugin has been hardened by using only multi-tenant `TempoStack` and `TempoMonolithic` instances going forward.
|
||||
|
||||
[id="cluster-observability-operator-release-notes-0-3-2_{context}"]
|
||||
== {coo-full} 0.3.2
|
||||
|
||||
The following advisory is available for {coo-full} 0.3.2:
|
||||
|
||||
* link:https://access.redhat.com/errata/RHEA-2024:5985[RHEA-2024:5985 {coo-full} 0.3.2]
|
||||
|
||||
|
||||
[id="cluster-observability-operator-0-3-2-new-features-enhancements_{context}"]
|
||||
=== New features and enhancements
|
||||
|
||||
* With this release, you can now use tolerations and node selectors with `MonitoringStack` components.
|
||||
|
||||
|
||||
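+
A minimal sketch of what this might look like in a `MonitoringStack` CR follows. The resource name, labels, and taint key are hypothetical placeholders, and the API version and field placement are assumptions:
+
[source,yaml]
----
apiVersion: monitoring.rhobs/v1alpha1   # assumed API group and version
kind: MonitoringStack
metadata:
  name: example-monitoring-stack        # hypothetical name
  namespace: example-namespace          # hypothetical namespace
spec:
  resourceSelector:
    matchLabels:
      app: example-app                  # hypothetical label
  tolerations:                          # allow scheduling onto tainted nodes
  - key: "example-taint"                # hypothetical taint key
    operator: "Exists"
    effect: "NoSchedule"
  nodeSelector:                         # pin components to labeled nodes
    node-role.kubernetes.io/infra: ""
----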
[id="cluster-observability-operator-0-3-2-bug-fixes_{context}"]
|
||||
=== Bug fixes
|
||||
|
||||
* Previously, the logging UIPlugin was not in the `Available` state and the logging pod was not created, when installed on a specific version of {product-title}.
|
||||
This release resolves the issue. (link:https://issues.redhat.com/browse/COO-260[*COO-260*])
|
||||
|
||||
[id="cluster-observability-operator-release-notes-0-3-0"]
|
||||
== {coo-full} 0.3.0
|
||||
The following advisory is available for {coo-full} 0.3.0:
|
||||
|
||||
* link:https://access.redhat.com/errata/RHEA-2024:4399[RHEA-2024:4399 {coo-full} 0.3.0]
|
||||
|
||||
[id="cluster-observability-operator-0-3-0-new-features-enhancements"]
|
||||
=== New features and enhancements
|
||||
* With this release, the {coo-full} adds backend support for future {product-title} observability web console UI plugins and observability components.
|
||||
|
||||
[id="cluster-observability-operator-release-notes-0-2-0"]
|
||||
== {coo-full} 0.2.0
|
||||
The following advisory is available for {coo-full} 0.2.0:
|
||||
|
||||
* link:https://access.redhat.com/errata/RHEA-2024:2662[RHEA-2024:2662 {coo-full} 0.2.0]
|
||||
|
||||
[id="cluster-observability-operator-0-2-0-new-features-enhancements"]
|
||||
=== New features and enhancements
|
||||
* With this release, the {coo-full} supports installing and managing observability-related plugins for the {product-title} web console user interface (UI). (link:https://issues.redhat.com/browse/COO-58[*COO-58*])
|
||||
|
||||
[id="cluster-observability-operator-release-notes-0-1-3"]
|
||||
== {coo-full} 0.1.3
|
||||
The following advisory is available for {coo-full} 0.1.3:
|
||||
|
||||
* link:https://access.redhat.com/errata/RHEA-2024:1744[RHEA-2024:1744 {coo-full} 0.1.3]
|
||||
|
||||
[id="cluster-observability-operator-0-1-3-bug-fixes"]
|
||||
=== Bug fixes
|
||||
|
||||
* Previously, if you tried to access the Prometheus web user interface (UI) at `\http://<prometheus_url>:9090/graph`, the following error message would display: `Error opening React index.html: open web/ui/static/react/index.html: no such file or directory`.
|
||||
This release resolves the issue, and the Prometheus web UI now displays correctly. (link:https://issues.redhat.com/browse/COO-34[*COO-34*])
|
||||
|
||||
[id="cluster-observability-operator-release-notes-0-1-2"]
|
||||
== {coo-full} 0.1.2
|
||||
The following advisory is available for {coo-full} 0.1.2:
|
||||
|
||||
* link:https://access.redhat.com/errata/RHEA-2024:1534[RHEA-2024:1534 {coo-full} 0.1.2]
|
||||
|
||||
[id="cluster-observability-operator-0-1-2-CVEs"]
|
||||
=== CVEs
|
||||
|
||||
* link:https://access.redhat.com/security/cve/CVE-2023-45142[CVE-2023-45142]
|
||||
|
||||
[id="cluster-observability-operator-0-1-2-bug-fixes"]
|
||||
=== Bug fixes
|
||||
|
||||
* Previously, certain cluster service version (CSV) annotations were not included in the metadata for {coo-short}.
|
||||
Because of these missing annotations, certain {coo-short} features and capabilities did not appear in the package manifest or in the OperatorHub user interface.
|
||||
This release adds the missing annotations, thereby resolving this issue. (link:https://issues.redhat.com/browse/COO-11[*COO-11*])
|
||||
|
||||
* Previously, automatic updates of the {coo-short} did not work, and a newer version of the Operator did not automatically replace the older version, even though the newer version was available in OperatorHub.
|
||||
This release resolves the issue. (link:https://issues.redhat.com/browse/COO-12[*COO-12*])
|
||||
|
||||
* Previously, Thanos Querier only listened for network traffic on port 9090 of 127.0.0.1 (`localhost`), which resulted in a `502 Bad Gateway` error if you tried to reach the Thanos Querier service.
|
||||
With this release, the Thanos Querier configuration has been updated so that the component now listens on the default port (10902), thereby resolving the issue.
|
||||
As a result of this change, you can also now modify the port via server side apply (SSA) and add a proxy chain, if required. (link:https://issues.redhat.com/browse/COO-14[*COO-14*])
|
||||
|
||||
[id="cluster-observability-operator-release-notes-0-1-1"]
|
||||
== {coo-full} 0.1.1
|
||||
The following advisory is available for {coo-full} 0.1.1:
|
||||
|
||||
* link:https://access.redhat.com/errata/RHEA-2024:0550[2024:0550 {coo-full} 0.1.1]
|
||||
|
||||
[id="cluster-observability-operator-0-1-1-new-features-enhancements"]
|
||||
=== New features and enhancements
|
||||
This release updates the {coo-full} to support installing the Operator in restricted networks or disconnected environments.
|
||||
|
||||
[id="cluster-observability-operator-release-notes-0-1"]
|
||||
== {coo-full} 0.1
|
||||
|
||||
This release makes a Technology Preview version of the {coo-full} available on OperatorHub.
|
||||
@@ -0,0 +1,30 @@
:_mod-docs-content-type: ASSEMBLY
[id="configuring-the-cluster-observability-operator-to-monitor-a-service"]
= Configuring the {coo-full} to monitor a service
include::_attributes/common-attributes.adoc[]
:context: configuring_the_cluster_observability_operator_to_monitor_a_service

toc::[]

You can monitor metrics for a service by configuring monitoring stacks managed by the {coo-first}.

To test monitoring a service, follow these steps, sketched in the example after this list:

* Deploy a sample service that defines a service endpoint.
* Create a `ServiceMonitor` object that specifies how the service is to be monitored by the {coo-short}.
* Create a `MonitoringStack` object to discover the `ServiceMonitor` object.
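
For example, a `ServiceMonitor` object and the `MonitoringStack` object that discovers it might look like the following sketch. The names, namespace, and labels are hypothetical placeholders, and the `monitoring.rhobs` API group and versions are assumptions:

[source,yaml]
----
apiVersion: monitoring.rhobs/v1          # assumed API group
kind: ServiceMonitor
metadata:
  name: example-service-monitor          # hypothetical name
  namespace: example-namespace
  labels:
    k8s-app: example-app                 # label that the MonitoringStack selects on
spec:
  selector:
    matchLabels:
      app: example-app                   # matches the sample service's labels
  endpoints:
  - port: web                            # named port that exposes /metrics
---
apiVersion: monitoring.rhobs/v1alpha1    # assumed version
kind: MonitoringStack
metadata:
  name: example-monitoring-stack
  namespace: example-namespace
spec:
  resourceSelector:
    matchLabels:
      k8s-app: example-app               # discovers the ServiceMonitor above
----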

// Deploy a sample service for Cluster Observability Operator
include::modules/monitoring-deploying-a-sample-service-for-cluster-observability-operator.adoc[leveloffset=+1]

// Specify how the sample COO service is monitored
include::modules/monitoring-specifying-how-a-service-is-monitored-by-cluster-observability-operator.adoc[leveloffset=+1]

// Create a MonitoringStack object to discover the service monitor
include::modules/monitoring-creating-a-monitoringstack-object-for-cluster-observability-operator.adoc[leveloffset=+1]

// Validate a MonitoringStack
include::modules/monitoring-validating-a-monitoringstack-for-cluster-observability-operator.adoc[leveloffset=+1]

// Scrape targets in multiple namespaces
include::modules/monitoring-scrape-targets-in-multiple-namespaces.adoc[leveloffset=+1]

@@ -0,0 +1,19 @@
:_mod-docs-content-type: ASSEMBLY
[id="installing-cluster-observability-operators"]
= Installing the {coo-full}
include::_attributes/common-attributes.adoc[]
:context: installing_the_cluster_observability_operator

toc::[]

As a cluster administrator, you can install or remove the {coo-first} from the software catalog by using the {product-title} web console.
The software catalog is a user interface that works in conjunction with Operator Lifecycle Manager (OLM), which installs and manages Operators on a cluster.

// Installing the COO using the OCP web console
include::modules/monitoring-installing-cluster-observability-operator-using-the-web-console.adoc[leveloffset=+1]

[role="_additional-resources"]
.Additional resources
* xref:../../operators/admin/olm-adding-operators-to-cluster.adoc#olm-adding-operators-to-a-cluster[Adding Operators to a cluster]

// Uninstalling COO using the OCP web console
include::modules/monitoring-uninstalling-cluster-observability-operator-using-the-web-console.adoc[leveloffset=+1]

@@ -0,0 +1 @@
../../../_attributes/

@@ -0,0 +1,21 @@
:_mod-docs-content-type: ASSEMBLY
[id="dashboard-ui-plugin"]
= Dashboard UI plugin
include::_attributes/common-attributes.adoc[]
:context: dashboard-ui-plugin

toc::[]

The dashboard UI plugin supports enhanced dashboards in the {product-title} web console at *Observe* -> *Dashboards*. You can add other Prometheus datasources from the cluster to the default dashboards, in addition to the in-cluster datasource. This results in a unified observability experience across different data sources.

The plugin searches for datasources from `ConfigMap` resources in the `openshift-config-managed` namespace that have the label `console.openshift.io/dashboard-datasource: 'true'`.

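A minimal sketch of such a datasource `ConfigMap` follows. The datasource name and URL are hypothetical, and the `dashboard-datasource.yaml` key and `Datasource` schema are assumed to follow the format described in the console-dashboards-plugin documentation linked below:

[source,yaml]
----
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-datasource               # hypothetical name
  namespace: openshift-config-managed
  labels:
    console.openshift.io/dashboard-datasource: 'true'   # label that the plugin searches for
data:
  dashboard-datasource.yaml: |-
    kind: "Datasource"
    metadata:
      name: "example-datasource"
    spec:
      plugin:
        kind: "prometheus"
        spec:
          direct_url: "https://example-prometheus.example-namespace.svc:9090"   # hypothetical URL
----
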
include::modules/coo-dashboard-ui-plugin-install.adoc[leveloffset=+1]

include::modules/coo-dashboard-ui-plugin-configure.adoc[leveloffset=+1]

[role="_additional-resources"]
[id="additional-resources_{context}"]
== Additional resources

* See how to link:https://github.com/openshift/console-dashboards-plugin/blob/main/docs/add-datasource.md[add a new datasource] in the link:https://github.com/openshift/console-dashboards-plugin[console-dashboards-plugin] GitHub repository.

@@ -0,0 +1,15 @@
:_mod-docs-content-type: ASSEMBLY
[id="distributed-tracing-ui-plugin"]
= Distributed tracing UI plugin
include::_attributes/common-attributes.adoc[]
:context: distributed-tracing-ui-plugin

toc::[]

include::snippets/unified-perspective-web-console.adoc[]

The distributed tracing UI plugin adds tracing-related features to the {product-title} web console at **Observe** -> **Traces**. You can follow requests through the front end and into the backend of microservices, helping you identify code errors and performance bottlenecks in distributed systems.

include::modules/coo-distributed-tracing-ui-plugin-install.adoc[leveloffset=+1]

include::modules/coo-distributed-tracing-ui-plugin-using.adoc[leveloffset=+1]

1
observability/cluster_observability_operator/ui_plugins/images
Symbolic link
@@ -0,0 +1 @@
../../../images/
@@ -0,0 +1,34 @@
:_mod-docs-content-type: ASSEMBLY
[id="logging-ui-plugin"]
= Logging UI plugin
include::_attributes/common-attributes.adoc[]
:context: logging-ui-plugin

toc::[]

include::snippets/unified-perspective-web-console.adoc[]

The logging UI plugin surfaces logging data in the {product-title} web console on the *Observe* -> *Logs* page.
You can specify filters, queries, time ranges, and refresh rates, with the results displayed as a list of collapsed logs, which can then be expanded to show more detailed information for each log.

If you also deploy the Troubleshooting UI plugin on {product-title} version 4.16+, it connects to the Korrel8r service and adds direct links from the **Observe** -> **Logs** page in the web console to the **Observe** -> **Metrics** page with a correlated PromQL query. The plugin also adds a **See Related Logs** link from the web console alerting detail page, at **Observe** -> **Alerting**, to the **Observe** -> **Logs** page with a correlated filter set selected.

The features of the plugin are categorized as:

dev-console:: Adds the logging view to the web console.
alerts:: Merges the web console alerts with log-based alerts defined in the Loki ruler. Adds a log-based metrics chart in the alert detail view.
dev-alerts:: Merges the web console alerts with log-based alerts defined in the Loki ruler. Adds a log-based metrics chart in the alert detail view for the web console.

For {coo-first} versions, the support for these features in {product-title} versions is shown in the following table:

[cols="1,1,3", options="header"]
|===
| COO version | OCP versions | Features

| 0.3.0+ | 4.12 | `dev-console`
| 0.3.0+ | 4.13 | `dev-console`, `alerts`
| 0.3.0+ | 4.14+ | `dev-console`, `alerts`, `dev-alerts`
|===

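For reference, a minimal sketch of a `UIPlugin` CR that enables the logging plugin might look like the following. The `observability.openshift.io/v1alpha1` API version and the LokiStack name are assumptions:

[source,yaml]
----
apiVersion: observability.openshift.io/v1alpha1   # assumed API version
kind: UIPlugin
metadata:
  name: logging
spec:
  type: Logging
  logging:
    lokiStack:
      name: logging-loki      # hypothetical LokiStack instance name
----
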
include::modules/coo-logging-ui-plugin-install.adoc[leveloffset=+1]
1
observability/cluster_observability_operator/ui_plugins/modules
Symbolic link
@@ -0,0 +1 @@
../../../modules/
@@ -0,0 +1,26 @@
:_mod-docs-content-type: ASSEMBLY
[id="monitoring-ui-plugin"]
= Monitoring UI plugin
include::_attributes/common-attributes.adoc[]
:context: monitoring-ui-plugin

toc::[]

:FeatureName: The {coo-full} monitoring UI plugin
include::snippets/technology-preview.adoc[leveloffset=+2]

include::snippets/unified-perspective-web-console.adoc[]

The monitoring UI plugin adds monitoring features to the {product-title} web console.

* **{rh-rhacm}:** The monitoring plugin in {coo-first} allows it to function in {rh-rhacm-first} environments, providing {rh-rhacm} with the same alerting capabilities as {product-title}. You can configure the plugin to fetch alerts from the {rh-rhacm} Alertmanager backend. This enables seamless integration and user experience by aligning {rh-rhacm} and {product-title} monitoring workflows.

* **Incident detection:** The incident detection feature groups related alerts into incidents, to help you identify the root causes of alert bursts instead of being overwhelmed by individual alerts. It presents a timeline of incidents, color-coded by severity, and you can drill down into the individual alerts within an incident. The system also categorizes alerts by affected component, grouped by severity. This helps you focus on the most critical areas first.
+
The incident detection feature is available in the {product-title} web console at **Observe** -> **Incidents**.

include::modules/coo-monitoring-ui-plugin-install.adoc[leveloffset=+1]

include::modules/coo-incident-detection-overview.adoc[leveloffset=+1]

include::modules/coo-incident-detection-using.adoc[leveloffset=+1]

@@ -0,0 +1,64 @@
:_mod-docs-content-type: ASSEMBLY
[id="observability-ui-plugins-overview"]
= Observability UI plugins overview
include::_attributes/common-attributes.adoc[]
:context: observability-ui-plugins-overview

toc::[]

You can use the {coo-first} to install and manage UI plugins to enhance the observability capabilities of the {product-title} web console.
The plugins extend the default functionality, providing new UI features for monitoring, troubleshooting, distributed tracing, and cluster logging.

[id="monitoring_{context}"]
== Monitoring

The monitoring UI plugin adds monitoring-related UI features to the {product-title} web console, for the Advanced Cluster Management (ACM) perspective and for incident detection.

* **ACM:** The monitoring plugin in {coo-first} allows it to function in {rh-rhacm-first} environments, providing ACM with the same monitoring capabilities as {product-title}.

* **Incident detection:** The incident detection feature groups alerts into incidents to help you identify the root causes of alert bursts instead of being overwhelmed by individual alerts. It presents a timeline of incidents, color-coded by severity, and you can drill down into the individual alerts within an incident. The system also categorizes alerts by affected component to help you focus on the most critical areas first.

For more information, see the xref:../../../observability/cluster_observability_operator/ui_plugins/monitoring-ui-plugin.adoc#monitoring-ui-plugin[monitoring UI plugin] page.

[id="cluster-logging_{context}"]
|
||||
== Cluster logging
|
||||
|
||||
The logging UI plugin surfaces logging data in the web console on the *Observe* -> *Logs* page.
|
||||
You can specify filters, queries, time ranges and refresh rates. The results displayed a list of collapsed logs, which can then be expanded to show more detailed information for each log.
|
||||
|
||||
For more information, see the xref:../../../observability/cluster_observability_operator/ui_plugins/logging-ui-plugin.adoc#logging-ui-plugin[logging UI plugin] page.
|
||||
|
||||
[id="troubleshooting_{context}"]
|
||||
== Troubleshooting
|
||||
|
||||
:FeatureName: The {coo-full} troubleshooting panel UI plugin
|
||||
include::snippets/technology-preview.adoc[leveloffset=+2]
|
||||
|
||||
The troubleshooting panel UI plugin for {product-title} version 4.16+ provides observability signal correlation, powered by the open source Korrel8r project.
|
||||
You can use the troubleshooting panel available from the *Observe* -> *Alerting* page to easily correlate metrics, logs, alerts, netflows, and additional observability signals and resources, across different data stores.
|
||||
Users of {product-title} version 4.17+ can also access the troubleshooting UI panel from the Application Launcher {launch}.
|
||||
|
||||
The output of Korrel8r is displayed as an interactive node graph. When you click on a node, you are automatically redirected to the corresponding web console page with the specific information for that node, for example, metric, log, or pod.
|
||||
|
||||
For more information, see the xref:../../../observability/cluster_observability_operator/ui_plugins/troubleshooting-ui-plugin.adoc#troubleshooting-ui-plugin[troubleshooting UI plugin] page.
|
||||
|
||||
[id="distributed-tracing_{context}"]
|
||||
== Distributed tracing
|
||||
|
||||
The distributed tracing UI plugin adds tracing-related features to the web console on the *Observe* -> *Traces* page.
|
||||
You can follow requests through the front end and into the backend of microservices, helping you identify code errors and performance bottlenecks in distributed systems.
|
||||
You can select a supported `TempoStack` or `TempoMonolithic` multi-tenant instance running in the cluster and set a time range and query to view the trace data.
|
||||
|
||||
For more information, see the xref:../../../observability/cluster_observability_operator/ui_plugins/distributed-tracing-ui-plugin.adoc#distributed-tracing-ui-plugin[distributed tracing UI plugin] page.
|
||||
|
||||
////
[id="dashboards_{context}"]
== Dashboards

The dashboard UI plugin supports enhanced dashboards in the {product-title} web console at *Observe* -> *Dashboards*.
You can add other Prometheus data sources from the cluster to the default dashboards, in addition to the in-cluster data source.
This results in a unified observability experience across different data sources.

For more information, see the xref:../../../observability/cluster_observability_operator/ui_plugins/dashboard-ui-plugin.adoc#dashboard-ui-plugin[dashboard UI plugin] page.
////
1
observability/cluster_observability_operator/ui_plugins/snippets
Symbolic link
@@ -0,0 +1 @@
../../../snippets/
@@ -0,0 +1,26 @@
:_mod-docs-content-type: ASSEMBLY
[id="troubleshooting-ui-plugin"]
= Troubleshooting UI plugin
include::_attributes/common-attributes.adoc[]
:context: troubleshooting-ui-plugin

toc::[]

:FeatureName: The {coo-full} troubleshooting panel UI plugin
include::snippets/technology-preview.adoc[leveloffset=+2]

The troubleshooting UI plugin for {product-title} version 4.16+ provides observability signal correlation, powered by the open source Korrel8r project.
With the troubleshooting panel that is available under *Observe* -> *Alerting*, you can easily correlate metrics, logs, alerts, netflows, and additional observability signals and resources, across different data stores.
Users of {product-title} version 4.17+ can also access the troubleshooting UI panel from the Application Launcher {launch}.

When you install the troubleshooting UI plugin, a link:https://github.com/korrel8r/korrel8r[Korrel8r] service named `korrel8r` is deployed in the same namespace, and it is able to locate related observability signals and Kubernetes resources from its correlation engine.

The output of Korrel8r is displayed in the form of an interactive node graph in the {product-title} web console.
Nodes in the graph represent a type of resource or signal, while edges represent relationships.
When you click a node, you are automatically redirected to the corresponding web console page with the specific information for that node, for example, metric, log, or pod.

include::modules/coo-troubleshooting-ui-plugin-install.adoc[leveloffset=+1]

include::modules/coo-troubleshooting-ui-plugin-using.adoc[leveloffset=+1]

include::modules/coo-troubleshooting-ui-plugin-creating-alert.adoc[leveloffset=+1]

@@ -12,7 +12,7 @@ include::modules/distr-tracing-tempo-key-concepts-in-distributed-tracing.adoc[le
.Additional resources
// xreffing to the installation page until further notice because OTEL content is currently planned for internal restructuring across pages that is likely to result in renamed page files
* xref:../../observability/otel/otel-installing.adoc#install-otel[{OTELName}]
* link:https://docs.redhat.com/en/documentation/red_hat_openshift_cluster_observability_operator/1-latest/html/ui_plugins_for_red_hat_openshift_cluster_observability_operator/distributed-tracing-ui-plugin[Distributed tracing UI plugin]
* xref:../../observability/cluster_observability_operator/ui_plugins/distributed-tracing-ui-plugin.adoc#distributed-tracing-ui-plugin[Distributed tracing UI plugin]

include::modules/distr-tracing-features.adoc[leveloffset=+1]

@@ -25,5 +25,5 @@ include::modules/distr-tracing-architecture.adoc[leveloffset=+1]

// xreffing to the installation page until further notice because OTEL content is currently planned for internal restructuring across pages that is likely to result in renamed page files
* xref:../../observability/otel/otel-installing.adoc#install-otel[{OTELName}]
* link:https://docs.redhat.com/en/documentation/red_hat_openshift_cluster_observability_operator/1-latest/html/ui_plugins_for_red_hat_openshift_cluster_observability_operator/distributed-tracing-ui-plugin[Distributed tracing UI plugin]
* xref:../../observability/cluster_observability_operator/ui_plugins/distributed-tracing-ui-plugin.adoc#distributed-tracing-ui-plugin[Distributed tracing UI plugin]
////
@@ -37,7 +37,7 @@ include::modules/distr-tracing-tempo-coo-ui-plugin.adoc[leveloffset=+1]

[role="_additional-resources"]
.Additional resources
* link:https://docs.redhat.com/en/documentation/red_hat_openshift_cluster_observability_operator/1-latest/html/ui_plugins_for_red_hat_openshift_cluster_observability_operator/distributed-tracing-ui-plugin[Distributed tracing UI plugin]
* xref:../../observability/cluster_observability_operator/ui_plugins/distributed-tracing-ui-plugin.adoc#distributed-tracing-ui-plugin[Distributed tracing UI plugin]

include::modules/distr-tracing-tempo-config-spanmetrics.adoc[leveloffset=+1]

@@ -64,7 +64,7 @@ include::modules/distr-tracing-tempo-config-receiver-tls-for-tempomonolithic.ado
.Additional resources

* xref:../../security/certificates/service-serving-certificate.adoc#understanding-service-serving_service-serving-certificate[Understanding service serving certificates]
* xref:../../security/certificate_types_descriptions/service-ca-certificates.adoc#cert-types-service-ca-certificates[Service CA certificates]
* xref:../../security/certificate_types_descriptions/service-ca-certificates.adoc#cert-types-service-ca-certificates[Service CA certificates]

include::modules/distr-tracing-tempo-config-query-rbac.adoc[leveloffset=+1]

@@ -8,7 +8,7 @@ include::_attributes/attributes-openshift-dedicated.adoc[]
toc::[]

ifndef::openshift-rosa,openshift-rosa-hcp[]
Visualization for logging is provided by deploying the link:https://docs.redhat.com/en/documentation/red_hat_openshift_cluster_observability_operator/1-latest/html/ui_plugins_for_red_hat_openshift_cluster_observability_operator/logging-ui-plugin[Logging UI Plugin] of the link:link:https://docs.redhat.com/en/documentation/red_hat_openshift_cluster_observability_operator/1-latest/html/about_red_hat_openshift_cluster_observability_operator/index[Cluster Observability Operator], which requires Operator installation.
Visualization for logging is provided by deploying the xref:../../../observability/cluster_observability_operator/ui_plugins/logging-ui-plugin.adoc#logging-ui-plugin[Logging UI Plugin] of the xref:../../../observability/cluster_observability_operator/cluster-observability-operator-overview.adoc#cluster-observability-operator-overview[Cluster Observability Operator], which requires Operator installation.
endif::openshift-rosa,openshift-rosa-hcp[]
ifdef::openshift-rosa,openshift-rosa-hcp[]
Visualization for logging is provided by deploying the Logging UI Plugin of the Cluster Observability Operator, which requires Operator installation.

@@ -8,7 +8,7 @@ include::_attributes/attributes-openshift-dedicated.adoc[]
toc::[]

ifndef::openshift-rosa,openshift-rosa-hcp[]
Visualization for logging is provided by deploying the link:https://docs.redhat.com/en/documentation/red_hat_openshift_cluster_observability_operator/1-latest/html/ui_plugins_for_red_hat_openshift_cluster_observability_operator/logging-ui-plugin[Logging UI Plugin] of the link:link:https://docs.redhat.com/en/documentation/red_hat_openshift_cluster_observability_operator/1-latest/html/about_red_hat_openshift_cluster_observability_operator/index[Cluster Observability Operator], which requires Operator installation.
Visualization for logging is provided by deploying the xref:../../../observability/cluster_observability_operator/ui_plugins/logging-ui-plugin.adoc#logging-ui-plugin[Logging UI Plugin] of the xref:../../../observability/cluster_observability_operator/cluster-observability-operator-overview.adoc#cluster-observability-operator-overview[Cluster Observability Operator], which requires Operator installation.
endif::openshift-rosa,openshift-rosa-hcp[]
ifdef::openshift-rosa,openshift-rosa-hcp[]
Visualization for logging is provided by deploying the Logging UI Plugin of the Cluster Observability Operator, which requires Operator installation.

@@ -8,7 +8,7 @@ include::_attributes/attributes-openshift-dedicated.adoc[]
toc::[]

ifndef::openshift-rosa,openshift-rosa-hcp[]
Visualization for logging is provided by deploying the link:https://docs.redhat.com/en/documentation/red_hat_openshift_cluster_observability_operator/1-latest/html/ui_plugins_for_red_hat_openshift_cluster_observability_operator/logging-ui-plugin[Logging UI Plugin] of the link:link:https://docs.redhat.com/en/documentation/red_hat_openshift_cluster_observability_operator/1-latest/html/about_red_hat_openshift_cluster_observability_operator/index[Cluster Observability Operator], which requires Operator installation.
Visualization for logging is provided by deploying the xref:../../../observability/cluster_observability_operator/ui_plugins/logging-ui-plugin.adoc#logging-ui-plugin[Logging UI Plugin] of the xref:../../../observability/cluster_observability_operator/cluster-observability-operator-overview.adoc#cluster-observability-operator-overview[Cluster Observability Operator], which requires Operator installation.
endif::openshift-rosa,openshift-rosa-hcp[]
ifdef::openshift-rosa,openshift-rosa-hcp[]
Visualization for logging is provided by deploying the Logging UI Plugin of the Cluster Observability Operator, which requires Operator installation.

@@ -1,9 +1,9 @@
:_mod-docs-content-type: ASSEMBLY
[id="shiftstack-prometheus-configuration"]
= Monitoring clusters that run on RHOSO
include::_attributes/common-attributes.adoc[]
include::_attributes/common-attributes.adoc[]
:context: shiftstack-prometheus-configuration


toc::[]

You can correlate observability metrics for clusters that run on {rhoso-first}. By collecting metrics from both environments, you can monitor and troubleshoot issues across the infrastructure and application layers.
@@ -31,4 +31,4 @@ include::modules/monitoring-shiftstack-metrics.adoc[leveloffset=+1]
[role="_additional-resources"]
[id="additional-resources_{context}"]
== Additional resources
* link:https://docs.redhat.com/en/documentation/red_hat_openshift_cluster_observability_operator/1-latest/html/about_red_hat_openshift_cluster_observability_operator/index[Cluster Observability Operator overview]
* xref:../../observability/cluster_observability_operator/cluster-observability-operator-overview.adoc#understanding-the-cluster-observability-operator_cluster_observability_operator_overview[Cluster Observability Operator overview]