mirror of https://github.com/openshift/openshift-docs.git synced 2026-02-05 12:46:18 +01:00

Power Monitoring Tech Preview

This commit is contained in:
shreyasiddhartha
2023-11-16 19:33:43 +05:30
committed by openshift-cherrypick-robot
parent dc5d8b4589
commit f25c2be21c
25 changed files with 566 additions and 0 deletions


@@ -148,6 +148,13 @@ endif::[]
:loki-op: Loki Operator
:es-op: OpenShift Elasticsearch Operator
:log-plug: logging subsystem Console plugin
//power monitoring
:PM-title-c: Power monitoring for Red Hat OpenShift
:PM-title: power monitoring for Red Hat OpenShift
:PM-shortname: power monitoring
:PM-shortname-c: Power monitoring
:PM-operator: Power monitoring Operator
:PM-kepler: Kepler
//serverless
:ServerlessProductName: OpenShift Serverless
:ServerlessProductShortName: Serverless


@@ -2698,6 +2698,23 @@ Topics:
- Name: Configuring the Cluster Observability Operator to monitor a service
File: configuring-the-cluster-observability-operator-to-monitor-a-service
---
Name: Power monitoring
Dir: power_monitoring
Distros: openshift-enterprise,openshift-origin
Topics:
- Name: Power monitoring release notes
File: power-monitoring-release-notes
- Name: Power monitoring overview
File: power-monitoring-overview
- Name: Installing power monitoring
File: installing-power-monitoring
- Name: Configuring power monitoring
File: configuring-power-monitoring
- Name: Visualizing power monitoring metrics
File: visualizing-power-monitoring-metrics
- Name: Uninstalling power monitoring
File: uninstalling-power-monitoring
---
Name: Distributed tracing
Dir: distr_tracing
Distros: openshift-enterprise

Binary file not shown (image, 190 KiB).


@@ -0,0 +1,9 @@
// Module included in the following assemblies:
//
// * power_monitoring/power-monitoring-overview.adoc
:_mod-docs-content-type: CONCEPT
[id="power-monitoring-about-power-monitoring_{context}"]
= About {PM-shortname}
You can use {PM-title} to monitor power usage and identify power-consuming containers running in an {product-title} cluster. {PM-shortname-c} collects and exports energy-related system statistics from various components, such as CPU and DRAM. It provides granular power consumption data for Kubernetes pods, namespaces, and nodes.


@@ -0,0 +1,25 @@
// Module included in the following assemblies:
// * power_monitoring/visualizing-power-monitoring-metrics.adoc
:_mod-docs-content-type: PROCEDURE
[id="power-monitoring-accessing-dashboards_{context}"]
= Accessing {PM-shortname} dashboards
You can access {PM-shortname} dashboards from the *Administrator* perspective of the {product-title} web console.
.Prerequisites
* You have access to the {product-title} web console.
* You are logged in as a user with the `cluster-admin` role.
* You have installed the {PM-operator}.
* You have deployed {PM-kepler} in your cluster.
* You have enabled monitoring for user-defined projects.
.Procedure
. In the *Administrator* perspective of the web console, go to *Observe* -> *Dashboards*.
. From the *Dashboard* drop-down list, select the {PM-shortname} dashboard you want to see:
** *Power Monitoring / Overview*
** *Power Monitoring / Namespace*
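The prerequisite to enable monitoring for user-defined projects is typically met by setting `enableUserWorkload` in the cluster monitoring config map. The following is a minimal sketch that is not part of the original module. It assumes that no `cluster-monitoring-config` config map exists yet. If one already exists, add the key to its existing `config.yaml` data instead of replacing the object.
[source,yaml]
----
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # Enables the monitoring stack for user-defined projects.
    enableUserWorkload: true
----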


@@ -0,0 +1,45 @@
// Module included in the following assemblies:
// * power_monitoring/visualizing-power-monitoring-metrics.adoc
:_mod-docs-content-type: CONCEPT
[id="power-monitoring-dashboards-overview_{context}"]
= {PM-shortname-c} dashboards overview
There are two types of {PM-shortname} dashboards. Each provides a different level of detail about power consumption metrics for a single cluster:
[discrete]
== Power Monitoring / Overview dashboard
With this dashboard, you can observe the following information:
* An aggregated view of CPU architecture and its power source (`rapl-sysfs`, `rapl-msr`, or `estimator`) along with total nodes with this configuration
* Total energy consumption by a cluster in the last 24 hours (measured in kilowatt-hours)
* The amount of power consumed by the top 10 namespaces in a cluster in the last 24 hours
* Detailed node information, such as its CPU architecture and component power source
These features allow you to effectively monitor the energy consumption of the cluster without needing to investigate each namespace separately.
[WARNING]
====
Ensure that the *Components Source* column does not display `estimator` as the power source.
.The Detailed Node Information table with `rapl-sysfs` as the component power source
image::power-monitoring-component-power-source.png[]
If {PM-kepler} is unable to obtain hardware power consumption metrics, the *Components Source* column displays `estimator` as the power source, which is not supported in Technology Preview. If that happens, then the values from the nodes are not accurate.
====
[discrete]
== Power Monitoring / Namespace dashboard
This dashboard allows you to view metrics by namespace and pod. You can observe the following information:
* The power consumption metrics, such as consumption in DRAM and PKG
* The energy consumption metrics in the last hour, such as consumption in DRAM and PKG for core and uncore components
This feature allows you to investigate key peaks and easily identify the primary root causes of high consumption.


@@ -0,0 +1,26 @@
// Module included in the following assemblies:
// * power_monitoring/uninstalling-power-monitoring.adoc
:_mod-docs-content-type: PROCEDURE
[id="power-monitoring-deleting-kepler_{context}"]
= Deleting {PM-kepler}
You can delete {PM-kepler} by removing the {PM-kepler} instance of the `{PM-kepler}` custom resource definition (CRD) from the {product-title} web console.
.Prerequisites
* You have access to the {product-title} web console.
* You are logged in as a user with the `cluster-admin` role.
.Procedure
. In the *Administrator* perspective of the web console, go to *Operators* -> *Installed Operators*.
. Click *{PM-title-c}* from the *Installed Operators* list and go to the *{PM-kepler}* tab.
. Locate the {PM-kepler} instance entry in the list.
. Click {kebab} for this entry and select *Delete {PM-kepler}*.
. In the *Delete {PM-kepler}?* dialog, click *Delete* to delete the {PM-kepler} instance.


@@ -0,0 +1,31 @@
// Module included in the following assemblies:
// * power_monitoring/installing-power-monitoring.adoc
:_mod-docs-content-type: PROCEDURE
[id="power-monitoring-deploying-kepler_{context}"]
= Deploying {PM-kepler}
You can deploy {PM-kepler} by creating an instance of the `{PM-kepler}` custom resource definition (CRD) by using the {PM-operator}.
.Prerequisites
* You have access to the {product-title} web console.
* You are logged in as a user with the `cluster-admin` role.
* You have installed the {PM-operator}.
.Procedure
. In the *Administrator* perspective of the web console, go to *Operators* -> *Installed Operators*.
. Click *{PM-title-c}* from the *Installed Operators* list and go to the *{PM-kepler}* tab.
. Click *Create {PM-kepler}*.
. On the *Create {PM-kepler}* page, ensure the *Name* is set to `kepler`.
+
[IMPORTANT]
====
The name of your {PM-kepler} instance must be set to `kepler`. All other instances are ignored by the {PM-operator}.
====
. Click *Create* to deploy {PM-kepler} and {PM-shortname} dashboards.


@@ -0,0 +1,28 @@
// Module included in the following assemblies:
//
// * power_monitoring/power-monitoring-overview.adoc
:_mod-docs-content-type: CONCEPT
[id="power-monitoring-hardware-virtualization-support_{context}"]
= {PM-kepler} hardware and virtualization support
{PM-kepler} is the key component of {PM-shortname} that collects real-time power consumption data from a node through one of the following power estimation methods:
Kernel Power Management Subsystem (preferred)::
* `rapl-sysfs`: This requires access to the `/sys/class/powercap/intel-rapl` host file.
* `rapl-msr`: This requires access to the `/dev/cpu/*/msr` host file.
The `estimator` power source::
Without access to the kernel's power cap subsystem, {PM-kepler} uses a machine learning model to estimate the power usage of the CPU on the node.
+
[WARNING]
====
The `estimator` feature is experimental, not supported, and should not be relied upon.
====
You can identify the power estimation method for a node by using the *Power Monitoring / Overview* dashboard.
[WARNING]
====
{PM-shortname-c} Technology Preview works only in bare-metal deployments. Most public cloud vendors do not expose Kernel Power Management Subsystems to virtual machines.
====


@@ -0,0 +1,36 @@
// Module included in the following assemblies:
// * power_monitoring/installing-power-monitoring.adoc
:_mod-docs-content-type: PROCEDURE
[id="power-monitoring-installing-pmo_{context}"]
= Installing the {PM-operator}
As a cluster administrator, you can install the {PM-operator} from OperatorHub by using the {product-title} web console.
[WARNING]
====
You must remove any previously installed versions of the {PM-operator} before installation.
====
.Prerequisites
* You have access to the {product-title} web console.
* You are logged in as a user with the `cluster-admin` role.
.Procedure
. In the *Administrator* perspective of the web console, go to *Operators* -> *OperatorHub*.
. Search for `{PM-shortname}`, click the *{PM-title-c}* tile, and then click *Install*.
//. On the *Install Operator* page:
//.. Select an *Update channel*.
//.. Select a {PM-shortname} *Version* to install.
// This can be included once the user has options there to choose. Not needed for now.
. Click *Install* again to install the {PM-operator}.
+
{PM-title-c} is now available in all namespaces of the {product-title} cluster.
.Verification
. Verify that the {PM-operator} is listed in *Operators* -> *Installed Operators*. The *Status* should resolve to *Succeeded*.


@@ -0,0 +1,13 @@
// Module included in the following assemblies:
//
// * power_monitoring/power-monitoring-overview.adoc
:_mod-docs-content-type: CONCEPT
[id="power-monitoring-kepler-architecture_{context}"]
= {PM-shortname-c} architecture
{PM-shortname-c} is made up of the following major components:
The {PM-operator}:: For administrators, the {PM-operator} streamlines the monitoring of power usage for workloads by simplifying the deployment and management of {PM-kepler} in an {product-title} cluster. The setup and configuration for the {PM-operator} are simplified by adding a {PM-kepler} custom resource definition (CRD). The Operator also manages operations, such as upgrading, removing, configuring, and redeploying {PM-kepler}.
{PM-kepler}:: {PM-kepler} is a key component of {PM-shortname}. It is responsible for monitoring the power usage of containers running in {product-title}. It generates metrics related to the power usage of both nodes and containers.


@@ -0,0 +1,48 @@
// Module included in the following assemblies:
// * power_monitoring/configuring-power-monitoring.adoc
:_mod-docs-content-type: REFERENCE
[id="power-monitoring-kepler-configuration_{context}"]
= The {PM-kepler} configuration
You can configure {PM-kepler} with the `spec` field of the `{PM-kepler}` resource.
[IMPORTANT]
====
Ensure that the name of your {PM-kepler} instance is `kepler`. All other instances are ignored by the {PM-operator}.
====
The following is the list of configuration options:
.{PM-kepler} configuration options
[options="header"]
|===
|Name |Spec |Description |Default
|`port` |`exporter.deployment` |The port on the node where the Prometheus metrics are exposed. |`9103`
|`nodeSelector` |`exporter.deployment` |The nodes on which {PM-kepler} exporter pods are scheduled. |`kubernetes.io/os: linux`
|`tolerations` |`exporter.deployment` |The tolerations for {PM-kepler} exporter that allow the pods to be scheduled on nodes with specific characteristics. |`- operator: "Exists"`
|===
.Example `{PM-kepler}` resource with default configuration
[source,yaml]
----
apiVersion: kepler.system.sustainable.computing.io/v1alpha1
kind: Kepler
metadata:
  name: kepler
spec:
  exporter:
    deployment:
      port: 9103 # <1>
      nodeSelector:
        kubernetes.io/os: linux # <2>
      tolerations: # <3>
      - key: ""
        operator: "Exists"
        value: ""
        effect: ""
----
<1> The Prometheus metrics are exposed on port 9103.
<2> {PM-kepler} pods are scheduled on Linux nodes.
<3> The default tolerations allow {PM-kepler} to be scheduled on any node.
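As an illustrative sketch that is not part of the original module, the following `{PM-kepler}` resource overrides the defaults from the preceding table to restrict the exporter to worker nodes and to tolerate a single taint. The `example.com/dedicated` taint key and its value are hypothetical, and the use of the `node-role.kubernetes.io/worker` label is an assumption about how your nodes are labeled. Adjust both to match your cluster.
[source,yaml]
----
apiVersion: kepler.system.sustainable.computing.io/v1alpha1
kind: Kepler
metadata:
  name: kepler # Must be `kepler`; other instances are ignored.
spec:
  exporter:
    deployment:
      port: 9103
      nodeSelector:
        kubernetes.io/os: linux
        node-role.kubernetes.io/worker: "" # Assumed node label
      tolerations:
      - key: "example.com/dedicated" # Hypothetical taint key
        operator: "Equal"
        value: "power-monitoring" # Hypothetical taint value
        effect: "NoSchedule"
----
Replacing the default `operator: "Exists"` toleration with a specific one means that exporter pods are no longer scheduled on nodes that carry other taints.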


@@ -0,0 +1,72 @@
// Module included in the following assemblies:
// * power_monitoring/visualizing-power-monitoring-metrics.adoc
:_mod-docs-content-type: REFERENCE
[id="power-monitoring-metrics-overview_{context}"]
= {PM-shortname-c} metrics overview
The {PM-operator} exposes the following metrics, which you can view by using the {product-title} web console under the *Observe* -> *Metrics* tab.
[WARNING]
====
This list of exposed metrics is not definitive. Metrics might be added or removed in future releases.
====
.{PM-operator} metrics
[options="header"]
|===
|Metric name |Description
|`kepler_container_joules_total` |The aggregated package or socket energy consumption of CPU, DRAM, and other host components by a container.
|`kepler_container_core_joules_total` |The total energy consumption across CPU cores used by a container. If the system has access to `RAPL_` metrics, this metric reflects the proportional container energy consumption of the RAPL Power Plane 0 (PP0), which is the energy consumed by all CPU cores in the socket.
|`kepler_container_dram_joules_total` |The total energy consumption of DRAM by a container.
|`kepler_container_uncore_joules_total` |The cumulative energy consumption by uncore components used by a container. The number of components might vary depending on the system. The uncore metric is processor model-specific and might not be available on some server CPUs.
|`kepler_container_package_joules_total` |The cumulative energy consumed by the CPU socket used by a container. It includes all core and uncore components.
|`kepler_container_other_joules_total` |The cumulative energy consumption of host components, excluding CPU and DRAM, used by a container. Generally, this metric is the energy consumption of ACPI hosts.
|`kepler_container_bpf_cpu_time_us_total` |The total CPU time used by the container that utilizes the BPF tracing.
|`kepler_container_cpu_cycles_total` |The total CPU cycles used by the container that utilizes hardware counters. CPU cycles is a metric directly related to CPU frequency. On systems where processors run at a fixed frequency, CPU cycles and total CPU time are roughly equivalent. On systems where processors run at varying frequencies, CPU cycles and total CPU time have different values.
|`kepler_container_cpu_instructions_total` |The total CPU instructions used by the container that utilizes hardware counters. CPU instructions is a metric that accounts for how the CPU is used.
|`kepler_container_cache_miss_total` |The total number of cache misses that occur for a container that uses hardware counters.
|`kepler_container_cgroupfs_cpu_usage_us_total` |The total CPU time used by a container reading from control group statistics.
|`kepler_container_cgroupfs_memory_usage_bytes_total` |The total memory in bytes used by a container reading from control group statistics.
|`kepler_container_cgroupfs_system_cpu_usage_us_total` |The total CPU time in kernel space used by the container reading from control group statistics.
|`kepler_container_cgroupfs_user_cpu_usage_us_total` |The total CPU time in user space used by a container reading from control group statistics.
|`kepler_container_bpf_net_tx_irq_total` |The total number of packets transmitted to network cards of a container that uses the BPF tracing.
|`kepler_container_bpf_net_rx_irq_total` |The total number of packets received from network cards of a container that uses the BPF tracing.
|`kepler_container_bpf_block_irq_total` |The total number of block I/O calls of a container that uses the BPF tracing.
|`kepler_node_info` |The node metadata, such as the node CPU architecture.
|`kepler_node_core_joules_total` |The total energy consumption across CPU cores used by all containers running on a node and operating system.
|`kepler_node_uncore_joules_total` |The cumulative energy consumption by uncore components used by all containers running on the node and operating system. The number of components might vary depending on the system.
|`kepler_node_dram_joules_total` |The total energy consumption of DRAM by all containers running on the node and operating system.
|`kepler_node_package_joules_total` |The cumulative energy consumed by the CPU socket used by all containers running on the node and operating system. It includes all core and uncore components.
|`kepler_node_other_host_components_joules_total` |The cumulative energy consumption of host components, excluding CPU and DRAM, used by all containers running on the node and operating system. Generally, this metric is the energy consumption of ACPI hosts.
|`kepler_node_platform_joules_total` |The total energy consumption of the host. Generally, this metric is the host energy consumption from Redfish BMC or ACPI.
|`kepler_node_energy_stat` |Multiple metrics from nodes labeled with container resource utilization control group metrics that are used in the model server.
|`kepler_node_accelerator_intel_qat` |The utilization of the accelerator Intel QAT on a certain node. If the system contains Intel QATs, {PM-kepler} can calculate the utilization of the node's QATs through telemetry.
|===
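Because the `*_joules_total` metrics are cumulative counters measured in joules, applying `rate()` to them yields joules per second, which is watts, and dividing a joule total by 3,600,000 yields kilowatt-hours. The following `PrometheusRule` resource is a minimal sketch of that arithmetic, not a definitive configuration. The resource name, the `kepler-operator` namespace, the recording rule names, and the `container_namespace` label are assumptions that you must verify against the metrics in your cluster.
[source,yaml]
----
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: power-monitoring-example-rules # Hypothetical name
  namespace: kepler-operator # Assumed namespace
spec:
  groups:
  - name: power-monitoring-example
    rules:
    # Average power in watts per namespace over 5 minutes.
    # The rate of a joule counter is joules per second, which is watts.
    - record: namespace:kepler_container_power_watts:rate5m
      expr: sum by (container_namespace) (rate(kepler_container_joules_total[5m]))
    # CPU package energy per scrape target over 24 hours in kilowatt-hours.
    # 1 kWh = 3,600,000 J.
    - record: instance:kepler_node_package_energy_kwh:increase24h
      expr: sum by (instance) (increase(kepler_node_package_joules_total[24h])) / 3600000
----
You can also run either `expr` value directly as a query under *Observe* -> *Metrics* without creating the resource.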


@@ -0,0 +1,48 @@
// Module included in the following assemblies:
// * power_monitoring/configuring-power-monitoring.adoc
:_mod-docs-content-type: CONCEPT
[id="power-monitoring-monitoring-kepler-status_{context}"]
= Monitoring the {PM-kepler} status
You can monitor the state of the {PM-kepler} exporter with the `status` field of the `{PM-kepler}` resource.
The `status.exporter` field includes information, such as the following:
* The number of nodes currently running the {PM-kepler} pods
* The number of nodes that should be running the {PM-kepler} pods
* Conditions representing the health of the {PM-kepler} resource
This information helps you confirm whether changes made through the `spec` field have taken effect.
.Example state of the `{PM-kepler}` resource
[source,yaml]
----
apiVersion: kepler.system.sustainable.computing.io/v1alpha1
kind: Kepler
metadata:
  name: kepler
status:
  exporter:
    conditions: # <1>
      - lastTransitionTime: '2024-01-11T11:07:39Z'
        message: Reconcile succeeded
        observedGeneration: 1
        reason: ReconcileSuccess
        status: 'True'
        type: Reconciled
      - lastTransitionTime: '2024-01-11T11:07:39Z'
        message: >-
          Kepler daemonset "kepler-operator/kepler" is deployed to all nodes and
          available; ready 2/2
        observedGeneration: 1
        reason: DaemonSetReady
        status: 'True'
        type: Available
    currentNumberScheduled: 2 # <2>
    desiredNumberScheduled: 2 # <3>
----
<1> The health of the {PM-kepler} resource. In this example, {PM-kepler} is successfully reconciled and ready.
<2> The number of nodes currently running the {PM-kepler} pods is 2.
<3> The desired number of nodes to run the {PM-kepler} pods is 2.


@@ -0,0 +1,32 @@
// Module included in the following assemblies:
// * power_monitoring/uninstalling-power-monitoring.adoc
:_mod-docs-content-type: PROCEDURE
[id="power-monitoring-uninstalling-pmo_{context}"]
= Uninstalling the {PM-operator}
If you installed the {PM-operator} by using OperatorHub, you can uninstall it from the {product-title} web console.
.Prerequisites
* You have access to the {product-title} web console.
* You are logged in as a user with the `cluster-admin` role.
.Procedure
. Delete the {PM-kepler} instance.
+
[WARNING]
====
Ensure that you have deleted the {PM-kepler} instance before uninstalling the {PM-operator}.
====
. Go to *Operators* -> *Installed Operators*.
. Locate the *{PM-title-c}* entry in the list.
. Click {kebab} for this entry and select *Uninstall Operator*.
. In the *Uninstall Operator?* dialog, click *Uninstall* to uninstall the {PM-operator}.


@@ -0,0 +1 @@
../_attributes/


@@ -0,0 +1,16 @@
:_mod-docs-content-type: ASSEMBLY
[id="configuring-power-monitoring"]
= Configuring {PM-shortname}
include::_attributes/common-attributes.adoc[]
:context: configuring-power-monitoring
toc::[]
:FeatureName: Power monitoring
include::snippets/technology-preview.adoc[leveloffset=+2]
The `{PM-kepler}` resource is a Kubernetes custom resource definition (CRD) that enables you to configure the deployment of {PM-kepler} and monitor its status.
include::modules/power-monitoring-kepler-configuration.adoc[leveloffset=+1]
include::modules/power-monitoring-monitoring-kepler-status.adoc[leveloffset=+1]

power_monitoring/images Symbolic link

@@ -0,0 +1 @@
../images/


@@ -0,0 +1,18 @@
:_mod-docs-content-type: ASSEMBLY
[id="installing-power-monitoring"]
= Installing {PM-title}
include::_attributes/common-attributes.adoc[]
:context: installing-power-monitoring
toc::[]
:FeatureName: Power monitoring
include::snippets/technology-preview.adoc[leveloffset=+2]
You can install {PM-title} by deploying the {PM-operator} in the {product-title} web console.
//Installing power monitoring operator
include::modules/power-monitoring-installing-pmo.adoc[leveloffset=+1]
// Deploying Kepler
include::modules/power-monitoring-deploying-kepler.adoc[leveloffset=+1]

power_monitoring/modules Symbolic link

@@ -0,0 +1 @@
../modules/


@@ -0,0 +1,21 @@
:_mod-docs-content-type: ASSEMBLY
[id="power-monitoring-overview"]
= {PM-shortname-c} overview
include::_attributes/common-attributes.adoc[]
:context: power-monitoring-overview
toc::[]
:FeatureName: Power monitoring
include::snippets/technology-preview.adoc[leveloffset=+2]
include::modules/power-monitoring-about-power-monitoring.adoc[leveloffset=+1]
include::modules/power-monitoring-kepler-architecture.adoc[leveloffset=+1]
include::modules/power-monitoring-hardware-virtualization-support.adoc[leveloffset=+1]
[role="_additional-resources"]
[id="additional-resources_power-monitoring-overview"]
== Additional resources
* xref:../power_monitoring/visualizing-power-monitoring-metrics.adoc#power-monitoring-dashboards-overview_visualizing-power-monitoring-metrics[{PM-shortname-c} dashboards overview]


@@ -0,0 +1,30 @@
:_mod-docs-content-type: ASSEMBLY
[id="power-monitoring-release-notes"]
= {PM-title-c} release notes
include::_attributes/common-attributes.adoc[]
:context: power-monitoring-release-notes
toc::[]
:FeatureName: Power monitoring
include::snippets/technology-preview.adoc[leveloffset=+2]
{PM-title-c} enables you to monitor the power usage of workloads and identify the most power-consuming namespaces running in an {product-title} cluster with key power consumption metrics, such as CPU or DRAM, measured at the container level.
These release notes track the development of {PM-title} in {product-title}.
For an overview of the {PM-operator}, see xref:../power_monitoring/power-monitoring-overview.adoc#power-monitoring-about-power-monitoring_power-monitoring-overview[About {PM-shortname}].
[id="power-monitoring-release-notes-0-1"]
== {PM-shortname-c} 0.1 (Technology Preview)
This release introduces a Technology Preview version of {PM-title}. The following advisory is available for {PM-shortname} 0.1:
* link:https://access.redhat.com/errata/RHEA-2024:0078[RHEA-2024:0078]
[id="power-monitoring-release-notes-0-1-features"]
=== Features
* Deployment and deletion of {PM-kepler}
* Power usage metrics from Intel-based bare-metal deployments
* Dashboards for plotting power usage

power_monitoring/snippets Symbolic link

@@ -0,0 +1 @@
../snippets


@@ -0,0 +1,18 @@
:_mod-docs-content-type: ASSEMBLY
[id="uninstalling-power-monitoring"]
= Uninstalling {PM-shortname}
include::_attributes/common-attributes.adoc[]
:context: uninstalling-power-monitoring
toc::[]
:FeatureName: Power monitoring
include::snippets/technology-preview.adoc[leveloffset=+2]
You can uninstall {PM-shortname} by deleting the {PM-kepler} instance and then the {PM-operator} in the {product-title} web console.
// Removing kepler
include::modules/power-monitoring-deleting-kepler.adoc[leveloffset=+1]
// Uninstalling power monitoring operator
include::modules/power-monitoring-uninstalling-pmo.adoc[leveloffset=+1]


@@ -0,0 +1,22 @@
:_mod-docs-content-type: ASSEMBLY
[id="visualizing-power-monitoring-metrics"]
= Visualizing {PM-shortname} metrics
include::_attributes/common-attributes.adoc[]
:context: visualizing-power-monitoring-metrics
toc::[]
:FeatureName: Power monitoring
include::snippets/technology-preview.adoc[leveloffset=+2]
You can visualize {PM-shortname} metrics in the {product-title} web console by accessing {PM-shortname} dashboards or by exploring *Metrics* under the *Observe* tab.
include::modules/power-monitoring-dashboards-overview.adoc[leveloffset=+1]
include::modules/power-monitoring-accessing-dashboards.adoc[leveloffset=+1]
include::modules/power-monitoring-metrics-overview.adoc[leveloffset=+1]
[role="_additional-resources"]
[id="additional-resources_visualizing-power-monitoring-metrics"]
== Additional resources
* xref:../monitoring/enabling-monitoring-for-user-defined-projects.adoc#enabling-monitoring-for-user-defined-projects_enabling-monitoring-for-user-defined-projects[Enabling monitoring for user-defined projects]