mirror of
https://github.com/openshift/openshift-docs.git
synced 2026-02-05 12:46:18 +01:00
Merge pull request #96951 from openshift-cherrypick-robot/cherry-pick-96947-to-enterprise-4.19
[enterprise-4.19] Power monitoring 0.5 integration
@@ -3289,6 +3289,8 @@ Topics:
|
||||
Dir: power_monitoring
|
||||
Distros: openshift-enterprise,openshift-origin
|
||||
Topics:
|
||||
- Name: Power monitoring 0.5 release notes
|
||||
File: power-monitoring-release-notes-tp-0-5
|
||||
- Name: Power monitoring release notes
|
||||
File: power-monitoring-release-notes
|
||||
- Name: Power monitoring overview
|
||||
@@ -3301,6 +3303,10 @@ Topics:
|
||||
File: visualizing-power-monitoring-metrics
|
||||
- Name: Uninstalling power monitoring
|
||||
File: uninstalling-power-monitoring
|
||||
- Name: Power monitoring references
|
||||
File: power-monitoring-references
|
||||
- Name: Power monitoring API reference
|
||||
File: power-monitoring-api-reference
|
||||
---
|
||||
Name: Scalability and performance
|
||||
Dir: scalability_and_performance
|
||||
|
||||
@@ -6,7 +6,7 @@
|
||||
[id="power-monitoring-about-power-monitoring_{context}"]
|
||||
= About {PM-shortname}
|
||||
|
||||
You can use {PM-title} to monitor the power usage and identify power-consuming containers running in an {product-title} cluster. {PM-shortname-c} collects and exports energy-related system statistics from various components, such as CPU and DRAM. It provides granular power consumption data for Kubernetes pods, namespaces, and nodes.
|
||||
You can use {PM-title} to monitor power usage and identify power-consuming containers running in an {product-title} cluster. {PM-shortname-c} collects and exports energy-related system statistics from various components, such as CPU and DRAM. It provides granular power consumption estimates for Kubernetes pods and namespaces, and reads the power consumption of nodes.
|
||||
|
||||
[WARNING]
|
||||
====
|
||||
|
||||
@@ -6,7 +6,7 @@
|
||||
[id="power-monitoring-accessing-dashboards-admin_{context}"]
|
||||
= Accessing {PM-shortname} dashboards as a cluster administrator
|
||||
|
||||
You can access {PM-shortname} dashboards from the *Administrator* perspective of the {product-title} web console.
|
||||
You can access {PM-shortname} dashboards from the {product-title} web console.
|
||||
|
||||
.Prerequisites
|
||||
|
||||
@@ -18,8 +18,8 @@ You can access {PM-shortname} dashboards from the *Administrator* perspective of
|
||||
|
||||
.Procedure
|
||||
|
||||
. In the *Administrator* perspective of the web console, go to *Observe* -> *Dashboards*.
|
||||
. In the web console, go to *Observe* -> *Dashboards*.
|
||||
|
||||
. From the *Dashboard* drop-down list, select the {PM-shortname} dashboard you want to see:
|
||||
** *Power Monitoring / Overview*
|
||||
** *Power Monitoring / Namespace*
|
||||
. From the *Dashboard* drop-down list, select the {PM-shortname} dashboard you want to see:
|
||||
** *Power Monitor / Overview*
|
||||
** *Power Monitor / Namespace (Pods)*
|
||||
@@ -6,7 +6,7 @@
|
||||
[id="power-monitoring-accessing-dashboards-developer_{context}"]
|
||||
= Accessing {PM-shortname} dashboards as a developer
|
||||
|
||||
You can access {PM-shortname} dashboards from the *Developer* perspective of the {product-title} web console.
|
||||
You can access {PM-shortname} dashboards from the {product-title} web console.
|
||||
|
||||
.Prerequisites
|
||||
|
||||
@@ -19,7 +19,7 @@ You can access {PM-shortname} dashboards from the *Developer* perspective of the
|
||||
|
||||
.Procedure
|
||||
|
||||
. In the *Developer* perspective of the web console, go to *Observe* -> *Dashboard*.
|
||||
. In the web console, go to *Observe* -> *Dashboard*.
|
||||
|
||||
. From the *Dashboard* drop-down list, select the {PM-shortname} dashboard you want to see:
|
||||
** *Power Monitoring / Overview*
|
||||
** *Power Monitor / Overview*
|
||||
153
modules/power-monitoring-api-specifications.adoc
Normal file
@@ -0,0 +1,153 @@
|
||||
// Automatically generated by 'kepler.system.sustainable.computing.io'. Do not edit.
|
||||
:_mod-docs-content-type: REFERENCE
|
||||
[id="power-monitoring-api-specifications_{context}"]
|
||||
= PowerMonitoring API specifications
|
||||
|
||||
PowerMonitor
|
||||
|
||||
|
||||
PowerMonitor is the schema for the PowerMonitor API.
|
||||
|
||||
[cols="1,1,4,1", options="header"]
|
||||
|===
|
||||
| Name
|
||||
| Type
|
||||
| Description
|
||||
| Required
|
||||
|
||||
| *apiVersion*
|
||||
| string
|
||||
| kepler.system.sustainable.computing.io/v1alpha1
|
||||
| true
|
||||
|
||||
| *kind*
|
||||
| string
|
||||
| PowerMonitor
|
||||
| true
|
||||
|
||||
| *metadata*
| object
|
||||
| Refer to the Kubernetes API documentation for the fields of the `metadata` object.
|
||||
| true
|
||||
|
||||
| *spec*
|
||||
| object
|
||||
| PowerMonitorSpec defines the desired state of the Power Monitor.
|
||||
| false
|
||||
|
||||
| *status*
|
||||
| object
|
||||
| PowerMonitorStatus defines the observed state of the Power Monitor.
|
||||
| false
|
||||
|===
|
||||
|
||||
== PowerMonitor.spec
|
||||
|
||||
PowerMonitorSpec defines the desired state of the Power Monitor.
|
||||
|
||||
[cols="1,1,3,1", options="header"]
|
||||
|===
|
||||
| Name
|
||||
| Type
|
||||
| Description
|
||||
| Required
|
||||
|
||||
| *kepler*
|
||||
| object
|
||||
|
|
||||
| true
|
||||
|===
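
Taken together, the two preceding tables imply a manifest skeleton like the following. This is a minimal sketch rather than generated output: the instance name `power-monitor` is an assumption borrowed from the deployment procedure, and the contents of `spec.kepler` are elided because this section does not describe them.

[source,yaml]
----
apiVersion: kepler.system.sustainable.computing.io/v1alpha1
kind: PowerMonitor
metadata:
  name: power-monitor   # assumed instance name; not defined by these tables
spec:
  kepler:               # required object under spec
    # ...
----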
|
||||
|
||||
== PowerMonitor.status.conditions
|
||||
|
||||
[cols="1,1,4,1", options="header"]
|
||||
|===
|
||||
| Name
|
||||
| Type
|
||||
| Description
|
||||
| Required
|
||||
|
||||
| *lastTransitionTime*
|
||||
| string
|
||||
| The last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. +
|
||||
Format: date-time
|
||||
| true
|
||||
|
||||
| *message*
|
||||
| string
|
||||
| A human-readable message indicating details about the transition. This may be an empty string.
|
||||
| true
|
||||
|
||||
| *reason*
|
||||
| string
|
||||
| Contains a programmatic identifier indicating the reason for the condition's last transition.
|
||||
| true
|
||||
|
||||
| *status*
|
||||
| string
|
||||
| The status of the condition, which can be one of True, False, or Unknown.
|
||||
| true
|
||||
|
||||
| *type*
|
||||
| string
|
||||
| The type of Kepler Condition, such as Reconciled or Available.
|
||||
| true
|
||||
|
||||
| *observedGeneration*
|
||||
| integer
|
||||
| Represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date. +
|
||||
Format: int64 +
|
||||
Minimum: 0
|
||||
| false
|
||||
|===
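
To illustrate the fields in the preceding table, the following sketch shows a single condition entry that is out of date. The `reason` value is a hypothetical identifier and the other values are illustrative, not output captured from a cluster. Only the field names and the generation numbers used in the `observedGeneration` description come from the table.

[source,yaml]
----
metadata:
  generation: 12                 # current generation of the resource
status:
  conditions:
  - type: Reconciled             # for example, Reconciled or Available
    status: "True"               # one of True, False, or Unknown
    reason: ReconcileComplete    # hypothetical programmatic identifier
    message: Reconcile succeeded # human-readable; can be an empty string
    lastTransitionTime: "2024-01-11T11:07:39Z"   # date-time format
    observedGeneration: 9        # lower than metadata.generation, so the condition is out of date
----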
|
||||
|
||||
== PowerMonitor.status.kepler
|
||||
|
||||
[cols="1,1,4,1", options="header"]
|
||||
|===
|
||||
| Name
|
||||
| Type
|
||||
| Description
|
||||
| Required
|
||||
|
||||
| *currentNumberScheduled*
|
||||
| integer
|
||||
| The number of nodes that are running at least one power-monitor pod and are supposed to run it. +
|
||||
Format: int32
|
||||
| true
|
||||
|
||||
| *desiredNumberScheduled*
|
||||
| integer
|
||||
| The total number of nodes that should be running the power-monitor pod. +
|
||||
Format: int32
|
||||
| true
|
||||
|
||||
| *numberMisscheduled*
|
||||
| integer
|
||||
| The number of nodes running the power-monitor pod that are not supposed to. +
|
||||
Format: int32
|
||||
| true
|
||||
|
||||
| *numberReady*
|
||||
| integer
|
||||
| The number of nodes that should be running the power-monitor pod and have at least one pod with a Ready condition. +
|
||||
Format: int32
|
||||
| true
|
||||
|
||||
| *numberAvailable*
|
||||
| integer
|
||||
| The number of nodes that should be running the power-monitor pod and have at least one pod running and available. +
|
||||
Format: int32
|
||||
| false
|
||||
|
||||
| *numberUnavailable*
|
||||
| integer
|
||||
| The number of nodes that should be running the power-monitor pod but have no pods running and available. +
|
||||
Format: int32
|
||||
| false
|
||||
|
||||
| *updatedNumberScheduled*
|
||||
| integer
|
||||
| The total number of nodes that are running an updated power-monitor pod. +
|
||||
Format: int32
|
||||
| false
|
||||
|===
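
As a reading aid for the preceding table, the following sketch shows how these counters might appear for a hypothetical two-node cluster in which every node runs a ready, up-to-date power-monitor pod. The values are illustrative assumptions, not captured output.

[source,yaml]
----
status:
  kepler:
    currentNumberScheduled: 2    # nodes running a power-monitor pod that are supposed to run it
    desiredNumberScheduled: 2    # nodes that should be running the pod
    numberMisscheduled: 0        # nodes running the pod that are not supposed to
    numberReady: 2               # nodes with at least one pod in the Ready condition
    numberAvailable: 2           # optional: nodes with a running and available pod
    numberUnavailable: 0         # optional: nodes with no running and available pod
    updatedNumberScheduled: 2    # optional: nodes running an updated pod
----

In a sketch like this, a `numberReady` value lower than `desiredNumberScheduled` would point to nodes where the pod is scheduled but not yet ready.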
|
||||
@@ -1,94 +0,0 @@
|
||||
// Module included in the following assemblies:
|
||||
|
||||
// * power_monitoring/configuring-power-monitoring.adoc
|
||||
|
||||
:_mod-docs-content-type: PROCEDURE
|
||||
[id="power-monitoring-configuring-kepler-redfish_{context}"]
|
||||
= Configuring {PM-kepler} to use Redfish
|
||||
|
||||
You can configure {PM-kepler} to use Redfish as the source for running or hosting containers. {PM-kepler} can then monitor the power usage of these containers.
|
||||
|
||||
.Prerequisites
|
||||
* You have access to the {product-title} web console.
|
||||
* You are logged in as a user with the `cluster-admin` role.
|
||||
* You have installed the {PM-operator}.
|
||||
|
||||
.Procedure
|
||||
|
||||
. In the *Administrator* perspective of the web console, click *Operators* -> *Installed Operators*.
|
||||
|
||||
. Click *{PM-title-c}* from the *Installed Operators* list and click the *{PM-kepler}* tab.
|
||||
|
||||
. Click *Create {PM-kepler}*. If you already have a {PM-kepler} instance created, click *Edit Kepler*.
|
||||
|
||||
. Configure `.spec.exporter.redfish` of the {PM-kepler} instance by specifying the mandatory `secretRef` field. You can also configure the optional `probeInterval` and `skipSSLVerify` fields to meet your needs.
|
||||
+
|
||||
.Example {PM-kepler} instance
|
||||
[source,yaml]
|
||||
----
|
||||
apiVersion: kepler.system.sustainable.computing.io/v1alpha1
|
||||
kind: Kepler
|
||||
metadata:
|
||||
name: kepler
|
||||
spec:
|
||||
exporter:
|
||||
deployment:
|
||||
# ...
|
||||
redfish:
|
||||
secretRef: <secret_name> required <1>
|
||||
probeInterval: 60s <2>
|
||||
skipSSLVerify: false <3>
|
||||
# ...
|
||||
----
|
||||
<1> Required: Specifies the name of the secret that contains the credentials for accessing the Redfish server.
|
||||
<2> Optional: Controls the frequency at which the power information is queried from Redfish. The default value is `60s`.
|
||||
<3> Optional: Controls if {PM-kepler} skips verifying the Redfish server certificate. The default value is `false`.
|
||||
+
|
||||
[NOTE]
|
||||
====
|
||||
After {PM-kepler} is deployed, the `openshift-power-monitoring` namespace is created.
|
||||
====
|
||||
. Create the `redfish.csv` file with the following data format:
|
||||
+
|
||||
[source,csv]
|
||||
----
|
||||
<your_kubelet_node_name>,<redfish_username>,<redfish_password>,https://<redfish_ip_or_hostname>/
|
||||
----
|
||||
+
|
||||
.Example `redfish.csv` file
|
||||
[source,csv]
|
||||
----
|
||||
control-plane,exampleuser,examplepass,https://redfish.nodes.example.com
|
||||
worker-1,exampleuser,examplepass,https://redfish.nodes.example.com
|
||||
worker-2,exampleuser,examplepass,https://another.redfish.nodes.example.com
|
||||
----
|
||||
. Create the secret under the `openshift-power-monitoring` namespace. You must create the secret with the following conditions:
|
||||
+
|
||||
--
|
||||
* The secret type is `Opaque`.
|
||||
* The credentials are stored under the `redfish.csv` key in the `data` field of the secret.
|
||||
--
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc -n openshift-power-monitoring \
|
||||
create secret generic redfish-secret \
|
||||
--from-file=redfish.csv
|
||||
----
|
||||
+
|
||||
.Example output
|
||||
[source,yaml]
|
||||
----
|
||||
apiVersion: v1
|
||||
kind: Secret
|
||||
metadata:
|
||||
name: redfish-secret
|
||||
data:
|
||||
redfish.csv: YmFyCg==
|
||||
# ...
|
||||
----
|
||||
+
|
||||
[IMPORTANT]
|
||||
====
|
||||
The {PM-kepler} deployment will not continue until the Redfish secret is created. You can find this information in the `status` of a {PM-kepler} instance.
|
||||
====
|
||||
@@ -8,38 +8,21 @@
|
||||
|
||||
There are two types of {PM-shortname} dashboards. Both provide different levels of details around power consumption metrics for a single cluster:
|
||||
|
||||
[discrete]
|
||||
== Power Monitoring / Overview dashboard
|
||||
[id="power-monitoring-overview-dashboard_{context}"]
|
||||
== Power Monitor / Overview dashboard
|
||||
|
||||
With this dashboard, you can observe the following information:
|
||||
This dashboard allows you to view the following information:
|
||||
|
||||
* An aggregated view of CPU architecture and its power source (`rapl-sysfs`, `rapl-msr`, or `estimator`) along with total nodes with this configuration
|
||||
Cluster-wide power consumption:: View current total, active, and idle CPU power consumption, grouped by zones.
|
||||
Node-level power details:: Analyze historical and current power consumption (total, active, and idle) for individual nodes.
|
||||
Hardware information:: Display CPU model and core counts for each node in the cluster.
|
||||
Time-series analysis:: Track power consumption trends over time with graphs that can be filtered by node and zone. This provides a comprehensive view of your cluster's energy usage.
|
||||
|
||||
* Total energy consumption by a cluster in the last 24 hours (measured in kilowatt-hour)
|
||||
[id="power-monitor-namespace-pods-dashboard_{context}"]
|
||||
== Power Monitor / Namespace (Pods) dashboard
|
||||
|
||||
* The amount of power consumed by the top 10 namespaces in a cluster in the last 24 hours
|
||||
This dashboard allows you to monitor and analyze power consumption for Kubernetes namespaces and pods. It provides the following information:
|
||||
|
||||
* Detailed node information, such as its CPU architecture and component power source
|
||||
|
||||
These features allow you to effectively monitor the energy consumption of the cluster without needing to investigate each namespace separately.
|
||||
|
||||
[WARNING]
|
||||
====
|
||||
Ensure that the *Components Source* column does not display `estimator` as the power source.
|
||||
|
||||
.The Detailed Node Information table with `rapl-sysfs` as the component power source
|
||||
image::power-monitoring-component-power-source.png[]
|
||||
|
||||
If {PM-kepler} is unable to obtain hardware power consumption metrics, the *Components Source* column displays `estimator` as the power source, which is not supported in Technology Preview. If that happens, then the values from the nodes are not accurate.
|
||||
====
|
||||
|
||||
[discrete]
|
||||
== Power Monitoring / Namespace dashboard
|
||||
|
||||
This dashboard allows you to view metrics by namespace and pod. You can observe the following information:
|
||||
|
||||
* The power consumption metrics, such as consumption in DRAM and PKG
|
||||
|
||||
* The energy consumption metrics in the last hour, such as consumption in DRAM and PKG for core and uncore components
|
||||
|
||||
This feature allows you to investigate key peaks and easily identify the primary root causes of high consumption.
|
||||
Top ten power consuming namespaces:: A real-time table showing the top ten namespaces based on their current power usage. This helps you quickly identify the most resource-intensive workloads.
|
||||
Total namespace power consumption:: A historical graph showing the total power consumption of pods within a selected namespace over time, grouped by zone. This helps you see trends and understand an application's or service's total power use.
|
||||
Individual pod power consumption:: A detailed graph showing the power consumption of individual pods, so you can analyze them in detail.
|
||||
@@ -8,13 +8,18 @@
|
||||
|
||||
You can delete {PM-kepler} by removing the {PM-kepler} instance of the `{PM-kepler}` custom resource definition (CRD) from the {product-title} web console.
|
||||
|
||||
[IMPORTANT]
|
||||
====
|
||||
Starting with {PM-title} 0.5 (Technology Preview), use the `PowerMonitor` CRD, and remove all instances of the `Kepler` CRD.
|
||||
====
|
||||
|
||||
.Prerequisites
|
||||
* You have access to the {product-title} web console.
|
||||
* You are logged in as a user with the `cluster-admin` role.
|
||||
|
||||
.Procedure
|
||||
|
||||
. In the *Administrator* perspective of the web console, go to *Operators* -> *Installed Operators*.
|
||||
. In the web console, go to *Operators* -> *Installed Operators*.
|
||||
|
||||
. Click *{PM-title-c}* from the *Installed Operators* list and go to the *{PM-kepler}* tab.
|
||||
|
||||
|
||||
@@ -0,0 +1,26 @@
|
||||
// Module included in the following assemblies:
|
||||
|
||||
// * power_monitoring/uninstalling-power-monitoring.adoc
|
||||
|
||||
:_mod-docs-content-type: PROCEDURE
|
||||
[id="power-monitoring-deleting-power-monitoring-custom-resource_{context}"]
|
||||
= Deleting the PowerMonitor custom resource
|
||||
|
||||
You can delete the `power-monitor` instance of the `PowerMonitor` custom resource (CR) from the {product-title} web console.
|
||||
|
||||
.Prerequisites
|
||||
|
||||
* You have access to the {product-title} web console.
|
||||
* You are logged in as a user with the `cluster-admin` role.
|
||||
|
||||
.Procedure
|
||||
|
||||
. In the web console, go to *Operators* -> *Installed Operators*.
|
||||
|
||||
. Click *{PM-title-c}* from the *Installed Operators* list and go to the *PowerMonitor* tab.
|
||||
|
||||
. Locate the *PowerMonitor* instance entry in the list.
|
||||
|
||||
. Click the {kebab} for this entry and select *Delete PowerMonitor*.
|
||||
|
||||
. In the *Delete PowerMonitor?* dialog, click *Delete* to delete the `PowerMonitor` instance.
|
||||
@@ -1,31 +0,0 @@
|
||||
// Module included in the following assemblies:
|
||||
|
||||
// * power_monitoring/installing-power-monitoring.adoc
|
||||
|
||||
:_mod-docs-content-type: PROCEDURE
|
||||
[id="power-monitoring-deploying-kepler_{context}"]
|
||||
= Deploying {PM-kepler}
|
||||
|
||||
You can deploy {PM-kepler} by creating an instance of the `{PM-kepler}` custom resource definition (CRD) by using the {PM-operator}.
|
||||
|
||||
.Prerequisites
|
||||
* You have access to the {product-title} web console.
|
||||
* You are logged in as a user with the `cluster-admin` role.
|
||||
* You have installed the {PM-operator}.
|
||||
|
||||
.Procedure
|
||||
|
||||
. In the *Administrator* perspective of the web console, go to *Operators* -> *Installed Operators*.
|
||||
|
||||
. Click *{PM-title-c}* from the *Installed Operators* list and go to the *{PM-kepler}* tab.
|
||||
|
||||
. Click *Create {PM-kepler}*.
|
||||
|
||||
. On the *Create {PM-kepler}* page, ensure the *Name* is set to `kepler`.
|
||||
+
|
||||
[IMPORTANT]
|
||||
====
|
||||
The name of your {PM-kepler} instance must be set to `kepler`. All other instances are ignored by the {PM-operator}.
|
||||
====
|
||||
|
||||
. Click *Create* to deploy {PM-kepler} and {PM-shortname} dashboards.
|
||||
@@ -0,0 +1,39 @@
|
||||
// Module included in the following assemblies:
|
||||
|
||||
// * power_monitoring/installing-power-monitoring.adoc
|
||||
|
||||
:_mod-docs-content-type: PROCEDURE
|
||||
[id="power-monitoring-deploying-power-monitor-custom-resource_{context}"]
|
||||
= Deploying the PowerMonitor custom resource
|
||||
|
||||
You can deploy {PM-kepler} by creating an instance of the `PowerMonitor` custom resource (CR) using the {PM-operator}.
|
||||
|
||||
[IMPORTANT]
|
||||
====
|
||||
The `Kepler` custom resource definition (CRD) has been deprecated and will be removed in a future release. Use the `PowerMonitor` custom resource instead.
|
||||
====
|
||||
|
||||
.Prerequisites
|
||||
* You have access to the {product-title} web console.
|
||||
* You are logged in as a user with the `cluster-admin` role.
|
||||
* You have installed the {PM-operator}.
|
||||
|
||||
.Procedure
|
||||
|
||||
. In the web console, go to *Operators* -> *Installed Operators*.
|
||||
|
||||
. Click *{PM-title-c}* from the *Installed Operators* list and go to the *PowerMonitor* tab.
|
||||
|
||||
. Click *Create PowerMonitor*.
|
||||
|
||||
. On the *Create PowerMonitor* page, ensure the *Name* is set to `power-monitor`.
|
||||
+
|
||||
[IMPORTANT]
|
||||
====
|
||||
The name of your `PowerMonitor` instance must be set to `power-monitor`. All other instances are ignored by the {PM-operator}.
|
||||
====
|
||||
|
||||
. Click *Create* to deploy the PowerMonitor and {PM-shortname} dashboards.
|
||||
|
||||
//formerly Deploying Kepler.
|
||||
//Kepler CRDs are being removed from TP 0.5 and being replaced with PowerMonitor CRDs.
|
||||
12
modules/power-monitoring-hardware-support.adoc
Normal file
@@ -0,0 +1,12 @@
|
||||
// Module included in the following assemblies:
|
||||
//
|
||||
// * power_monitoring/power-monitoring-overview.adoc
|
||||
|
||||
:_mod-docs-content-type: CONCEPT
|
||||
[id="power-monitoring-hardware-support_{context}"]
|
||||
= {PM-kepler} hardware support
|
||||
|
||||
{PM-kepler} is the key component of {PM-shortname} that collects real-time CPU power consumption data from a node through the Running Average Power Limit (RAPL) subsystem. By combining the total power consumption of the node with the percentage of CPU time that each process uses, {PM-kepler} estimates power consumption at the per-process and per-container level.
|
||||
|
||||
Kernel Power Management Subsystem::
|
||||
* `rapl-sysfs`: This requires access to the `/sys/class/powercap/intel-rapl` directory.
|
||||
@@ -1,23 +0,0 @@
|
||||
// Module included in the following assemblies:
|
||||
//
|
||||
// * power_monitoring/power-monitoring-overview.adoc
|
||||
|
||||
:_mod-docs-content-type: CONCEPT
|
||||
[id="power-monitoring-hardware-virtualization-support_{context}"]
|
||||
= {PM-kepler} hardware and virtualization support
|
||||
|
||||
{PM-kepler} is the key component of {PM-shortname} that collects real-time power consumption data from a node through one of the following methods:
|
||||
|
||||
Kernel Power Management Subsystem (preferred)::
|
||||
* `rapl-sysfs`: This requires access to the `/sys/class/powercap/intel-rapl` host file.
|
||||
* `rapl-msr`: This requires access to the `/dev/cpu/*/msr` host file.
|
||||
|
||||
The `estimator` power source::
|
||||
Without access to the kernel's power cap subsystem, {PM-kepler} uses a machine learning model to estimate the power usage of the CPU on the node.
|
||||
+
|
||||
[WARNING]
|
||||
====
|
||||
The `estimator` feature is experimental, not supported, and should not be relied upon.
|
||||
====
|
||||
|
||||
You can identify the power estimation method for a node by using the *Power Monitoring / Overview* dashboard.
|
||||
@@ -19,7 +19,7 @@ You must remove any previously installed versions of the {PM-operator} before in
|
||||
|
||||
.Procedure
|
||||
|
||||
. In the *Administrator* perspective of the web console, go to *Operators* -> *OperatorHub*.
|
||||
. In the web console, go to *Operators* -> *OperatorHub*.
|
||||
|
||||
. Search for `{PM-shortname}`, click the *{PM-title-c}* tile, and then click *Install*.
|
||||
//. On the *Install Operator* page:
|
||||
|
||||
@@ -8,6 +8,6 @@
|
||||
|
||||
{PM-shortname-c} is made up of the following major components:
|
||||
|
||||
The {PM-operator}:: For administrators, the {PM-operator} streamlines the monitoring of power usage for workloads by simplifying the deployment and management of {PM-kepler} in an {product-title} cluster. The setup and configuration for the {PM-operator} are simplified by adding a {PM-kepler} custom resource definition (CRD). The Operator also manages operations, such as upgrading, removing, configuring, and redeploying {PM-kepler}.
|
||||
The {PM-operator}:: For administrators, the {PM-operator} streamlines the monitoring of power usage for workloads by simplifying the deployment and management of {PM-kepler} in an {product-title} cluster. The setup and configuration for the {PM-operator} are simplified by adding a `PowerMonitor` custom resource definition (CRD). The Operator also manages operations, such as upgrading, removing, configuring, and redeploying {PM-kepler}.
|
||||
|
||||
{PM-kepler}:: {PM-kepler} is a key component of {PM-shortname}. It is responsible for monitoring the power usage of containers running in {product-title}. It generates metrics related to the power usage of both nodes and containers.
|
||||
|
||||
@@ -6,43 +6,92 @@
|
||||
[id="power-monitoring-kepler-configuration_{context}"]
|
||||
= The {PM-kepler} configuration
|
||||
|
||||
You can configure {PM-kepler} with the `spec` field of the `{PM-kepler}` resource.
|
||||
You can configure {PM-kepler} with the `spec` field of the `PowerMonitor` resource.
|
||||
|
||||
[IMPORTANT]
|
||||
====
|
||||
Ensure that the name of your {PM-kepler} instance is `kepler`. All other instances are rejected by the {PM-operator} Webhook.
|
||||
Ensure that the name of your `PowerMonitor` instance is `power-monitor`. All other instances are rejected by the {PM-operator} Webhook.
|
||||
====
|
||||
|
||||
The following is the list of configuration options:
|
||||
|
||||
.{PM-kepler} configuration options
|
||||
[options="header"]
|
||||
.PowerMonitor configuration options
|
||||
[cols="1,3,2", options="header"]
|
||||
|===
|
||||
|Name |Spec |Description |Default
|
||||
|`port` |`exporter.deployment` |The port on the node where the Prometheus metrics are exposed. |`9103`
|
||||
|`nodeSelector` |`exporter.deployment` |The nodes on which {PM-kepler} exporter pods are scheduled. |`kubernetes.io/os: linux`
|
||||
|`tolerations` |`exporter.deployment` |The tolerations for {PM-kepler} exporter that allow the pods to be scheduled on nodes with specific characteristics. |`- operator: "Exists"`
|
||||
| Name
|
||||
| Description
|
||||
| Default Behavior
|
||||
|
||||
| deployment.nodeSelector
|
||||
| The nodes on which Kepler (created by PowerMonitor) pods are scheduled.
|
||||
| kubernetes.io/os: linux
|
||||
|
||||
| deployment.tolerations
|
||||
| The tolerations for Power Monitor that allow the pods to be scheduled on nodes with specific characteristics.
|
||||
| - operator: "Exists"
|
||||
|
||||
| deployment.security.mode
|
||||
| Security mode can be set to either `none`, allowing unrestricted access to Kepler's metrics by any entity, or `rbac`, securing the metrics endpoint with TLS encryption and restricting access to authorized service accounts listed in `allowedSANames`.
|
||||
| `rbac`. By default, only the user workload Prometheus instance is allowed access.
|
||||
|
||||
| deployment.security.allowedSANames
|
||||
| A list of Service Account Names that can access Kepler’s metrics endpoint when security mode is `rbac`.
|
||||
| In OpenShift, set to `openshift-user-workload-monitoring:prometheus-user-workload` to allow user workload monitoring to scrape Kepler.
|
||||
|
||||
| config.logLevel
|
||||
| The level of logs to expose by Kepler.
|
||||
| Set to info.
|
||||
|
||||
| config.metricLevels
|
||||
| A list of energy metric levels to expose. Possible values include `node`, `process`, `container`, `vm`, and `pod`.
|
||||
| The default list includes `node`, `pod`, and `vm`.
|
||||
|
||||
| config.staleness
|
||||
| Specifies how long to wait before considering calculated power values as stale.
|
||||
| 500ms (500 milliseconds).
|
||||
|
||||
| config.sampleRate
|
||||
| Specifies the interval for monitoring resources such as processes, containers, and VMs.
|
||||
| 5s (5 seconds).
|
||||
|
||||
| config.maxTerminated
|
||||
| Controls terminated workload tracking. A negative value tracks unlimited workloads, zero disables tracking, and a positive value tracks the top N terminated workloads by energy consumption.
|
||||
| 500.
|
||||
|
||||
|===
|
||||
|
||||
.Example `{PM-kepler}` resource with default configuration
|
||||
.Example `PowerMonitor` resource configuration
|
||||
[source,yaml]
|
||||
----
|
||||
apiVersion: kepler.system.sustainable.computing.io/v1alpha1
|
||||
kind: Kepler
|
||||
apiVersion: kepler.system.sustainable.computing.io/v1alpha1
|
||||
kind: PowerMonitor
|
||||
metadata:
|
||||
name: kepler
|
||||
labels:
|
||||
app.kubernetes.io/name: powermonitor
|
||||
app.kubernetes.io/instance: powermonitor
|
||||
app.kubernetes.io/part-of: kepler-operator
|
||||
name: power-monitor
|
||||
spec:
|
||||
exporter:
|
||||
kepler:
|
||||
deployment:
|
||||
port: 9103 # <1>
|
||||
nodeSelector:
|
||||
kubernetes.io/os: linux # <2>
|
||||
Tolerations: # <3>
|
||||
- key: ""
|
||||
operator: "Exists"
|
||||
value: ""
|
||||
effect: ""
|
||||
----
|
||||
<1> The Prometheus metrics are exposed on port 9103.
|
||||
<2> {PM-kepler} pods are scheduled on Linux nodes.
|
||||
<3> The default tolerations allow {PM-kepler} to be scheduled on any node.
|
||||
nodeSelector:
|
||||
kubernetes.io/os: linux
|
||||
|
||||
tolerations:
|
||||
- key: key1
|
||||
operator: Equal
|
||||
value: value1
|
||||
effect: NoSchedule
|
||||
|
||||
security:
|
||||
mode: rbac
|
||||
allowedSANames:
|
||||
- openshift-user-workload-monitoring:prometheus-user-workload
|
||||
|
||||
config:
|
||||
logLevel: info
|
||||
metricLevels: [node, pod, vm]
|
||||
staleness: 1s
|
||||
sampleRate: 10s
|
||||
maxTerminated: 1000
|
||||
----
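
For comparison, the following sketch spells out the default values listed in the configuration options table as an explicit `spec`. It assumes that the `deployment` and `config` fields nest under `spec.kepler`, as the option names in the table and the preceding example suggest. If the documented defaults apply, omitting these fields should yield the same behavior.

[source,yaml]
----
apiVersion: kepler.system.sustainable.computing.io/v1alpha1
kind: PowerMonitor
metadata:
  name: power-monitor            # other names are rejected by the Operator webhook
spec:
  kepler:
    deployment:
      nodeSelector:
        kubernetes.io/os: linux  # schedule pods on Linux nodes only
      tolerations:
      - operator: "Exists"       # tolerate any taint
      security:
        mode: rbac               # TLS plus access restricted to the listed service accounts
        allowedSANames:
        - openshift-user-workload-monitoring:prometheus-user-workload
    config:
      logLevel: info
      metricLevels: [node, pod, vm]
      staleness: 500ms           # documented default
      sampleRate: 5s             # documented default
      maxTerminated: 500         # documented default
----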
|
||||
@@ -0,0 +1,7 @@
|
||||
:_mod-docs-content-type: REFERENCE
|
||||
[id="power-monitoring-kepler-power-attribution-guide_{context}"]
|
||||
== Power monitoring Kepler power attribution guide
|
||||
|
||||
Kepler's power attribution system provides practical, proportional distribution of hardware energy consumption to individual workloads. While CPU-time-based attribution has inherent limitations due to modern CPU complexity, it offers a good balance between accuracy, simplicity, and performance overhead for most monitoring and optimization use cases.
|
||||
|
||||
For more information about power attribution, see link:http://sustainable-computing.io/kepler/usage/power-attribution[Kepler Power Attribution Guide].
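
As a rough illustration of proportional, CPU-time-based attribution, the following sketch assumes that a workload is assigned a share of the node's active power equal to its share of CPU time over the sampling interval. The symbols are assumptions introduced here for illustration. See the linked guide for the exact model that Kepler uses.

[latexmath]
++++
P_{\text{workload}} = P_{\text{node,active}} \times \frac{\Delta t_{\text{cpu,workload}}}{\Delta t_{\text{cpu,node}}}
++++

Under this assumption, a pod that accounts for 25% of a node's CPU time while the node draws 40 W of active power would be attributed 10 W.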
|
||||
@@ -6,25 +6,24 @@
|
||||
[id="power-monitoring-monitoring-kepler-status_{context}"]
|
||||
= Monitoring the {PM-kepler} status
|
||||
|
||||
You can monitor the state of the {PM-kepler} exporter with the `status` field of the `{PM-kepler}` resource.
|
||||
You can monitor the state of the {PM-kepler} exporter with the `status` field of the `PowerMonitor` resource.
|
||||
|
||||
The `status.exporter` field includes information, such as the following:
|
||||
The `status` field includes information, such as the following:
|
||||
|
||||
* The number of nodes currently running the {PM-kepler} pods
|
||||
* The number of nodes that should be running the {PM-kepler} pods
|
||||
* Conditions representing the health of the {PM-kepler} resource
|
||||
|
||||
This provides you with valuable insights into the changes made through the `spec` field.
|
||||
This provides you with valuable insights into the changes made through the `spec` field.
|
||||
|
||||
.Example state of the `{PM-kepler}` resource
|
||||
.Example state of the `PowerMonitor` resource
|
||||
[source,yaml]
|
||||
----
|
||||
apiVersion: kepler.system.sustainable.computing.io/v1alpha1
|
||||
kind: Kepler
|
||||
kind: PowerMonitor
|
||||
metadata:
|
||||
name: kepler
|
||||
name: power-monitor
|
||||
status:
|
||||
exporter:
|
||||
conditions: # <1>
|
||||
- lastTransitionTime: '2024-01-11T11:07:39Z'
|
||||
message: Reconcile succeeded
|
||||
@@ -34,7 +33,7 @@ status:
|
||||
type: Reconciled
|
||||
- lastTransitionTime: '2024-01-11T11:07:39Z'
|
||||
message: >-
|
||||
Kepler daemonset "kepler-operator/kepler" is deployed to all nodes and
|
||||
power-monitor daemonset "openshift-power-monitoring/power-monitor" is deployed to all nodes and
|
||||
available; ready 2/2
|
||||
observedGeneration: 1
|
||||
reason: DaemonSetReady
|
||||
@@ -43,6 +42,6 @@ status:
|
||||
currentNumberScheduled: 2 # <2>
|
||||
desiredNumberScheduled: 2 # <3>
|
||||
----
|
||||
<1> The health of the {PM-kepler} resource. In this example, {PM-kepler} is successfully reconciled and ready.
|
||||
<1> The health of the `PowerMonitor` resource. In this example, the `PowerMonitor` resource is successfully reconciled and ready.
|
||||
<2> The number of nodes currently running the {PM-kepler} pods is 2.
|
||||
<3> The desired number of nodes to run the {PM-kepler} pods is 2.
|
||||
<3> The desired number of nodes to run the {PM-kepler} pods is 2.
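The scheduled-versus-desired comparison described in the callouts can be sketched in a few lines of Python. This is only an illustration, not part of the Operator: the helper names `rollout_complete` and `condition_summary` are hypothetical, and the sketch assumes you have already loaded the relevant part of the resource status into a plain mapping. It makes no assumption about where `currentNumberScheduled` and `desiredNumberScheduled` sit under `status`.

```python
from typing import Any, Mapping


def rollout_complete(daemonset_status: Mapping[str, Any]) -> bool:
    """Return True when every node that should run a pod is running one.

    `daemonset_status` is assumed to be the mapping that holds the
    `currentNumberScheduled` and `desiredNumberScheduled` fields shown
    in the example status; missing fields are treated as 0.
    """
    current = daemonset_status.get("currentNumberScheduled", 0)
    desired = daemonset_status.get("desiredNumberScheduled", 0)
    return desired > 0 and current == desired


def condition_summary(conditions: list[Mapping[str, Any]]) -> list[str]:
    """Format each condition as 'type: message' for quick inspection."""
    return [f"{c.get('type', '?')}: {c.get('message', '')}" for c in conditions]


# Values taken from the example status above.
example = {"currentNumberScheduled": 2, "desiredNumberScheduled": 2}
print(rollout_complete(example))
print(condition_summary([{"type": "Reconciled", "message": "Reconcile succeeded"}]))
```

Treating a desired count of 0 as "not complete" is a design choice in this sketch, so that an empty or missing status is never reported as healthy.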
|
||||
@@ -0,0 +1,11 @@
|
||||
// Module included in the following assemblies:
|
||||
|
||||
// * power_monitoring/power-monitoring-assembly-tp-0-5-release-notes.adoc
|
||||
|
||||
:_mod-docs-content-type: REFERENCE
|
||||
[id="power-monitoring-0-5-deprecated-removed-features_{context}"]
|
||||
= Power monitoring 0.5 (Technology Preview) deprecated and removed features
|
||||
|
||||
* In the Red Hat OpenShift power monitoring Technology Preview 0.5 release, the `Kepler` custom resource is deprecated and will be removed in a future release. Use the `PowerMonitor` custom resource instead.
|
||||
|
||||
* In the Red Hat OpenShift power monitoring Technology Preview 0.5 release, the Redfish configuration has been removed. It is also no longer supported in previous versions of power monitoring.
|
||||
@@ -0,0 +1,41 @@
|
||||
// Module included in the following assemblies:
|
||||
|
||||
// * power_monitoring/power-monitoring-tp-0-5-release-notes.adoc
|
||||
|
||||
:_mod-docs-content-type: REFERENCE
|
||||
[id="power-monitoring-tp-0-5-enhancements_{context}"]
|
||||
= Power monitoring Technology Preview 0.5 enhancements
|
||||
|
||||
This release of {PM-title} and the {PM-operator}, based on the Kepler Project, includes the following enhancements:
|
||||
|
||||
* Dynamic detection of a node's Running Average Power Limit (RAPL) zones
|
||||
* More accurate power measurement based on active CPU usage
|
||||
* Improved virtual machine (VM), container, and pod detection
|
||||
* More relevant label values for processes, containers, VMs, and pods
|
||||
* Requires only read-only access to the host `/proc` and `/sys` file systems
|
||||
** No longer requires the `CAP_SYS_ADMIN` and `CAP_BPF` capabilities
|
||||
* Significantly reduced resource usage compared to earlier Kepler implementations
|
||||
* Multi-level energy tracking for the following levels:
|
||||
** node
|
||||
** process
|
||||
** container
|
||||
** VM
|
||||
** pod
|
||||
* Terminated workload tracking with configurable retention policies
|
||||
* Energy-based prioritization for terminated resources
|
||||
* Real-time data collection with configurable intervals and staleness detection
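As a rough illustration of what "energy-based prioritization for terminated resources" could mean, the following Python sketch keeps only the terminated workloads that consumed the most energy, up to a configurable limit. It is not Kepler's implementation: the `TerminatedWorkload` type, the `retain_top_by_energy` function, and the `max_retained` parameter are hypothetical names chosen for this example.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class TerminatedWorkload:
    """Hypothetical record of a workload that has already exited."""
    name: str
    energy_joules: float


def retain_top_by_energy(workloads, max_retained):
    """Keep at most `max_retained` workloads, highest energy first.

    A non-positive limit retains nothing.
    """
    if max_retained <= 0:
        return []
    return heapq.nlargest(max_retained, workloads, key=lambda w: w.energy_joules)


terminated = [
    TerminatedWorkload("a", 5.0),
    TerminatedWorkload("b", 50.0),
    TerminatedWorkload("c", 20.0),
]
# With a limit of 2, the lowest-energy workload "a" is dropped.
print([w.name for w in retain_top_by_energy(terminated, 2)])
```

Under such a policy, short-lived but power-hungry workloads stay visible in the metrics after they exit, while low-energy ones are the first to be discarded.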
|
||||
|
||||
[id="updated-dashboards_{context}"]
|
||||
== Updated dashboards
|
||||
|
||||
With this update, {PM-title} has the following dashboard changes:
|
||||
|
||||
* Updated *Power Monitor / Overview* dashboard.
|
||||
* Updated *Power Monitor / Namespace (Pods)* dashboard.
|
||||
|
||||
[IMPORTANT]
|
||||
====
|
||||
The older metrics and dashboards are no longer supported. If you manage your own custom dashboards or queries, you must update them to use the newer metrics.
|
||||
====
|
||||
|
||||
|
||||
@@ -0,0 +1,11 @@
|
||||
// Module included in the following assemblies:
|
||||
|
||||
// * power_monitoring/power-monitoring-tp-0-5-release-notes.adoc
|
||||
|
||||
:_mod-docs-content-type: REFERENCE
|
||||
[id="power-monitoring-release-notes-tp-0-5-new-features_{context}"]
|
||||
= Power monitoring Technology Preview 0.5 new features
|
||||
|
||||
This release of {PM-title} and the {PM-operator}, based on the Kepler Project, includes the following new feature:
|
||||
|
||||
* Deployment and deletion of the `PowerMonitor` custom resource definition (CRD).
|
||||
21
modules/power-monitoring-release-notes-tp-0-5-overview.adoc
Normal file
@@ -0,0 +1,21 @@
|
||||
// Module included in the following assemblies:
|
||||
|
||||
// * power_monitoring/power-monitoring-assembly-tp-0-5-release-notes.adoc
|
||||
|
||||
:_mod-docs-content-type: REFERENCE
|
||||
[id="power-monitoring-tp-0-5-overview_{context}"]
|
||||
= Power monitoring 0.5 (Technology Preview) release notes overview
|
||||
|
||||
:FeatureName: Power monitoring
|
||||
include::snippets/technology-preview.adoc[leveloffset=+2]
|
||||
|
||||
{PM-title-c} enables you to monitor the power usage of workloads and identify the most power-consuming namespaces running in an {product-title} cluster with key power consumption metrics, such as CPU or DRAM, measured at the container level.
|
||||
|
||||
This release of power monitoring and the {PM-operator} provides more accurate data, includes new dashboards, and removes some features and functionality.
|
||||
|
||||
This release of power monitoring and the {PM-operator} is supported on:
|
||||
|
||||
* {product-title} 4.17+
|
||||
* Bare metal deployments
|
||||
|
||||
//following new release notes template in GDoc from release notes team
|
||||
@@ -0,0 +1,35 @@
|
||||
// Module included in the following assemblies:
|
||||
|
||||
// * power_monitoring/power-monitoring-assembly-tp-0-5-release-notes.adoc
|
||||
|
||||
:_mod-docs-content-type: REFERENCE
|
||||
[id="power-monitoring-release-notes-tp-0-5-support-tables_{context}"]
|
||||
= {PM-shortname-c} 0.5 (Technology Preview) support tables
|
||||
//may need to update the title
|
||||
This release includes the following support updates:
|
||||
|
||||
.Power Monitoring Operator supported version table
|
||||
[cols="1,1"]
|
||||
|===
|
||||
|{PM-kepler}
|
||||
|0.10.2
|
||||
|{PM-operator}
|
||||
|0.20.0
|
||||
|===
|
||||
|
||||
.Power monitoring supported platforms
|
||||
[cols="1,1"]
|
||||
|===
|
||||
|{product-title}
|
||||
|4.17+
|
||||
|Bare metal
|
||||
| X
|
||||
|===
|
||||
|
||||
[IMPORTANT]
|
||||
====
|
||||
Installations in virtual machines are not supported and will not function.
|
||||
====
|
||||
|
||||
//* With this update, Red Hat OpenShift power monitoring is only supported on OpenShift Container Platform clusters that are installed on bare metal. Installations in virtual machines are not supported and will not function.
|
||||
//will likely need to create a reference module for a Feature Support Table or some kind for this bullet point on supported cluster installation platforms.
|
||||
@@ -9,10 +9,8 @@ toc::[]
|
||||
:FeatureName: Power monitoring
|
||||
include::snippets/technology-preview.adoc[leveloffset=+2]
|
||||
|
||||
The `{PM-kepler}` resource is a Kubernetes custom resource definition (CRD) that enables you to configure the deployment and monitor the status of the {PM-kepler} resource.
|
||||
`PowerMonitor` is a Kubernetes custom resource definition (CRD). By using a `PowerMonitor` resource, you can configure the deployment of {PM-kepler} and monitor its status.
|
||||
|
||||
include::modules/power-monitoring-kepler-configuration.adoc[leveloffset=+1]
|
||||
|
||||
include::modules/power-monitoring-monitoring-kepler-status.adoc[leveloffset=+1]
|
||||
|
||||
include::modules/power-monitoring-configuring-kepler-redfish.adoc[leveloffset=+1]
|
||||
include::modules/power-monitoring-monitoring-kepler-status.adoc[leveloffset=+1]
|
||||
@@ -15,4 +15,4 @@ You can install {PM-title} by deploying the {PM-operator} in the {product-title}
|
||||
include::modules/power-monitoring-installing-pmo.adoc[leveloffset=+1]
|
||||
|
||||
// Deploying Kepler
|
||||
include::modules/power-monitoring-deploying-kepler.adoc[leveloffset=+1]
|
||||
include::modules/power-monitoring-deploying-power-monitor-custom-resource.adoc[leveloffset=+1]
|
||||
|
||||
@@ -0,0 +1,14 @@
|
||||
:_mod-docs-content-type: ASSEMBLY
|
||||
[id="power-monitoring-api-reference_{context}"]
|
||||
= Power monitoring API reference
|
||||
include::_attributes/common-attributes.adoc[]
|
||||
:context: power-monitoring-kepler-power-attribution-guide
|
||||
|
||||
toc::[]
|
||||
|
||||
:FeatureName: Power monitoring
|
||||
include::snippets/technology-preview.adoc[leveloffset=+2]
|
||||
|
||||
`PowerMonitor` is the schema for the PowerMonitor API.
|
||||
|
||||
include::modules/power-monitoring-api-specifications.adoc[leveloffset=+1]
|
||||
@@ -13,7 +13,7 @@ include::modules/power-monitoring-about-power-monitoring.adoc[leveloffset=+1]
|
||||
|
||||
include::modules/power-monitoring-kepler-architecture.adoc[leveloffset=+1]
|
||||
|
||||
include::modules/power-monitoring-hardware-virtualization-support.adoc[leveloffset=+1]
|
||||
include::modules/power-monitoring-hardware-support.adoc[leveloffset=+1]
|
||||
|
||||
include::modules/power-monitoring-fips-support.adoc[leveloffset=+1]
|
||||
|
||||
|
||||
@@ -0,0 +1,14 @@
|
||||
:_mod-docs-content-type: ASSEMBLY
|
||||
[id="power-monitoring-references_{context}"]
|
||||
= Power monitoring references
|
||||
include::_attributes/common-attributes.adoc[]
|
||||
:context: power-monitoring-references
|
||||
|
||||
toc::[]
|
||||
|
||||
:FeatureName: Power monitoring
|
||||
include::snippets/technology-preview.adoc[leveloffset=+2]
|
||||
|
||||
include::modules/power-monitoring-kepler-power-attribution-guide.adoc[leveloffset=+1]
|
||||
|
||||
//move API reference here post release. There may need to be additional IA updates for GA to better incorporate reference material.
|
||||
@@ -0,0 +1,23 @@
|
||||
:_mod-docs-content-type: ASSEMBLY
|
||||
[id="power-monitoring-assembly-tp-0-5-release-notes_{context}"]
|
||||
include::_attributes/common-attributes.adoc[]
|
||||
= {PM-title-c} 0.5 (Technology Preview) release notes
|
||||
:context: power-monitoring-release-notes
|
||||
|
||||
toc::[]
|
||||
|
||||
//:FeatureName: Power monitoring
|
||||
//include::snippets/technology-preview.adoc[leveloffset=+2]
|
||||
////
|
||||
{PM-title-c} enables you to monitor the power usage of workloads and identify the most power-consuming namespaces running in an {product-title} cluster with key power consumption metrics, such as CPU or DRAM, measured at container level.
|
||||
////
|
||||
These release notes track the development of {PM-title} in {product-title}.
|
||||
|
||||
For an overview of the {PM-operator}, see xref:../power_monitoring/power-monitoring-overview.adoc#power-monitoring-about-power-monitoring_power-monitoring-overview[About {PM-shortname}].
|
||||
|
||||
|
||||
include::modules/power-monitoring-release-notes-tp-0-5-overview.adoc[leveloffset=+1]
|
||||
include::modules/power-monitoring-release-notes-tp-0-5-new-features.adoc[leveloffset=+2]
|
||||
include::modules/power-monitoring-release-notes-tp-0-5-enhancements.adoc[leveloffset=+2]
|
||||
include::modules/power-monitoring-release-notes-tp-0-5-deprecated-removed-features.adoc[leveloffset=+2]
|
||||
include::modules/power-monitoring-release-notes-tp-0-5-support-tables.adoc[leveloffset=+2]
|
||||
@@ -15,6 +15,7 @@ These release notes track the development of {PM-title} in the {product-title}.
|
||||
|
||||
For an overview of the {PM-operator}, see xref:../power_monitoring/power-monitoring-overview.adoc#power-monitoring-about-power-monitoring_power-monitoring-overview[About {PM-shortname}].
|
||||
|
||||
|
||||
[id="power-monitoring-release-notes-0-4_{context}"]
|
||||
== {PM-shortname-c} 0.4 (Technology Preview)
|
||||
|
||||
@@ -118,7 +119,7 @@ This release introduces a Technology Preview version of {PM-title}. The followin
|
||||
|
||||
[id="power-monitoring-release-notes-0-1-features"]
|
||||
=== Features
|
||||
* Deployment and deletion of {PM-kepler}
|
||||
* Deployment and deletion of {PM-kepler}
|
||||
* Power usage metrics from Intel-based bare-metal deployments
|
||||
* Dashboards for plotting power usage
|
||||
|
||||
|
||||
@@ -13,6 +13,10 @@ You can uninstall {PM-shortname} by deleting the {PM-kepler} instance and then t
|
||||
|
||||
// Removing kepler
|
||||
include::modules/power-monitoring-deleting-kepler.adoc[leveloffset=+1]
|
||||
//might need Additional resource section to add link to configuring PowerMonitor CRD content when that content is ready
|
||||
|
||||
// Removing PowerMonitor CRD
|
||||
include::modules/power-monitoring-deleting-power-monitor-custom-resource.adoc[leveloffset=+1]
|
||||
|
||||
// Uninstalling power monitoring operator
|
||||
include::modules/power-monitoring-uninstalling-pmo.adoc[leveloffset=+1]
|
||||
@@ -14,7 +14,7 @@ You can visualize {PM-shortname} metrics in the {product-title} web console by a
|
||||
include::modules/power-monitoring-dashboards-overview.adoc[leveloffset=+1]
|
||||
include::modules/power-monitoring-accessing-dashboards-admin.adoc[leveloffset=+1]
|
||||
include::modules/power-monitoring-accessing-dashboards-developer.adoc[leveloffset=+1]
|
||||
include::modules/power-monitoring-metrics-overview.adoc[leveloffset=+1]
|
||||
//include::modules/power-monitoring-metrics-overview.adoc[leveloffset=+1]
|
||||
|
||||
[role="_additional-resources"]
|
||||
[id="additional-resources_visualizing-power-monitoring-metrics"]
|
||||
|
||||