From 825048c97fb007f0b66bcb2f4bfd42e92c0728f8 Mon Sep 17 00:00:00 2001
From: Gwynne Monahan
Date: Tue, 1 Jul 2025 09:38:05 -0500
Subject: [PATCH] POWERMON-580 0.5 Configuring power metrics doc updates

---
 ...monitoring-configuring-kepler-redfish.adoc | 94 ------------------
 ...power-monitoring-kepler-configuration.adoc | 99 ++++++++++++++-----
 ...r-monitoring-monitoring-kepler-status.adoc | 19 ++--
 .../configuring-power-monitoring.adoc         |  6 +-
 4 files changed, 85 insertions(+), 133 deletions(-)
 delete mode 100644 modules/power-monitoring-configuring-kepler-redfish.adoc

diff --git a/modules/power-monitoring-configuring-kepler-redfish.adoc b/modules/power-monitoring-configuring-kepler-redfish.adoc
deleted file mode 100644
index fea1ac2d7e..0000000000
--- a/modules/power-monitoring-configuring-kepler-redfish.adoc
+++ /dev/null
@@ -1,94 +0,0 @@
-// Module included in the following assemblies:
-
-// * power_monitoring/configuring-power-monitoring.adoc
-
-:_mod-docs-content-type: PROCEDURE
-[id="power-monitoring-configuring-kepler-redfish_{context}"]
-= Configuring {PM-kepler} to use Redfish
-
-You can configure {PM-kepler} to use Redfish as the source for running or hosting containers. {PM-kepler} can then monitor the power usage of these containers.
-
-.Prerequisites
-* You have access to the {product-title} web console.
-* You are logged in as a user with the `cluster-admin` role.
-* You have installed the {PM-operator}.
-
-.Procedure
-
-. In the *Administrator* perspective of the web console, click *Operators* -> *Installed Operators*.
-
-. Click *{PM-title-c}* from the *Installed Operators* list and click the *{PM-kepler}* tab.
-
-. Click *Create {PM-kepler}*. If you already have a {PM-kepler} instance created, click *Edit Kepler*.
-
-. Configure `.spec.exporter.redfish` of the {PM-kepler} instance by specifying the mandatory `secretRef` field. You can also configure the optional `probeInterval` and `skipSSLVerify` fields to meet your needs. 
-+
-.Example {PM-kepler} instance
-[source,yaml]
-----
-apiVersion: kepler.system.sustainable.computing.io/v1alpha1
-kind: Kepler
-metadata:
-  name: kepler
-spec:
-  exporter:
-    deployment:
-# ...
-    redfish:
-      secretRef: required <1>
-      probeInterval: 60s <2>
-      skipSSLVerify: false <3>
-# ...
-----
-<1> Required: Specifies the name of the secret that contains the credentials for accessing the Redfish server.
-<2> Optional: Controls the frequency at which the power information is queried from Redfish. The default value is `60s`.
-<3> Optional: Controls if {PM-kepler} skips verifying the Redfish server certificate. The default value is `false`.
-+
-[NOTE]
-====
-After {PM-kepler} is deployed, the `openshift-power-monitoring` namespace is created.
-====
-. Create the `redfish.csv` file with the following data format:
-+
-[source,csv]
-----
-,,,https:///
-----
-+
-.Example `redfish.csv` file
-[source,csv]
-----
-control-plane,exampleuser,examplepass,https://redfish.nodes.example.com
-worker-1,exampleuser,examplepass,https://redfish.nodes.example.com
-worker-2,exampleuser,examplepass,https://another.redfish.nodes.example.com
-----
-. Create the secret under the `openshift-power-monitoring` namespace. You must create the secret with the following conditions:
-+
---
-* The secret type is `Opaque`.
-* The credentials are stored under the `redfish.csv` key in the `data` field of the secret.
---
-+
-[source,terminal]
-----
-$ oc -n openshift-power-monitoring \
-    create secret generic redfish-secret \
-    --from-file=redfish.csv
-----
-+
-.Example output
-[source,yaml]
-----
-apiVersion: v1
-kind: Secret
-metadata:
-  name: redfish-secret
-data:
-  redfish.csv: YmFyCg==
-  # ...
-----
-+
-[IMPORTANT]
-====
-The {PM-kepler} deployment will not continue until the Redfish secret is created. You can find this information in the `status` of a {PM-kepler} instance. 
-====
\ No newline at end of file
diff --git a/modules/power-monitoring-kepler-configuration.adoc b/modules/power-monitoring-kepler-configuration.adoc
index 1a5af2e67c..de8d30551b 100644
--- a/modules/power-monitoring-kepler-configuration.adoc
+++ b/modules/power-monitoring-kepler-configuration.adoc
@@ -6,43 +6,92 @@
 [id="power-monitoring-kepler-configuration_{context}"]
 = The {PM-kepler} configuration
 
-You can configure {PM-kepler} with the `spec` field of the `{PM-kepler}` resource.
+You can configure {PM-kepler} with the `spec` field of the `PowerMonitor` resource.
 
 [IMPORTANT]
 ====
-Ensure that the name of your {PM-kepler} instance is `kepler`. All other instances are rejected by the {PM-operator} Webhook.
+Ensure that the name of your `PowerMonitor` instance is `power-monitor`. Instances with any other name are rejected by the {PM-operator} webhook.
 ====
 
 The following is the list of configuration options:
 
-.{PM-kepler} configuration options
-[options="header"]
+.PowerMonitor configuration options
+[cols="1,3,2", options="header"]
 |===
-|Name |Spec |Description |Default
-|`port` |`exporter.deployment` |The port on the node where the Prometheus metrics are exposed. |`9103`
-|`nodeSelector` |`exporter.deployment` |The nodes on which {PM-kepler} exporter pods are scheduled. |`kubernetes.io/os: linux`
-|`tolerations` |`exporter.deployment` |The tolerations for {PM-kepler} exporter that allow the pods to be scheduled on nodes with specific characteristics. |`- operator: "Exists"`
+| Name
+| Description
+| Default Behavior
+
+| `deployment.nodeSelector`
+| The nodes on which the Kepler pods created by `PowerMonitor` are scheduled.
+| `kubernetes.io/os: linux`
+
+| `deployment.tolerations`
+| The tolerations for `PowerMonitor` that allow the pods to be scheduled on nodes with specific characteristics.
+| `- operator: "Exists"`
+
+| `deployment.security.mode`
+| Security mode can be set to either `none`, allowing unrestricted access to Kepler's metrics by any entity, or `rbac`, securing the metrics endpoint with TLS encryption and restricting access to authorized service accounts listed in `allowedSANames`.
+| Set to `rbac` by default, and only the user workload Prometheus is allowed access.
+
+| `deployment.security.allowedSANames`
+| A list of service account names that can access Kepler’s metrics endpoint when the security mode is `rbac`.
+| In OpenShift, set to `openshift-user-workload-monitoring:prometheus-user-workload` to allow user workload monitoring to scrape Kepler.
+
+| `config.logLevel`
+| The level of the logs that Kepler exposes.
+| Set to `info`.
+
+| `config.metricLevels`
+| A list of energy metric levels to expose. Possible values include `node`, `process`, `container`, `vm`, and `pod`.
+| The default list includes `node`, `pod`, and `vm`.
+
+| `config.staleness`
+| Specifies how long to wait before considering calculated power values as stale.
+| 500ms (500 milliseconds).
+
+| `config.sampleRate`
+| Specifies the interval for monitoring resources such as processes, containers, and VMs.
+| 5s (5 seconds).
+
+| `config.maxTerminated`
+| Controls terminated workload tracking. A negative value tracks unlimited workloads, zero disables tracking, and a positive value tracks the top N terminated workloads by energy consumption.
+| 500.
+
 |===
 
-.Example `{PM-kepler}` resource with default configuration
+.Example `PowerMonitor` resource with default configuration
 [source,yaml]
 ----
-apiVersion: kepler.system.sustainable.computing.io/v1alpha1
-kind: Kepler
+apiVersion: kepler.system.sustainable.computing.io/v1alpha1
+kind: PowerMonitor
 metadata:
-  name: kepler
+  labels:
+    app.kubernetes.io/name: powermonitor
+    app.kubernetes.io/instance: powermonitor
+    app.kubernetes.io/part-of: kepler-operator
+  name: power-monitor
 spec:
-  exporter:
+  kepler:
     deployment:
-      port: 9103 # <1>
-      nodeSelector:
-        kubernetes.io/os: linux # <2>
-      Tolerations: # <3>
-      - key: ""
-        operator: "Exists"
-        value: ""
-        effect: ""
-----
-<1> The Prometheus metrics are exposed on port 9103.
-<2> {PM-kepler} pods are scheduled on Linux nodes.
-<3> The default tolerations allow {PM-kepler} to be scheduled on any node.
+      nodeSelector:
+        kubernetes.io/os: linux
+
+      tolerations:
+      - key: key1
+        operator: Equal
+        value: value1
+        effect: NoSchedule
+
+      security:
+        mode: rbac
+        allowedSANames:
+        - openshift-user-workload-monitoring:prometheus-user-workload
+
+    config:
+      logLevel: info
+      metricLevels: [node, pod, vm]
+      staleness: 1s
+      sampleRate: 10s
+      maxTerminated: 1000
+----
\ No newline at end of file
diff --git a/modules/power-monitoring-monitoring-kepler-status.adoc b/modules/power-monitoring-monitoring-kepler-status.adoc
index ad9b949ad6..36a1c5c68f 100644
--- a/modules/power-monitoring-monitoring-kepler-status.adoc
+++ b/modules/power-monitoring-monitoring-kepler-status.adoc
@@ -6,25 +6,24 @@
 [id="power-monitoring-monitoring-kepler-status_{context}"]
 = Monitoring the {PM-kepler} status
 
-You can monitor the state of the {PM-kepler} exporter with the `status` field of the `{PM-kepler}` resource.
+You can monitor the state of the {PM-kepler} exporter with the `status` field of the `PowerMonitor` resource.
 
-The `status.exporter` field includes information, such as the following:
+The `status` field includes information, such as the following:
 
 * The number of nodes currently running the {PM-kepler} pods
 * The number of nodes that should be running the {PM-kepler} pods
 * Conditions representing the health of the {PM-kepler} resource
 
-This provides you with valuable insights into the changes made through the `spec` field.
+This information shows the effect of the changes made through the `spec` field.
 
-.Example state of the `{PM-kepler}` resource
+.Example state of the `PowerMonitor` resource
 [source,yaml]
 ----
 apiVersion: kepler.system.sustainable.computing.io/v1alpha1
-kind: Kepler
+kind: PowerMonitor
 metadata:
-  name: kepler
+  name: power-monitor
 status:
-  exporter:
     conditions: # <1>
       - lastTransitionTime: '2024-01-11T11:07:39Z'
         message: Reconcile succeeded
@@ -34,7 +33,7 @@ status:
         type: Reconciled
       - lastTransitionTime: '2024-01-11T11:07:39Z'
         message: >-
-          Kepler daemonset "kepler-operator/kepler" is deployed to all nodes and
+          power-monitor daemonset "openshift-power-monitoring/power-monitor" is deployed to all nodes and
           available; ready 2/2
         observedGeneration: 1
         reason: DaemonSetReady
@@ -43,6 +42,6 @@ status:
     currentNumberScheduled: 2 # <2>
     desiredNumberScheduled: 2 # <3>
 ----
-<1> The health of the {PM-kepler} resource. In this example, {PM-kepler} is successfully reconciled and ready.
+<1> The health of the `PowerMonitor` resource. In this example, the `PowerMonitor` resource is successfully reconciled and ready.
 <2> The number of nodes currently running the {PM-kepler} pods is 2.
-<3> The wanted number of nodes to run the {PM-kepler} pods is 2.
+<3> The desired number of nodes to run the {PM-kepler} pods is 2.
\ No newline at end of file
diff --git a/observability/power_monitoring/configuring-power-monitoring.adoc b/observability/power_monitoring/configuring-power-monitoring.adoc
index 732e032cb2..e303db3f05 100644
--- a/observability/power_monitoring/configuring-power-monitoring.adoc
+++ b/observability/power_monitoring/configuring-power-monitoring.adoc
@@ -9,10 +9,8 @@ toc::[]
 :FeatureName: Power monitoring
 include::snippets/technology-preview.adoc[leveloffset=+2]
 
-The `{PM-kepler}` resource is a Kubernetes custom resource definition (CRD) that enables you to configure the deployment and monitor the status of the {PM-kepler} resource.
+The `PowerMonitor` resource is a Kubernetes custom resource definition (CRD) that enables you to configure the deployment and monitor the status of {PM-kepler}.
 
 include::modules/power-monitoring-kepler-configuration.adoc[leveloffset=+1]
 
-include::modules/power-monitoring-monitoring-kepler-status.adoc[leveloffset=+1]
-
-include::modules/power-monitoring-configuring-kepler-redfish.adoc[leveloffset=+1]
+include::modules/power-monitoring-monitoring-kepler-status.adoc[leveloffset=+1]
\ No newline at end of file
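
The "Default Behavior" column of the configuration options table in `power-monitoring-kepler-configuration.adoc` can be written out as a `PowerMonitor` resource. The following is only a sketch and has not been validated against a cluster. It assumes the `spec.kepler.deployment` and `spec.kepler.config` nesting used by the example in that module, and the API group shown in the status module.

```yaml
# Sketch only: every value below is copied from the "Default Behavior"
# column of the configuration options table in this patch.
apiVersion: kepler.system.sustainable.computing.io/v1alpha1
kind: PowerMonitor
metadata:
  name: power-monitor   # the module states that other names are rejected
spec:
  kepler:
    deployment:
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
      - operator: Exists
      security:
        mode: rbac
        allowedSANames:
        - openshift-user-workload-monitoring:prometheus-user-workload
    config:
      logLevel: info
      metricLevels: [node, pod, vm]
      staleness: 500ms
      sampleRate: 5s
      maxTerminated: 500
```

The `tolerations`, `staleness`, `sampleRate`, and `maxTerminated` values here follow the table, not the module's "default configuration" example, which uses different values for those fields.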