// Module included in the following assemblies:
//
// * metering/metering-troubleshooting-debugging.adoc

:_mod-docs-content-type: PROCEDURE
[id="metering-troubleshooting_{context}"]
= Troubleshooting metering

A common issue with metering is pods failing to start. Pods might fail to start due to a lack of resources or if they have a dependency on a resource that does not exist, such as a `StorageClass` or `Secret` resource.

[id="metering-not-enough-compute-resources_{context}"]
== Not enough compute resources

A common issue when installing or running metering is a lack of compute resources. As the cluster grows and more reports are created, the Reporting Operator pod requires more memory. If memory usage reaches the pod limit, the cluster considers the pod out of memory (OOM) and terminates it with an `OOMKilled` status. Ensure that metering is allocated the minimum resource requirements described in the installation prerequisites.
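
One way to compare the Reporting Operator pod's current memory usage against its limit is `oc adm top pod`. This assumes that cluster monitoring is available so that the metrics API can serve pod usage data:

[source,terminal]
----
$ oc adm top pod -l app=reporting-operator -n openshift-metering
----
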
[NOTE]
====
The Metering Operator does not autoscale the Reporting Operator based on the load in the cluster. Therefore, CPU usage for the Reporting Operator pod does not increase as the cluster grows.
====

To determine if the issue is with resources or scheduling, follow the troubleshooting instructions included in the Kubernetes document https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#troubleshooting[Managing Compute Resources for Containers].
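
To look for recent scheduling or out-of-memory warnings on pods in the namespace, you can also list warning events. This uses the standard event field selector and is not specific to metering:

[source,terminal]
----
$ oc get events -n openshift-metering --field-selector type=Warning
----
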
To troubleshoot issues due to a lack of compute resources, check the following within the `openshift-metering` namespace.

.Prerequisites

* You are in the `openshift-metering` namespace. If necessary, change to the `openshift-metering` namespace by running:
+
[source,terminal]
----
$ oc project openshift-metering
----

.Procedure

. Check for metering `Report` resources that fail to complete and show the status of `ReportingPeriodUnmetDependencies`:
+
[source,terminal]
----
$ oc get reports
----
+
.Example output
[source,terminal]
----
NAME                                  QUERY                          SCHEDULE   RUNNING                            FAILED   LAST REPORT TIME       AGE
namespace-cpu-utilization-adhoc-10    namespace-cpu-utilization                 Finished                                    2020-10-31T00:00:00Z   2m38s
namespace-cpu-utilization-adhoc-11    namespace-cpu-utilization                 ReportingPeriodUnmetDependencies                                   2m23s
namespace-memory-utilization-202010   namespace-memory-utilization              ReportingPeriodUnmetDependencies                                   26s
namespace-memory-utilization-202011   namespace-memory-utilization              ReportingPeriodUnmetDependencies                                   14s
----
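+
To see why a report is unmet, you can inspect the status of that report. For example, using one of the report names from the example output:
+
[source,terminal]
----
$ oc describe report namespace-cpu-utilization-adhoc-11
----
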
. Check the `ReportDataSource` resources where the `NEWEST METRIC` is less than the report end date:
+
[source,terminal]
----
$ oc get reportdatasource
----
+
.Example output
[source,terminal]
----
NAME                            EARLIEST METRIC        NEWEST METRIC          IMPORT START           IMPORT END             LAST IMPORT TIME       AGE
...
node-allocatable-cpu-cores      2020-04-23T09:14:00Z   2020-08-31T10:07:00Z   2020-04-23T09:14:00Z   2020-10-15T17:13:00Z   2020-12-09T12:45:10Z   230d
node-allocatable-memory-bytes   2020-04-23T09:14:00Z   2020-08-30T05:19:00Z   2020-04-23T09:14:00Z   2020-10-14T08:01:00Z   2020-12-09T12:45:12Z   230d
...
pod-usage-memory-bytes          2020-04-23T09:14:00Z   2020-08-24T20:25:00Z   2020-04-23T09:14:00Z   2020-10-09T23:31:00Z   2020-12-09T12:45:12Z   230d
----
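+
If the newest metrics lag behind the end of the reporting period, the data import might be failing or falling behind. One way to check is to review the Reporting Operator logs for import errors. This sketch assumes that the deployment and its main container are both named `reporting-operator`:
+
[source,terminal]
----
$ oc logs deployment/reporting-operator -c reporting-operator --tail=100
----
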
. Check the health of the `reporting-operator` `Pod` resource for a high number of pod restarts:
+
[source,terminal]
----
$ oc get pods -l app=reporting-operator
----
+
.Example output
[source,terminal]
----
NAME                                  READY   STATUS    RESTARTS   AGE
reporting-operator-84f7c9b7b6-fr697   2/2     Running   542        8d <1>
----
<1> The Reporting Operator pod is restarting at a high rate.

. Check the `reporting-operator` `Pod` resource for an `OOMKilled` termination:
+
[source,terminal]
----
$ oc describe pod/reporting-operator-84f7c9b7b6-fr697
----
+
.Example output
[source,terminal]
----
Name:         reporting-operator-84f7c9b7b6-fr697
Namespace:    openshift-metering
Priority:     0
Node:         ip-10-xx-xx-xx.ap-southeast-1.compute.internal/10.xx.xx.xx
...
    Ports:          8080/TCP, 6060/TCP, 8082/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Thu, 03 Dec 2020 20:59:45 +1000
    Last State:     Terminated
      Reason:       OOMKilled <1>
      Exit Code:    137
      Started:      Thu, 03 Dec 2020 20:38:05 +1000
      Finished:     Thu, 03 Dec 2020 20:59:43 +1000
----
<1> The Reporting Operator pod was terminated due to an OOM kill.
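+
As an optional shortcut, you can print only the most recent termination reason for each Reporting Operator pod by querying standard pod status fields with JSONPath:
+
[source,terminal]
----
$ oc get pods -l app=reporting-operator -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'
----
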
[id="metering-check-and-increase-memory-limits_{context}"]
=== Increasing the reporting-operator pod memory limit

If you are experiencing an increase in pod restarts and OOM kill events, check the current memory limit that is set for the Reporting Operator pod. Increasing the memory limit allows the Reporting Operator pod to update the report data sources. If necessary, increase the memory limit in your `MeteringConfig` resource by 25% to 50%. For example, if the current limit is 500Mi, increase it to a value between 625Mi and 750Mi.

.Procedure

. Check the current memory limits of the `reporting-operator` `Pod` resource:
+
[source,terminal]
----
$ oc describe pod reporting-operator-67d6f57c56-79mrt
----
+
.Example output
[source,terminal]
----
Name:         reporting-operator-67d6f57c56-79mrt
Namespace:    openshift-metering
Priority:     0
...
    Ports:          8080/TCP, 6060/TCP, 8082/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Tue, 08 Dec 2020 14:26:21 +1000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  500Mi <1>
    Requests:
      cpu:     500m
      memory:  250Mi
    Environment:
...
----
<1> The current memory limit for the Reporting Operator pod.
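+
To print only the memory limit instead of the full `oc describe` output, you can query the same pod with JSONPath. The pod name is taken from the example output above:
+
[source,terminal]
----
$ oc get pod reporting-operator-67d6f57c56-79mrt -o jsonpath='{.spec.containers[*].resources.limits.memory}'
----
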
. Edit the `MeteringConfig` resource to update the memory limit:
+
[source,terminal]
----
$ oc edit meteringconfig/operator-metering
----
+
.Example `MeteringConfig` resource
[source,yaml]
----
kind: MeteringConfig
metadata:
  name: operator-metering
  namespace: openshift-metering
spec:
  reporting-operator:
    spec:
      resources: <1>
        limits:
          cpu: 1
          memory: 750Mi
        requests:
          cpu: 500m
          memory: 500Mi
...
----
<1> Add or increase memory limits within the `resources` field of the `MeteringConfig` resource.
+
[NOTE]
====
If numerous `OOMKilled` events continue after the memory limits are increased, this might indicate that a different issue is causing the reports to remain in a pending state.
====
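
After you save the change, the Metering Operator is expected to roll the new limit out to the Reporting Operator deployment. One way to confirm that the running pod picked up the new value, assuming the pod keeps the `app=reporting-operator` label, is to check its limits again:

[source,terminal]
----
$ oc describe pod -l app=reporting-operator | grep -A 2 Limits
----
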
[id="metering-storageclass-not-configured_{context}"]
== StorageClass resource not configured

Metering requires that a default `StorageClass` resource be configured for dynamic provisioning.
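
One way to check whether a default storage class exists is to list the `StorageClass` resources in the cluster. A default storage class, if configured, is marked with `(default)` in the output:

[source,terminal]
----
$ oc get storageclass
----
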
See the documentation on configuring metering for information on how to check if there are any `StorageClass` resources configured for the cluster, how to set the default, and how to configure metering to use a storage class other than the default.

[id="metering-secret-not-configured-correctly_{context}"]
== Secret not configured correctly

A common issue with metering is providing the incorrect secret when configuring your persistent storage. Be sure to review the example configuration files and create your secret according to the guidelines for your storage provider.
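
For example, if you use Amazon S3 for persistent storage, the secret is expected to carry your AWS credentials. The secret name and the key names shown here are illustrative; confirm the exact names required by your storage provider in the metering configuration documentation:

[source,terminal]
----
$ oc create secret generic your-aws-secret -n openshift-metering \
    --from-literal=aws-access-key-id=<access_key_id> \
    --from-literal=aws-secret-access-key=<secret_access_key>
----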