openshift-docs/modules/virt-runbook-virtcontrollerdown.adoc

// Automatically generated by 'runbook-conversion.sh'. Do not edit.
// Module included in the following assemblies:
//
// * virt/logging_events_monitoring/virt-runbooks.adoc

:_content-type: REFERENCE
[id="virt-runbook-virtcontrollerdown_{context}"]
= VirtControllerDown

[discrete]
[id="meaning-virtcontrollerdown_{context}"]
== Meaning

No running `virt-controller` pod has been detected for 5 minutes.

[discrete]
[id="impact-virtcontrollerdown_{context}"]
== Impact

Any actions related to virtual machine (VM) lifecycle management fail.
This notably includes launching a new virtual machine instance (VMI)
or shutting down an existing VMI.

[discrete]
[id="diagnosis-virtcontrollerdown_{context}"]
== Diagnosis

. Set the `NAMESPACE` environment variable:
+
[source,terminal]
----
$ export NAMESPACE="$(oc get kubevirt -A \
  -o custom-columns="":.metadata.namespace)"
----

. Check the status of the `virt-controller` deployment:
+
[source,terminal]
----
$ oc get deployment -n $NAMESPACE virt-controller -o yaml
----

. Review the logs of the `virt-controller` pod:
+
[source,terminal]
----
$ oc logs -n $NAMESPACE <virt-controller>
----
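
The `<virt-controller>` placeholder in the last step stands for a pod
name. As a minimal sketch, assuming that the pods carry the
`kubevirt.io=virt-controller` label and that `NAMESPACE` is set as in the
first step, the following commands list the pods and show the details and
events for one of them:

[source,terminal]
----
# Assumption: virt-controller pods are labeled kubevirt.io=virt-controller
$ oc get pods -n $NAMESPACE -l kubevirt.io=virt-controller
# Replace <virt-controller> with a pod name from the previous output
$ oc describe pod -n $NAMESPACE <virt-controller>
----

If no pod is listed, there are no pod logs to review, and the deployment
status from the earlier step is the better starting point.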

[discrete]
[id="mitigation-virtcontrollerdown_{context}"]
== Mitigation

This alert can have a variety of causes, including the following:

* Node resource exhaustion
* Not enough memory on the cluster
* Nodes are down
* The API server is overloaded. For example, the scheduler might be
under a heavy load and therefore not completely available.
* Networking issues

Identify the root cause and fix it, if possible.
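
The following commands are one way to begin narrowing down the causes
listed above. They are a sketch rather than a complete procedure. They
assume that `NAMESPACE` is set as in the diagnosis procedure and that
cluster metrics are available for `oc adm top nodes`:

[source,terminal]
----
# Check whether any nodes are down or not ready
$ oc get nodes
# Check CPU and memory usage per node (assumes cluster metrics are available)
$ oc adm top nodes
# List events in the KubeVirt namespace, most recent last
$ oc get events -n $NAMESPACE --sort-by=.lastTimestamp
----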

If you cannot resolve the issue, log in to the
link:https://access.redhat.com[Customer Portal] and open a support case,
attaching the artifacts gathered during the diagnosis procedure.