
CNV12285 Adding CNV alerts to downstream documentation

CNV12285 Adding CNV alerts to topic map

CNV12285 Misc edits after initial inspection

CNV12285 Correcting HCO alerts ID

CNV12285 Removing kubectl and CDI_NAMESPACE references from files

CNV12285 Fixing incorrectly formatted leveloffset

CNV12285 Change in storage alerts -cnv to  per SME review.

CNV12285 Change in network alerts openshift-cnv to Namespace per SME review.

CNV12285 Updates from SME review 9 Feb

CNV12285 Updates from SME review 9 Feb 2nd

CNV12285 Updates from SME review 9 Feb 3rd

CNV12285 Updates from SME review 9 Feb 4th

CNV12285 Updates from SME review 9 Feb 5th

CNV12285 Build failure test 1

CNV12285 Build failure test 2

CNV12285 Build failure test 3

CNV12285 source:terminal correction

CNV12285 Draft rewrite 1

CNV12285 Draft rewrite 2

CNV12285 Adding tech preview snippet

CNV12285 Adding tech preview snippet 2

CNV12285 Adding tech preview snippet 3

CNV12285 Draft rewrite 3

CNV12285 Draft rewrite 4

CNV12285 Draft rewrite 5

CNV12285 Draft rewrite 6

CNV12285 Draft rewrite 7

CNV12285 Revisions based on QE review

CNV12285 Changes based on QE review 2

CNV12285 Changes based on QE review 3

CNV12285 Changes based on QE review 4

CNV12285 Changes based on QE review 5

CNV12285 Changes based on QE review 6

CNV12285 Changes based on peer review

CNV12285 Changes based on peer review 2
Matthew Garrell
2022-01-17 21:07:57 -05:00
committed by openshift-cherrypick-robot
parent 56e0eeaa28
commit af4484f0f0
5 changed files with 819 additions and 0 deletions


@@ -3237,6 +3237,8 @@ Topics:
File: virt-prometheus-queries
- Name: Exposing custom metrics for virtual machines
File: virt-exposing-custom-metrics-for-vms
- Name: OpenShift Virtualization critical alerts
File: virt-virtualization-alerts
- Name: Collecting OpenShift Virtualization data for Red Hat Support
File: virt-collecting-virt-data
Distros: openshift-enterprise


@@ -0,0 +1,50 @@
// Module included in the following assemblies:
//
// * virt/logging_events_monitoring/virt-virtualization-alerts.adoc
:_content-type: REFERENCE
[id="virt-cnv-network-alerts_{context}"]
= Network alerts
Network alerts provide information about problems for the {VirtProductName} Network Operator.
//KubeMacPoolDown Alert
[id="KubeMacPoolDown_{context}"]
== KubeMacPoolDown alert
.Description
The KubeMacPool component allocates MAC addresses and prevents MAC address conflicts.
.Reason
If the KubeMacPool-manager pod is down, then the creation of `VirtualMachine` objects fails.
.Troubleshoot
. Determine the namespace and name of the `kubemacpool-manager` pod.
+
[source,terminal]
----
$ export KMP_NAMESPACE="$(oc get pod -A --no-headers -l control-plane=mac-controller-manager | awk '{print $1}')"
----
+
[source,terminal]
----
$ export KMP_NAME="$(oc get pod -A --no-headers -l control-plane=mac-controller-manager | awk '{print $2}')"
----
. Check the `kubemacpool-manager` pod's description and logs to determine the source of the problem.
+
[source,terminal]
----
$ oc describe pod -n $KMP_NAMESPACE $KMP_NAME
----
+
[source,terminal]
----
$ oc logs -n $KMP_NAMESPACE $KMP_NAME
----
.Resolution
Open a support issue and provide the information gathered in the troubleshooting process.


@@ -0,0 +1,168 @@
// Module included in the following assemblies:
//
// * virt/logging_events_monitoring/virt-virtualization-alerts.adoc
:_content-type: REFERENCE
[id="virt-cnv-ssp-alerts_{context}"]
= SSP alerts
SSP alerts provide information about problems for the {VirtProductName} SSP Operator.
//SSPFailingToReconcile Alert
[id="SSPFailingToReconcile_{context}"]
== SSPFailingToReconcile alert
.Description
The SSP Operator pod is up, but its reconcile cycle consistently fails. Failures include failing to update the resources that it is responsible for, failing to deploy the template validator, and failing to deploy or update the common templates.
.Reason
If the SSP Operator fails to reconcile, then the deployment of dependent components fails, reconciliation of component changes fails, or both. Additionally, the updates to the common templates and template validator reset and fail.
.Troubleshoot
. Check the ssp-operator pod's logs for errors:
+
[source,terminal]
----
$ export NAMESPACE="$(oc get deployment -A | grep ssp-operator | awk '{print $1}')"
----
+
[source,terminal]
----
$ oc -n $NAMESPACE describe pods -l control-plane=ssp-operator
----
+
[source,terminal]
----
$ oc -n $NAMESPACE logs --tail=-1 -l control-plane=ssp-operator
----
. Verify that the template validator is up. If the template validator is not up, then check the pod's logs for errors.
+
[source,terminal]
----
$ export NAMESPACE="$($ oc get deployment -A | grep ssp-operator | awk '{print $1}')"
----
+
[source,terminal]
----
$ oc -n $NAMESPACE get pods -l name=virt-template-validator
----
+
[source,terminal]
----
$ oc -n $NAMESPACE describe pods -l name=virt-template-validator
----
+
[source,terminal]
----
$ oc -n $NAMESPACE logs --tail=-1 -l name=virt-template-validator
----
.Resolution
Open a support issue and provide the information gathered in the troubleshooting process.
//SSPOperatorDown Alert
[id="SSPOperatorDown_{context}"]
== SSPOperatorDown alert
.Description
The SSP Operator deploys and reconciles the common templates and the template validator.
.Reason
If the SSP Operator is down, then the deployment of dependent components fails, reconciliation of component changes fails, or both. Additionally, the updates to the common template and template validator reset and fail.
.Troubleshoot
. Determine the ssp-operator pod's namespace:
+
[source,terminal]
----
$ export NAMESPACE="$(oc get deployment -A | grep ssp-operator | awk '{print $1}')"
----
. Verify that the ssp-operator pod is currently down.
+
[source,terminal]
----
$ oc -n $NAMESPACE get pods -l control-plane=ssp-operator
----
. Check the ssp-operator pod's description and logs.
+
[source,terminal]
----
$ oc -n $NAMESPACE describe pods -l control-plane=ssp-operator
----
+
[source,terminal]
----
$ oc -n $NAMESPACE logs --tail=-1 -l control-plane=ssp-operator
----
.Resolution
Open a support issue and provide the information gathered in the troubleshooting process.
//SSPTemplateValidatorDown Alert
[id="SSPTemplateValidatorDown_{context}"]
== SSPTemplateValidatorDown alert
.Description
The template validator validates that virtual machines (VMs) do not violate their assigned templates.
.Reason
If every template validator pod is down, then the template validator fails to validate VMs against their assigned templates.
.Troubleshoot
. Check the namespaces of the ssp-operator pods and the virt-template-validator pods.
+
[source,terminal]
----
$ export NAMESPACE_SSP="$(oc get deployment -A | grep ssp-operator | awk '{print $1}')"
----
+
[source,terminal]
----
$ export NAMESPACE="$(oc get deployment -A | grep virt-template-validator | awk '{print $1}')"
----
. Verify that the virt-template-validator pods are currently down.
+
[source,terminal]
----
$ oc -n $NAMESPACE get pods -l name=virt-template-validator
----
. Check the pod description and logs of the ssp-operator and the virt-template-validator.
+
[source,terminal]
----
$ oc -n $NAMESPACE_SSP describe pods -l control-plane=ssp-operator
----
+
[source,terminal]
----
$ oc -n $NAMESPACE_SSP logs --tail=-1 -l control-plane=ssp-operator
----
+
[source,terminal]
----
$ oc -n $NAMESPACE describe pods -l name=virt-template-validator
----
+
[source,terminal]
----
$ oc -n $NAMESPACE logs --tail=-1 -l name=virt-template-validator
----
.Resolution
Open a support issue and provide the information gathered in the troubleshooting process.


@@ -0,0 +1,574 @@
// Module included in the following assemblies:
//
// * virt/logging_events_monitoring/virt-virtualization-alerts.adoc
:_content-type: REFERENCE
[id="virt-cnv-virt-alerts_{context}"]
= Virt alerts
Virt alerts provide information about problems for the {VirtProductName} Virt Operator.
//NoLeadingVirtOperator Alert
[id="NoLeadingVirtOperator_{context}"]
== NoLeadingVirtOperator alert
.Description
For the past 10 minutes, no virt-operator pod has held the leader lease, despite one or more virt-operator pods being in the `Ready` state. This alert suggests that no operating virt-operator pod exists.
.Reason
The virt-operator is the first Kubernetes Operator active in an {product-title} cluster. Its primary responsibilities are:
* Installation
* Live-update
* Live-upgrade of a cluster
* Monitoring the lifecycle of top-level controllers such as virt-controller, virt-handler, and virt-launcher
* Managing the reconciliation of top-level controllers
In addition, the virt-operator is responsible for cluster-wide tasks such as certificate rotation and some infrastructure management.
The virt-operator deployment has two replicas by default, with one leader pod holding the leader lease and indicating an operating virt-operator pod.
This alert indicates a failure at the cluster level. Critical cluster-wide management functionality, such as certificate rotation, upgrades, and reconciliation of controllers, might be temporarily unavailable.
.Troubleshoot
Determine a virt-operator pod's leader status from the pod logs. The log messages containing `Started leading` and `acquire leader` indicate the leader status of a given virt-operator pod.
Additionally, always check if there are any running virt-operator pods and the pods' statuses with these commands:
[source,terminal]
----
$ export NAMESPACE="$(oc get kubevirt -A -o custom-columns="":.metadata.namespace)"
----
[source,terminal]
----
$ oc -n $NAMESPACE get pods -l kubevirt.io=virt-operator
----
[source,terminal]
----
$ oc -n $NAMESPACE logs <pod-name>
----
[source,terminal]
----
$ oc -n $NAMESPACE describe pod <pod-name>
----
*Leader pod example:*
[source,terminal]
----
$ oc -n $NAMESPACE logs <pod-name> | grep lead
----
.Example output
[source,terminal]
----
{"component":"virt-operator","level":"info","msg":"Attempting to acquire leader status","pos":"application.go:400","timestamp":"2021-11-30T12:15:18.635387Z"}
I1130 12:15:18.635452 1 leaderelection.go:243] attempting to acquire leader lease <namespace>/virt-operator...
I1130 12:15:19.216582 1 leaderelection.go:253] successfully acquired lease <namespace>/virt-operator
----
[source,terminal]
----
{"component":"virt-operator","level":"info","msg":"Started leading","pos":"application.go:385","timestamp":"2021-11-30T12:15:19.216836Z"}
----
*Non-leader pod example:*
[source,terminal]
----
$ oc -n $NAMESPACE logs <pod-name> | grep lead
----
.Example output
[source,terminal]
----
{"component":"virt-operator","level":"info","msg":"Attempting to acquire leader status","pos":"application.go:400","timestamp":"2021-11-30T12:15:20.533696Z"}
I1130 12:15:20.533792 1 leaderelection.go:243] attempting to acquire leader lease <namespace>/virt-operator...
----
.Resolution
There are several reasons for no virt-operator pod holding the leader lease, despite one or more virt-operator pods being in `Ready` state. Identify the root cause and take appropriate action.
Otherwise, open a support issue and provide the information gathered in the troubleshooting process.
//NoReadyVirtController Alert
[id="NoReadyVirtController_{context}"]
== NoReadyVirtController alert
.Description
The virt-controller monitors virtual machine instances (VMIs) and manages the associated pods by creating them and managing their lifecycle.
A VMI object is always associated with a pod during its lifetime. However, the pod instance can change over time because of VMI migration.
This alert occurs when no ready virt-controller pods have been detected for five minutes.
.Reason
If the virt-controller fails, then VM lifecycle management completely fails. Lifecycle management tasks include launching a new VMI or shutting down an existing VMI.
.Troubleshoot
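. Set the `NAMESPACE` environment variable, using the same namespace lookup as the other alert sections in this document, so that the following commands can reference it:
+
[source,terminal]
----
$ export NAMESPACE="$(oc get kubevirt -A -o custom-columns="":.metadata.namespace)"
----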
. Check the deployment status of the virt-controller for available replicas and conditions.
+
[source,terminal]
----
$ oc -n $NAMESPACE get deployment virt-controller -o yaml
----
. Check if the virt-controller pods exist and check their statuses.
+
[source,terminal]
----
$ oc get pods -n $NAMESPACE | grep virt-controller
----
. Check the virt-controller pods' events.
+
[source,terminal]
----
$ oc -n $NAMESPACE describe pods <virt-controller pod>
----
. Check the virt-controller pods' logs.
+
[source,terminal]
----
$ oc -n $NAMESPACE logs <virt-controller pod>
----
. Check if there are issues with the nodes, such as if the nodes are in a `NotReady` state.
+
[source,terminal]
----
$ oc get nodes
----
.Resolution
There are several reasons for no virt-controller pods being in a `Ready` state. Identify the root cause and take appropriate action.
Otherwise, open a support issue and provide the information gathered in the troubleshooting process.
//NoReadyVirtOperator Alert
[id="NoReadyVirtOperator_{context}"]
== NoReadyVirtOperator alert
.Description
No virt-operator pod has been detected in the `Ready` state in the past 10 minutes. The virt-operator deployment has two replicas by default.
.Reason
The virt-operator is the first Kubernetes Operator active in an {product-title} cluster. Its primary responsibilities are:
* Installation
* Live-update
* Live-upgrade of a cluster
* Monitoring the lifecycle of top-level controllers such as virt-controller, virt-handler, and virt-launcher
* Managing the reconciliation of top-level controllers
In addition, the virt-operator is responsible for cluster-wide tasks such as certificate rotation and some infrastructure management.
[NOTE]
====
The virt-operator is not directly responsible for virtual machines in the cluster. Its unavailability does not affect custom workloads.
====
This alert indicates a failure at the cluster level. Critical cluster-wide management functionality, such as certificate rotation, upgrades, and reconciliation of controllers, is temporarily unavailable.
.Troubleshoot
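. Set the `NAMESPACE` environment variable, using the same namespace lookup as the other alert sections in this document, so that the following commands can reference it:
+
[source,terminal]
----
$ export NAMESPACE="$(oc get kubevirt -A -o custom-columns="":.metadata.namespace)"
----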
. Check the deployment status of the virt-operator for available replicas and conditions.
+
[source,terminal]
----
$ oc -n $NAMESPACE get deployment virt-operator -o yaml
----
. Check the virt-operator pods' events.
+
[source,terminal]
----
$ oc -n $NAMESPACE describe pods <virt-operator pod>
----
. Check the virt-operator pods' logs.
+
[source,terminal]
----
$ oc -n $NAMESPACE logs <virt-operator pod>
----
. Check if there are issues with the control plane nodes, such as if they are in a `NotReady` state.
+
[source,terminal]
----
$ oc get nodes
----
.Resolution
There are several reasons for no virt-operator pods being in a `Ready` state. Identify the root cause and take appropriate action.
Otherwise, open a support issue and provide the information gathered in the troubleshooting process.
//VirtAPIDown Alert
[id="VirtAPIDown_{context}"]
== VirtAPIDown alert
.Description
All {VirtProductName} API servers are down.
.Reason
If all {VirtProductName} API servers are down, then no API calls for {VirtProductName} entities can occur.
.Troubleshoot
. Set the `NAMESPACE` environment variable.
+
[source,terminal]
----
$ export NAMESPACE="$(oc get kubevirt -A -o custom-columns="":.metadata.namespace)"
----
. Verify if there are any running virt-api pods.
+
[source,terminal]
----
$ oc -n $NAMESPACE get pods -l kubevirt.io=virt-api
----
. View the pods' logs using `oc logs` and the pods' statuses using `oc describe`.
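+
For example, using a pod name from the previous step:
+
[source,terminal]
----
$ oc -n $NAMESPACE logs <pod-name>
----
+
[source,terminal]
----
$ oc -n $NAMESPACE describe pod <pod-name>
----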
. Check the status of the virt-api deployment. Use these commands to learn about related events and show if there are any issues with pulling an image, a crashing pod, or other similar problems.
+
[source,terminal]
----
$ oc -n $NAMESPACE get deployment virt-api -o yaml
----
+
[source,terminal]
----
$ oc -n $NAMESPACE describe deployment virt-api
----
. Check if there are issues with the nodes, such as if the nodes are in a `NotReady` state.
+
[source,terminal]
----
$ oc get nodes
----
.Resolution
Virt-api pods can be down for several reasons. Identify the root cause and take appropriate action.
Otherwise, open a support issue and provide the information gathered in the troubleshooting process.
//VirtApiRESTErrorsBurst Alert
[id="VirtApiRESTErrorsBurst_{context}"]
== VirtApiRESTErrorsBurst alert
.Description
More than 80% of the REST calls fail in virt-api in the last five minutes.
.Reason
A very high rate of failed REST calls to virt-api can cause slow responses, slow execution of API calls, or complete rejection of API calls.
.Troubleshoot
. Set the `NAMESPACE` environment variable.
+
[source,terminal]
----
$ export NAMESPACE="$(oc get kubevirt -A -o custom-columns="":.metadata.namespace)"
----
. Check to see how many running virt-api pods exist.
+
[source,terminal]
----
$ oc -n $NAMESPACE get pods -l kubevirt.io=virt-api
----
. View the pods' logs using `oc logs` and the pods' statuses using `oc describe`.
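+
For example, using a pod name from the previous step:
+
[source,terminal]
----
$ oc -n $NAMESPACE logs <pod-name>
----
+
[source,terminal]
----
$ oc -n $NAMESPACE describe pod <pod-name>
----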
. Check the status of the virt-api deployment to find out more information. These commands provide the associated events and show if there are any issues with pulling an image or a crashing pod.
+
[source,terminal]
----
$ oc -n $NAMESPACE get deployment virt-api -o yaml
----
+
[source,terminal]
----
$ oc -n $NAMESPACE describe deployment virt-api
----
. Check if there are issues with the nodes, such as if the nodes are overloaded or in a `NotReady` state.
+
[source,terminal]
----
$ oc get nodes
----
.Resolution
There are several reasons for a high rate of failed REST calls. Identify the root cause from the following list and take appropriate action:
* Node resource exhaustion
* Not enough memory on the cluster
* Nodes are down
* The API server is overloaded, such as when the scheduler is not fully available
* Networking issues
Otherwise, open a support issue and provide the information gathered in the troubleshooting process.
//VirtControllerDown Alert
[id="VirtControllerDown_{context}"]
== VirtControllerDown alert
.Description
No virt-controller pod has been detected in the past five minutes. The virt-controller deployment has two replicas by default.
.Reason
If the virt-controller fails, then VM lifecycle management tasks, such as launching a new VMI or shutting down an existing VMI, completely fail.
.Troubleshoot
. Set the `NAMESPACE` environment variable.
+
[source,terminal]
----
$ export NAMESPACE="$(oc get kubevirt -A -o custom-columns="":.metadata.namespace)"
----
. Check the status of the virt-controller deployment.
+
[source,terminal]
----
$ oc get deployment -n $NAMESPACE virt-controller -o yaml
----
. Check the virt-controller pods' events.
+
[source,terminal]
----
$ oc -n $NAMESPACE describe pods <virt-controller pod>
----
. Check the virt-controller pods' logs.
+
[source,terminal]
----
$ oc -n $NAMESPACE logs <virt-controller pod>
----
. Check the manager pod's logs to determine why creating the virt-controller pods fails.
+
[source,terminal]
----
$ oc -n $NAMESPACE logs <virt-controller-pod>
----
An example of a virt-controller pod name in the logs is `virt-controller-7888c64d66-dzc9p`. However, there may be several pods that run virt-controller.
.Resolution
There are several known reasons why no running virt-controller pods might be detected. Identify the root cause from the following list and take appropriate action:
* Node resource exhaustion
* Not enough memory on the cluster
* Nodes are down
* The API server is overloaded, such as when the scheduler is not fully available
* Networking issues
Otherwise, open a support issue and provide the information gathered in the troubleshooting process.
//VirtControllerRESTErrorsBurst Alert
[id="VirtControllerRESTErrorsBurst_{context}"]
== VirtControllerRESTErrorsBurst alert
.Description
More than 80% of the REST calls failed in virt-controller in the last five minutes.
.Reason
The virt-controller might have completely lost connectivity to the API server. This loss does not affect running workloads, but status updates cannot propagate and actions such as migrations cannot occur.
.Troubleshoot
There are two common error types associated with virt-controller REST call failure:
* The API server overloads, causing timeouts. Check the API server metrics and details like response times and overall calls.
* The virt-controller pod cannot reach the API server. Common causes are:
** DNS issues on the node
** Networking connectivity issues
.Resolution
Check the virt-controller logs to determine if the virt-controller pod cannot connect to the API server at all. If so, delete the pod to force a restart.
Additionally, verify if node resource exhaustion or not having enough memory on the cluster is causing the connection failure.
The issue normally relates to DNS or CNI issues outside of the scope of this alert.
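For example, the following commands check the logs and, if necessary, force a restart by deleting the pod. This is a minimal sketch that assumes the `NAMESPACE` variable is set as in the other alert sections:
[source,terminal]
----
$ oc -n $NAMESPACE logs <virt-controller-pod>
----
[source,terminal]
----
$ oc -n $NAMESPACE delete pod <virt-controller-pod>
----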
Otherwise, open a support issue and provide the information gathered in the troubleshooting process.
//VirtHandlerRESTErrorsBurst Alert
[id="VirtHandlerRESTErrorsBurst_{context}"]
== VirtHandlerRESTErrorsBurst alert
.Description
More than 80% of the REST calls failed in virt-handler in the last five minutes.
.Reason
Virt-handler lost the connection to the API server. Running workloads on the affected node still run, but status updates cannot propagate and actions such as migrations cannot occur.
.Troubleshoot
There are two common error types associated with virt-handler REST call failure:
* The API server overloads, causing timeouts. Check the API server metrics and details like response times and overall calls.
* The virt-handler pod cannot reach the API server. Common causes are:
** DNS issues on the node
** Networking connectivity issues
.Resolution
If the virt-handler cannot connect to the API server, delete the pod to force a restart. The issue normally relates to DNS or CNI issues outside of the scope of this alert. Identify the root cause and take appropriate action.
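For example, the following commands locate the virt-handler pod and delete it to force a restart. This is a minimal sketch that assumes `NAMESPACE` is set as in the other alert sections and that virt-handler pods carry the `kubevirt.io=virt-handler` label, following the label convention of the other components in this document:
[source,terminal]
----
$ oc -n $NAMESPACE get pods -l kubevirt.io=virt-handler
----
[source,terminal]
----
$ oc -n $NAMESPACE delete pod <virt-handler-pod>
----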
Otherwise, open a support issue and provide the information gathered in the troubleshooting process.
//VirtOperatorDown Alert
[id="VirtOperatorDown_{context}"]
== VirtOperatorDown alert
.Description
This alert occurs when no virt-operator pod has been in the `Running` state in the past 10 minutes. The virt-operator deployment has two replicas by default.
.Reason
The virt-operator is the first Kubernetes Operator active in an {product-title} cluster. Its primary responsibilities are:
* Installation
* Live-update
* Live-upgrade of a cluster
* Monitoring the lifecycle of top-level controllers such as virt-controller, virt-handler, and virt-launcher
* Managing the reconciliation of top-level controllers
In addition, the virt-operator is responsible for cluster-wide tasks such as certificate rotation and some infrastructure management.
[NOTE]
====
The virt-operator is not directly responsible for virtual machines in the cluster. The virt-operator's unavailability does not affect the custom workloads.
====
This alert indicates a failure at the cluster level. Critical cluster-wide management functionality, such as certificate rotation, upgrades, and reconciliation of controllers, is temporarily unavailable.
.Troubleshoot
. Set the `NAMESPACE` environment variable.
+
[source,terminal]
----
$ export NAMESPACE="$(oc get kubevirt -A -o custom-columns="":.metadata.namespace)"
----
. Check the status of the virt-operator deployment.
+
[source,terminal]
----
$ oc get deployment -n $NAMESPACE virt-operator -o yaml
----
. Check the virt-operator pods' events.
+
[source,terminal]
----
$ oc -n $NAMESPACE describe pods <virt-operator pod>
----
. Check the virt-operator pods' logs.
+
[source,terminal]
----
$ oc -n $NAMESPACE logs <virt-operator pod>
----
. Check the manager pod's logs to determine why creating the virt-operator pods fails.
+
[source,terminal]
----
$ oc -n $NAMESPACE logs <virt-operator-pod>
----
An example of a virt-operator pod name in the logs is `virt-operator-7888c64d66-dzc9p`. However, there may be several pods that run virt-operator.
.Resolution
There are several known reasons why no running virt-operator pods might be detected. Identify the root cause from the following list and take appropriate action:
* Node resource exhaustion
* Not enough memory on the cluster
* Nodes are down
* The API server is overloaded, such as when the scheduler is not fully available
* Networking issues
Otherwise, open a support issue and provide the information gathered in the troubleshooting process.
//VirtOperatorRESTErrorsBurst Alert
[id="VirtOperatorRESTErrorsBurst_{context}"]
== VirtOperatorRESTErrorsBurst alert
.Description
More than 80% of the REST calls failed in virt-operator in the last five minutes.
.Reason
The virt-operator lost the connection to the API server. Cluster-level actions, such as upgrades and controller reconciliation, do not function. There is no effect on customer workloads, such as VMs and VMIs.
.Troubleshoot
There are two common error types associated with virt-operator REST call failure:
* The API server overloads, causing timeouts. Check the API server metrics and details, such as response times and overall calls.
* The virt-operator pod cannot reach the API server. Common causes are network connectivity problems and DNS issues on the node. Check the virt-operator logs to verify that the pod can connect to the API server at all.
+
[source,terminal]
----
$ export NAMESPACE="$(oc get kubevirt -A -o custom-columns="":.metadata.namespace)"
----
+
[source,terminal]
----
$ oc -n $NAMESPACE get pods -l kubevirt.io=virt-operator
----
+
[source,terminal]
----
$ oc -n $NAMESPACE logs <pod-name>
----
+
[source,terminal]
----
$ oc -n $NAMESPACE describe pod <pod-name>
----
.Resolution
If the virt-operator cannot connect to the API server, delete the pod to force a restart. The issue normally relates to DNS or CNI issues outside of the scope of this alert. Identify the root cause and take appropriate action.
Otherwise, open a support issue and provide the information gathered in the troubleshooting process.


@@ -0,0 +1,25 @@
:_content-type: ASSEMBLY
[id="virt-virtualization-alerts"]
= {VirtProductName} critical alerts
include::modules/virt-document-attributes.adoc[]
include::modules/common-attributes.adoc[]
:context: virt-virtualization-alerts
toc::[]
:FeatureName: {VirtProductName} critical alerts
include::snippets/technology-preview.adoc[]
{VirtProductName} has alerts that inform you when a problem occurs. Critical alerts require immediate attention.
Each alert has a corresponding description of the problem, a reason for why the alert is occurring, a troubleshooting process to diagnose the source of the problem, and steps for resolving the alert.
include::modules/virt-cnv-network-alerts.adoc[leveloffset=+1]
include::modules/virt-cnv-ssp-alerts.adoc[leveloffset=+1]
include::modules/virt-cnv-virt-alerts.adoc[leveloffset=+1]
[role="_additional-resources"]
[id="additional-resources_virt-virtualization-alerts"]
== Additional resources
* xref:../../support/getting-support.adoc[Getting support]