openshift-docs/modules/virt-querying-metrics.adoc

// Module included in the following assemblies:
//
// * virt/support/virt-prometheus-queries.adoc

[id="virt-querying-metrics_{context}"]
= Virtualization metrics

The following metric descriptions include example Prometheus Query Language (PromQL) queries. These metrics are not an API and might change between versions.

[NOTE]
====
The following examples use `topk` queries that specify a time period. If virtual machines are deleted during that time period, they can still appear in the query output.
====

// Hiding in ROSA/OSD as user cannot edit MCO
ifndef::openshift-rosa,openshift-dedicated[]
[id="virt-promql-vcpu-metrics_{context}"]
== vCPU metrics

The following query can identify virtual machines that are waiting for Input/Output (I/O):

`kubevirt_vmi_vcpu_wait_seconds_total`::
Returns the wait time (in seconds) for a virtual machine's vCPU. Type: Counter.

A value above '0' means that the vCPU wants to run, but the host scheduler cannot run it yet. This inability to run indicates that there is an issue with I/O.

[NOTE]
====
To query the vCPU metric, the `schedstats=enable` kernel argument must first be applied to the `MachineConfig` object. This kernel argument enables scheduler statistics used for debugging and performance tuning and adds a minor additional load to the scheduler.
====

.Example vCPU wait time query
[source,promql]
----
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_vcpu_wait_seconds_total[6m]))) > 0 <1>
----
<1> This query returns the top 3 VMs waiting for I/O at every given moment over a six-minute time period.
endif::openshift-rosa,openshift-dedicated[]

[id="virt-promql-network-metrics_{context}"]
== Network metrics

The following queries can identify virtual machines that are saturating the network:

`kubevirt_vmi_network_receive_bytes_total`::
Returns the total amount of traffic received (in bytes) on the virtual machine's network. Type: Counter.

`kubevirt_vmi_network_transmit_bytes_total`::
Returns the total amount of traffic transmitted (in bytes) on the virtual machine's network. Type: Counter.

.Example network traffic query
[source,promql]
----
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_network_receive_bytes_total[6m])) + sum by (name, namespace) (rate(kubevirt_vmi_network_transmit_bytes_total[6m]))) > 0 <1>
----
<1> This query returns the top 3 VMs transmitting the most network traffic at every given moment over a six-minute time period.

[id="virt-promql-storage-metrics_{context}"]
== Storage metrics

[id="virt-storage-traffic_{context}"]
=== Storage-related traffic

The following queries can identify VMs that are writing large amounts of data:

`kubevirt_vmi_storage_read_traffic_bytes_total`::
Returns the total amount (in bytes) of the virtual machine's storage-related traffic. Type: Counter.

`kubevirt_vmi_storage_write_traffic_bytes_total`::
Returns the total amount of storage writes (in bytes) of the virtual machine's storage-related traffic. Type: Counter.

.Example storage-related traffic query
[source,promql]
----
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_storage_read_traffic_bytes_total[6m])) + sum by (name, namespace) (rate(kubevirt_vmi_storage_write_traffic_bytes_total[6m]))) > 0 <1>
----
<1> This query returns the top 3 VMs performing the most storage traffic at every given moment over a six-minute time period.

[id="virt-storage-snapshot-data_{context}"]
=== Storage snapshot data

`kubevirt_vmsnapshot_disks_restored_from_source`::
Returns the total number of virtual machine disks restored from the source virtual machine. Type: Gauge.

`kubevirt_vmsnapshot_disks_restored_from_source_bytes`::
Returns the amount of space in bytes restored from the source virtual machine. Type: Gauge.

.Examples of storage snapshot data queries
[source,promql]
----
kubevirt_vmsnapshot_disks_restored_from_source{vm_name="simple-vm", vm_namespace="default"} <1>
----
<1> This query returns the total number of virtual machine disks restored from the source virtual machine.

[source,promql]
----
kubevirt_vmsnapshot_disks_restored_from_source_bytes{vm_name="simple-vm", vm_namespace="default"} <1>
----
<1> This query returns the amount of space in bytes restored from the source virtual machine.

[id="virt-iops_{context}"]
=== I/O performance

The following queries can determine the I/O performance of storage devices:

`kubevirt_vmi_storage_iops_read_total`::
Returns the amount of write I/O operations the virtual machine is performing per second. Type: Counter.

`kubevirt_vmi_storage_iops_write_total`::
Returns the amount of read I/O operations the virtual machine is performing per second. Type: Counter.

.Example I/O performance query
[source,promql]
----
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_storage_iops_read_total[6m])) + sum by (name, namespace) (rate(kubevirt_vmi_storage_iops_write_total[6m]))) > 0 <1>
----
<1> This query returns the top 3 VMs performing the most I/O operations per second at every given moment over a six-minute time period.

[id="virt-promql-guest-memory-metrics_{context}"]
== Guest memory swapping metrics

The following queries can identify which swap-enabled guests are performing the most memory swapping:

`kubevirt_vmi_memory_swap_in_traffic_bytes`::
Returns the total amount (in bytes) of memory the virtual guest is swapping in. Type: Gauge.

`kubevirt_vmi_memory_swap_out_traffic_bytes`::
Returns the total amount (in bytes) of memory the virtual guest is swapping out. Type: Gauge.

.Example memory swapping query
[source,promql]
----
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_memory_swap_in_traffic_bytes[6m])) + sum by (name, namespace) (rate(kubevirt_vmi_memory_swap_out_traffic_bytes[6m]))) > 0 <1>
----
<1> This query returns the top 3 VMs where the guest is performing the most memory swapping at every given moment over a six-minute time period.

[NOTE]
====
Memory swapping indicates that the virtual machine is under memory pressure. Increasing the memory allocation of the virtual machine can mitigate this issue.
====