openshift-docs/modules/recommended-etcd-practices.adoc

// Module included in the following assemblies:
//
// * scalability_and_performance/recommended-host-practices.adoc
// * post_installation_configuration/cluster-tasks.adoc
// * post_installation_configuration/node-tasks.adoc

[id="recommended-etcd-practices_{context}"]
= Recommended etcd practices

For large and dense clusters, etcd can suffer from poor performance
if the keyspace grows excessively large and exceeds the space quota.
Periodic maintenance of etcd, including defragmentation, must be performed
to free up space in the data store. It is highly recommended that you monitor
Prometheus for etcd metrics and defragment it when required before etcd raises
a cluster-wide alarm that puts the cluster into a maintenance mode, which
only accepts key reads and deletes. Some of the key metrics to monitor are
`etcd_server_quota_backend_bytes` which is the current quota limit,
`etcd_mvcc_db_total_size_in_use_in_bytes` which indicates the actual
database usage after a history compaction, and
`etcd_debugging_mvcc_db_total_size_in_bytes` which shows the database size
including free space waiting for defragmentation.

Etcd writes data to disk, so its performance strongly depends on disk performance. Etcd
persists proposals on disk. Slow disks and disk activity from other processes might cause long
fsync latencies, causing etcd to miss heartbeats, inability to commit new proposals to the disk
on time, which can cause request timeouts and temporary leader loss. It is highly recommended to
run etcd on machines backed by SSD/NVMe disks with low latency and high throughput.

Some of the key metrics to monitor on a deployed {product-title} cluster
are p99 of etcd disk write ahead log duration and the number of etcd leader changes.
Use Prometheus to track these metrics. `etcd_disk_wal_fsync_duration_seconds_bucket`
reports the etcd disk fsync duration, `etcd_server_leader_changes_seen_total` reports
the leader changes. To rule out a slow disk and confirm that the disk is reasonably fast,
99th percentile of the `etcd_disk_wal_fsync_duration_seconds_bucket` should be less than 10ms.

Fio, a I/O benchmarking tool can be used to validate the hardware for etcd before or after
creating the OpenShift cluster. Run fio and analyze the results:

Assuming container runtimes like podman or docker are installed on the machine under test and
the path etcd writes the data exists - /var/lib/etcd, run:

.Procedure
Run the following if using podman:
[source,terminal]
----
$ sudo podman run --volume /var/lib/etcd:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf
----

Alternatively, run the following if using docker:
[source,terminal]
----
$ sudo docker run --volume /var/lib/etcd:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf
----

The output reports whether the disk is fast enough to host etcd by comparing the 99th percentile
of the fsync metric captured from the run to see if it is less than 10ms.