// Module included in the following assemblies:
//
// * etcd/etcd-practices.adoc

:_mod-docs-content-type: PROCEDURE
[id="etcd-verify-hardware_{context}"]
= Validating the hardware for etcd

To validate the hardware for etcd before or after you create the {product-title} cluster, you can use fio.

.Prerequisites

* A container runtime, such as Podman or Docker, is installed on the machine that you are testing.
* Data is written to the `/var/lib/etcd` path.

.Procedure

* Run fio and analyze the results:
+
--
** If you use Podman, run this command:
+
[source,terminal]
----
$ sudo podman run --volume /var/lib/etcd:/var/lib/etcd:Z quay.io/cloud-bulldozer/etcd-perf
----

** If you use Docker, run this command:
+
[source,terminal]
----
$ sudo docker run --volume /var/lib/etcd:/var/lib/etcd:Z quay.io/cloud-bulldozer/etcd-perf
----
--

The output reports whether the disk is fast enough to host etcd by checking whether the 99th percentile of the fsync latency captured from the run is less than 10 ms. The most important etcd metrics that might be affected by I/O performance are as follows:

* The `etcd_disk_wal_fsync_duration_seconds_bucket` metric reports the etcd WAL fsync duration.
* The `etcd_disk_backend_commit_duration_seconds_bucket` metric reports the etcd backend commit latency.
* The `etcd_server_leader_changes_seen_total` metric reports the number of leader changes.
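
The 10 ms fsync check described above can be sketched as a small shell helper for cases where you run fio directly instead of through the container image. This is an illustration only: the function names are hypothetical, the fio parameters are an assumption based on a commonly used etcd-style write workload and might differ from what the `etcd-perf` image runs, and the JSON path for the fsync percentile can vary between fio versions.

```shell
# Sketch only: compares a 99th percentile fsync latency to the 10 ms limit.

# 10 ms expressed in nanoseconds, the unit fio uses in its JSON latency fields.
THRESHOLD_NS=10000000

# Hypothetical helper: run fio in the given directory and print the p99 fsync
# latency in nanoseconds. Requires fio and jq. The fio parameters and the jq
# path are assumptions; verify them against your fio version.
measure_fsync_p99_ns() {
  dir=$1
  fio --rw=write --ioengine=sync --fdatasync=1 \
      --directory="$dir" --size=22m --bs=2300 \
      --name=etcd-fsync-check --output-format=json |
    jq '.jobs[0].sync.lat_ns.percentile["99.000000"]'
}

# Hypothetical helper: succeed only if the given latency (ns) is under 10 ms.
check_fsync_p99() {
  p99_ns=$1
  if [ "$p99_ns" -lt "$THRESHOLD_NS" ]; then
    echo "PASS: p99 fsync ${p99_ns} ns is under 10 ms"
  else
    echo "FAIL: p99 fsync ${p99_ns} ns is 10 ms or more"
    return 1
  fi
}
```

For example, running `check_fsync_p99 "$(measure_fsync_p99_ns <scratch_directory>)"` against an existing scratch directory on the same disk as `/var/lib/etcd` would print a pass or fail line. fio leaves its test file in that directory, so remove the file afterward.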

Because etcd replicates requests among all members, its performance strongly depends on network input/output (I/O) latency. High network latency can cause etcd heartbeats to take longer than the election timeout, which triggers leader elections that disrupt the cluster. A key metric to monitor on a deployed {product-title} cluster is the 99th percentile of etcd network peer latency on each etcd cluster member. Use Prometheus to track the metric.

The `histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[2m]))` query reports the 99th percentile of the round-trip time for etcd to finish replicating client requests between the members. Ensure that the value is less than 50 ms.
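
As a rough sketch, the 50 ms check can be scripted against the Prometheus HTTP query API. The helper names are hypothetical, and `PROM_URL` and `PROM_TOKEN` are placeholders for an endpoint and bearer token that you must supply. How you obtain them depends on your cluster and is not covered here.

```shell
# Sketch only: compares a p99 peer round-trip time to the 50 ms limit.

# 50 ms expressed in seconds, the unit of the Prometheus metric.
RTT_LIMIT_SECONDS=0.05

# PromQL expression quoted from the text above.
PEER_RTT_QUERY='histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[2m]))'

# Hypothetical helper: fetch the query result as JSON from a
# Prometheus-compatible endpoint. PROM_URL and PROM_TOKEN are assumptions
# that you must set before calling this function.
query_peer_rtt() {
  curl -sS -G -H "Authorization: Bearer ${PROM_TOKEN}" \
       --data-urlencode "query=${PEER_RTT_QUERY}" \
       "${PROM_URL}/api/v1/query"
}

# Hypothetical helper: succeed only if a p99 value in seconds is under 50 ms.
# Non-numeric results such as NaN are not handled in this sketch.
check_peer_rtt() {
  awk -v v="$1" -v limit="$RTT_LIMIT_SECONDS" \
    'BEGIN { exit !(v + 0 < limit + 0) }'
}
```

The query returns one sample per peer connection, so a script would typically extract each value from the JSON response and pass it to `check_peer_rtt`, for example `check_peer_rtt 0.012 && echo ok`.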