mirror of
https://github.com/openshift/openshift-docs.git
synced 2026-02-05 03:47:04 +01:00
OSDOCS#12867: Docs for hibernating a cluster
Committed by: openshift-cherrypick-robot
Parent: e0d6cd8c84
Commit: d8d13abf06
@@ -3539,6 +3539,8 @@ Topics:
  File: graceful-cluster-shutdown
- Name: Restarting a cluster gracefully
  File: graceful-cluster-restart
- Name: Hibernating a cluster
  File: hibernating-cluster
- Name: OADP Application backup and restore
  Dir: application_backup_and_restore
  Topics:

41
backup_and_restore/hibernating-cluster.adoc
Normal file
@@ -0,0 +1,41 @@
:_mod-docs-content-type: ASSEMBLY
[id="hibernating-cluster"]
= Hibernating an {product-title} cluster
include::_attributes/common-attributes.adoc[]
:context: hibernating-cluster

toc::[]

You can hibernate your {product-title} cluster for up to 90 days.

// About hibernating a cluster
include::modules/hibernating-cluster-about.adoc[leveloffset=+1]

[id="hibernating-cluster_prerequisites_{context}"]
== Prerequisites

* Take an xref:../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#backing-up-etcd-data_backup-etcd[etcd backup] before hibernating the cluster.
+
[IMPORTANT]
====
It is important to take an etcd backup before hibernating so that your cluster can be restored if you encounter any issues when resuming the cluster.

For example, the following conditions can cause the resumed cluster to malfunction:

* etcd data corruption during hibernation
* Node failure due to hardware issues
* Network connectivity issues

If your cluster fails to recover, follow the steps to xref:../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-restoring-cluster-state[restore to a previous cluster state].
====

// Hibernating a cluster
include::modules/hibernating-cluster-hibernate.adoc[leveloffset=+1]

[role="_additional-resources"]
.Additional resources

* xref:../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#backup-etcd[Backing up etcd]

// Resuming a hibernated cluster
include::modules/hibernating-cluster-resume.adoc[leveloffset=+1]
20
modules/hibernating-cluster-about.adoc
Normal file
@@ -0,0 +1,20 @@
// Module included in the following assemblies:
//
// * backup_and_restore/hibernating-cluster.adoc

:_mod-docs-content-type: CONCEPT
[id="hibernating-cluster-about_{context}"]
= About cluster hibernation

You can hibernate {product-title} clusters to save money on cloud hosting costs. A cluster can remain hibernated for up to 90 days and still be expected to resume successfully.

You must wait at least 24 hours after cluster installation before hibernating your cluster to allow for the first certificate rotation.

[IMPORTANT]
====
If you must hibernate your cluster before the 24-hour certificate rotation, use the following procedure instead: link:https://www.redhat.com/en/blog/enabling-openshift-4-clusters-to-stop-and-resume-cluster-vms[Enabling OpenShift 4 Clusters to Stop and Resume Cluster VMs].
====

When hibernating a cluster, you must hibernate all cluster nodes. Suspending only certain nodes is not supported.

After resuming, it can take up to 45 minutes for the cluster to become ready.
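
The two time limits described here, 24 hours after installation and 90 days of hibernation, come down to simple date arithmetic. The following is a minimal sketch and not part of the documented procedure: the function names `hibernation_allowed` and `resume_deadline` are hypothetical, and the sketch assumes that you supply timestamps as seconds since the Unix epoch, for example from `date +%s`.

```shell
#!/bin/sh
# Hypothetical helpers; all timestamps are seconds since the Unix epoch.

MIN_AGE_SECONDS=$((24 * 60 * 60))               # 24 hours after installation
MAX_HIBERNATION_SECONDS=$((90 * 24 * 60 * 60))  # 90 days of hibernation

# hibernation_allowed <install_time> <now>
# Succeeds (exit status 0) if at least 24 hours separate the two timestamps.
hibernation_allowed() {
    [ $(( $2 - $1 )) -ge "$MIN_AGE_SECONDS" ]
}

# resume_deadline <hibernate_time>
# Prints the timestamp that falls 90 days after hibernation started.
resume_deadline() {
    echo $(( $1 + MAX_HIBERNATION_SECONDS ))
}
```

For example, `hibernation_allowed "$install_time" "$(date +%s)"` succeeds only when the installation time that you recorded in `install_time` is at least 24 hours in the past. How you obtain the installation time is left to you.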
97
modules/hibernating-cluster-hibernate.adoc
Normal file
@@ -0,0 +1,97 @@
// Module included in the following assemblies:
//
// * backup_and_restore/hibernating-cluster.adoc

:_mod-docs-content-type: PROCEDURE
[id="hibernating-cluster-hibernate_{context}"]
= Hibernating a cluster

You can hibernate a cluster for up to 90 days. The cluster can recover if certificates expire while the cluster is in hibernation.

.Prerequisites

* The cluster has been running for at least 24 hours to allow the first certificate rotation to complete.
+
[IMPORTANT]
====
If you must hibernate your cluster before the 24-hour certificate rotation, use the following procedure instead: link:https://www.redhat.com/en/blog/enabling-openshift-4-clusters-to-stop-and-resume-cluster-vms[Enabling OpenShift 4 Clusters to Stop and Resume Cluster VMs].
====

* You have taken an etcd backup.

* You have access to the cluster as a user with the `cluster-admin` role.

.Procedure

. Confirm that your cluster has been installed for at least 24 hours.

. Ensure that all nodes are in a good state by running the following command:
+
[source,terminal]
----
$ oc get nodes
----
+
.Example output
[source,terminal]
----
NAME                                       STATUS   ROLES                  AGE   VERSION
ci-ln-812tb4k-72292-8bcj7-master-0         Ready    control-plane,master   32m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-master-1         Ready    control-plane,master   32m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-master-2         Ready    control-plane,master   32m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-worker-a-zhdvk   Ready    worker                 19m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-worker-b-9hrmv   Ready    worker                 19m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-worker-c-q8mw2   Ready    worker                 19m   v1.31.3
----
+
All nodes should show `Ready` in the `STATUS` column.

. Ensure that all cluster Operators are in a good state by running the following command:
+
[source,terminal]
----
$ oc get clusteroperators
----
+
.Example output
[source,terminal]
----
NAME                       VERSION    AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication             4.18.0-0   True        False         False      51m
baremetal                  4.18.0-0   True        False         False      72m
cloud-controller-manager   4.18.0-0   True        False         False      75m
cloud-credential           4.18.0-0   True        False         False      77m
cluster-api                4.18.0-0   True        False         False      42m
cluster-autoscaler         4.18.0-0   True        False         False      72m
config-operator            4.18.0-0   True        False         False      72m
console                    4.18.0-0   True        False         False      55m
...
----
+
All cluster Operators should show `AVAILABLE`=`True`, `PROGRESSING`=`False`, and `DEGRADED`=`False`.

. Ensure that all machine config pools are in a good state by running the following command:
+
[source,terminal]
----
$ oc get mcp
----
+
.Example output
[source,terminal]
----
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-87871f187930e67233c837e1d07f49c7   True      False      False      3              3                   3                     0                      96m
worker   rendered-worker-3c4c459dc5d90017983d7e72928b8aed   True      False      False      3              3                   3                     0                      96m
----
+
All machine config pools should show `UPDATING`=`False` and `DEGRADED`=`False`.

. Stop the cluster virtual machines:
+
Use the tools native to your cluster's cloud environment to shut down the cluster's virtual machines.
+
[IMPORTANT]
====
If you use a bastion virtual machine, do not shut down this virtual machine.
====
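
The three health checks in this procedure can also be scripted. The following is a minimal sketch under stated assumptions, not a supported tool: the function names `unhealthy_nodes`, `unhealthy_operators`, and `unhealthy_pools` are hypothetical, and the column positions are taken from the example outputs shown in the procedure. Each function reads `oc get <resource> --no-headers` output on standard input and prints the name of every resource that does not match the expected state.

```shell
#!/bin/sh
# Hypothetical helpers. Each one reads `oc get <resource> --no-headers`
# output on standard input. Column positions are assumed from the example
# outputs; a row with an empty column would shift the fields.

# Nodes: column 2 (STATUS) must be exactly "Ready".
unhealthy_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# Cluster Operators: AVAILABLE=True, PROGRESSING=False, DEGRADED=False
# (columns 3, 4, and 5).
unhealthy_operators() {
    awk '$3 != "True" || $4 != "False" || $5 != "False" { print $1 }'
}

# Machine config pools: UPDATING=False and DEGRADED=False (columns 4 and 5).
unhealthy_pools() {
    awk '$4 != "False" || $5 != "False" { print $1 }'
}

# Example usage against a running cluster:
#   oc get nodes --no-headers | unhealthy_nodes
#   oc get clusteroperators --no-headers | unhealthy_operators
#   oc get mcp --no-headers | unhealthy_pools
```

If all three pipelines print nothing, the cluster matches the expected state described in the steps. Because the check on nodes is an exact match, a status such as `Ready,SchedulingDisabled` is also reported.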
118
modules/hibernating-cluster-resume.adoc
Normal file
@@ -0,0 +1,118 @@
// Module included in the following assemblies:
//
// * backup_and_restore/hibernating-cluster.adoc

:_mod-docs-content-type: PROCEDURE
[id="hibernating-cluster-resume_{context}"]
= Resuming a hibernated cluster

When you resume a hibernated cluster within 90 days, you might have to approve certificate signing requests (CSRs) for the nodes to become ready.

It can take around 45 minutes for the cluster to resume, depending on the size of your cluster.

.Prerequisites

* You hibernated your cluster less than 90 days ago.
* You have access to the cluster as a user with the `cluster-admin` role.

.Procedure

. Within 90 days of cluster hibernation, resume the cluster virtual machines:
+
Use the tools native to your cluster's cloud environment to resume the cluster's virtual machines.

. Wait about 5 minutes, depending on the number of nodes in your cluster.

. Approve CSRs for the nodes:

.. Check that there is a CSR for each node in the `NotReady` state by running the following command:
+
[source,terminal]
----
$ oc get csr
----
+
.Example output
[source,terminal]
----
NAME        AGE   SIGNERNAME                                    REQUESTOR                                                                   REQUESTEDDURATION   CONDITION
csr-4dwsd   37m   kubernetes.io/kube-apiserver-client           system:node:ci-ln-812tb4k-72292-8bcj7-worker-c-q8mw2                        24h                 Pending
csr-4vrbr   49m   kubernetes.io/kube-apiserver-client           system:node:ci-ln-812tb4k-72292-8bcj7-master-1                              24h                 Pending
csr-4wk5x   51m   kubernetes.io/kubelet-serving                 system:node:ci-ln-812tb4k-72292-8bcj7-master-1                              <none>              Pending
csr-84vb6   51m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending
----

.. Approve each valid CSR by running the following command:
+
[source,terminal]
----
$ oc adm certificate approve <csr_name>
----

.. Verify that all necessary CSRs were approved by running the following command:
+
[source,terminal]
----
$ oc get csr
----
+
.Example output
[source,terminal]
----
NAME        AGE   SIGNERNAME                                    REQUESTOR                                                                   REQUESTEDDURATION   CONDITION
csr-4dwsd   37m   kubernetes.io/kube-apiserver-client           system:node:ci-ln-812tb4k-72292-8bcj7-worker-c-q8mw2                        24h                 Approved,Issued
csr-4vrbr   49m   kubernetes.io/kube-apiserver-client           system:node:ci-ln-812tb4k-72292-8bcj7-master-1                              24h                 Approved,Issued
csr-4wk5x   51m   kubernetes.io/kubelet-serving                 system:node:ci-ln-812tb4k-72292-8bcj7-master-1                              <none>              Approved,Issued
csr-84vb6   51m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Approved,Issued
----
+
CSRs should show `Approved,Issued` in the `CONDITION` column.

. Verify that all nodes now show as ready by running the following command:
+
[source,terminal]
----
$ oc get nodes
----
+
.Example output
[source,terminal]
----
NAME                                       STATUS   ROLES                  AGE   VERSION
ci-ln-812tb4k-72292-8bcj7-master-0         Ready    control-plane,master   32m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-master-1         Ready    control-plane,master   32m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-master-2         Ready    control-plane,master   32m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-worker-a-zhdvk   Ready    worker                 19m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-worker-b-9hrmv   Ready    worker                 19m   v1.31.3
ci-ln-812tb4k-72292-8bcj7-worker-c-q8mw2   Ready    worker                 19m   v1.31.3
----
+
All nodes should show `Ready` in the `STATUS` column. It might take a few minutes for all nodes to become ready after approving the CSRs.

. Wait for the cluster Operators to restart and load the new certificates.
+
This might take 5 to 10 minutes.

. Verify that all cluster Operators are in a good state by running the following command:
+
[source,terminal]
----
$ oc get clusteroperators
----
+
.Example output
[source,terminal]
----
NAME                       VERSION    AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication             4.18.0-0   True        False         False      51m
baremetal                  4.18.0-0   True        False         False      72m
cloud-controller-manager   4.18.0-0   True        False         False      75m
cloud-credential           4.18.0-0   True        False         False      77m
cluster-api                4.18.0-0   True        False         False      42m
cluster-autoscaler         4.18.0-0   True        False         False      72m
config-operator            4.18.0-0   True        False         False      72m
console                    4.18.0-0   True        False         False      55m
...
----
+
All cluster Operators should show `AVAILABLE`=`True`, `PROGRESSING`=`False`, and `DEGRADED`=`False`.
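
When many nodes resume at once, picking the pending CSRs out of the `oc get csr` listing by eye can be tedious. The following is a minimal sketch, assuming the column layout shown in the example output; the function name `pending_csrs` is hypothetical. It reads `oc get csr --no-headers` output on standard input and prints the name of each CSR whose `CONDITION` column is `Pending`. It does not decide whether a CSR is valid, so review the list before approving anything.

```shell
#!/bin/sh
# pending_csrs: hypothetical helper that reads `oc get csr --no-headers`
# output on standard input and prints the name (column 1) of every CSR
# whose last column (CONDITION) is exactly "Pending".
pending_csrs() {
    awk '$NF == "Pending" { print $1 }'
}

# Example usage against a running cluster:
#   oc get csr --no-headers | pending_csrs
# Then approve each CSR that you have confirmed is valid:
#   oc adm certificate approve <csr_name>
```

Matching on the last column keeps the filter independent of the `REQUESTEDDURATION` value, which can be either a duration such as `24h` or `<none>`.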