From 076c576a9e54d5644e86ad4c2c9228a54097da0e Mon Sep 17 00:00:00 2001
From: Bob Furu
Date: Mon, 26 Oct 2020 18:28:15 -0400
Subject: [PATCH] BZ1886712 - Document disabling autoreboot after MCO update

---
 ...ubleshooting-disabling-autoreboot-mco.adoc | 112 ++++++++++++++++++
 modules/understanding-control-plane.adoc      |   2 +-
 ...understanding-machine-config-operator.adoc |  15 ++-
 .../troubleshooting-operator-issues.adoc      |   5 +-
 4 files changed, 127 insertions(+), 7 deletions(-)
 create mode 100644 modules/troubleshooting-disabling-autoreboot-mco.adoc

diff --git a/modules/troubleshooting-disabling-autoreboot-mco.adoc b/modules/troubleshooting-disabling-autoreboot-mco.adoc
new file mode 100644
index 0000000000..8968188782
--- /dev/null
+++ b/modules/troubleshooting-disabling-autoreboot-mco.adoc
@@ -0,0 +1,112 @@
+// Module included in the following assemblies:
+//
+// * support/troubleshooting/troubleshooting-operator-issues.adoc
+
+[id="troubleshooting-disabling-autoreboot-mco_{context}"]
+= Disabling the Machine Config Operator from automatically rebooting
+
+When configuration changes are made by the Machine Config Operator, {op-system-first} must reboot for the changes to take effect. Whether the configuration change is automatic, such as when a `kube-apiserver-to-kubelet-signer` CA is rotated, or manual, such as when a registry or SSH key is updated, an {op-system} node reboots automatically unless it is paused.
+
+To avoid unwanted disruptions, you can modify the machine config pool to prevent automatic rebooting after the Operator makes changes to the machine config.
+
+[NOTE]
+====
+Pausing a machine config pool pauses all system reboot processes and prevents all configuration changes from being applied.
+====
+
+.Prerequisites
+
+* You have access to the cluster as a user with the `cluster-admin` role.
+* You have installed the OpenShift CLI (`oc`).
+* You have root access in {product-title}.
+
+.Procedure
+. To pause the autoreboot process after machine config changes are applied:
+
+* As root, update the `spec.paused` field to `true` in the MachineConfigPool custom resource (CR).
++
+.Control plane (master) nodes
+[source,terminal]
+----
+# oc patch --type=merge --patch='{"spec":{"paused":true}}' machineconfigpool/master
+----
++
+.Worker nodes
+[source,terminal]
+----
+# oc patch --type=merge --patch='{"spec":{"paused":true}}' machineconfigpool/worker
+----
+
+. To verify that the machine config pool is paused:
++
+.Control plane (master) nodes
+[source,terminal]
+----
+# oc get machineconfigpool/master --template='{{.spec.paused}}'
+----
++
+.Worker nodes
+[source,terminal]
+----
+# oc get machineconfigpool/worker --template='{{.spec.paused}}'
+----
++
+The `spec.paused` field is `true` and the machine config pool is paused.
+
+. Alternatively, to unpause the autoreboot process:
+
+* As root, update the `spec.paused` field to `false` in the MachineConfigPool custom resource (CR).
++
+.Control plane (master) nodes
+[source,terminal]
+----
+# oc patch --type=merge --patch='{"spec":{"paused":false}}' machineconfigpool/master
+----
++
+.Worker nodes
+[source,terminal]
+----
+# oc patch --type=merge --patch='{"spec":{"paused":false}}' machineconfigpool/worker
+----
++
+[NOTE]
+====
+When you unpause a machine config pool, all paused changes are applied at reboot.
+====
++
+. To verify that the machine config pool is unpaused:
++
+.Control plane (master) nodes
+[source,terminal]
+----
+# oc get machineconfigpool/master --template='{{.spec.paused}}'
+----
++
+.Worker nodes
+[source,terminal]
+----
+# oc get machineconfigpool/worker --template='{{.spec.paused}}'
+----
++
+The `spec.paused` field is `false` and the machine config pool is unpaused.
+
+. To see if the machine config pool has pending changes:
++
+[source,terminal]
+----
+# oc get machineconfigpool
+----
++
+.Example output
+----
+NAME     CONFIG                                   UPDATED   UPDATING
+master   rendered-master-546383f80705bd5aeaba93   True      False
+worker   rendered-worker-b4c51bb33ccaae6fc4a6a5   True      False
+----
++
+When `UPDATED` is `True` and `UPDATING` is `False`, there are no pending changes, and vice versa.
+
+[IMPORTANT]
+====
+It is recommended to schedule a maintenance window for a reboot as early as possible by setting `spec.paused` to `false` so that the changes queued since the last reboot take effect.
+====
diff --git a/modules/understanding-control-plane.adoc b/modules/understanding-control-plane.adoc
index 3cf85f8cea..ae9e3be3a2 100644
--- a/modules/understanding-control-plane.adoc
+++ b/modules/understanding-control-plane.adoc
@@ -9,4 +9,4 @@ The control plane, which is composed of master machines, manages the
 {product-title} cluster. The control plane machines manage workloads on the
 compute machines, which are also known as worker machines. The cluster itself
 manages all upgrades to the machines by the actions of the Cluster Version Operator, the
-Machine Config Operator, and set of individual Operators.
+Machine Config Operator, and a set of individual Operators.
diff --git a/modules/understanding-machine-config-operator.adoc b/modules/understanding-machine-config-operator.adoc
index 9d41545535..83ae365ed1 100644
--- a/modules/understanding-machine-config-operator.adoc
+++ b/modules/understanding-machine-config-operator.adoc
@@ -19,15 +19,15 @@ constructs. They include:
 plane. It monitors all of the cluster nodes and orchestrates their configuration
 updates.
 * The `machine-config-daemon` DaemonSet, which runs on
-each node in the cluster and updates a machine to configuration defined by
-MachineConfig as instructed by the MachineConfigController. When the node sees
+each node in the cluster and updates a machine to configuration as defined by
+MachineConfig and as instructed by the MachineConfigController. When the node detects
 a change, it drains off its pods, applies the update, and reboots. These changes
 come in the form of Ignition configuration files that apply the specified
 machine configuration and control kubelet configuration. The update itself is
 delivered in a container. This process is key to the success of managing
 {product-title} and {op-system} updates together.
 * The `machine-config-server` DaemonSet, which provides the Ignition config files
-to master nodes as they join the cluster.
+to control plane nodes as they join the cluster.
 
 The machine configuration is a subset of the Ignition configuration. The
 `machine-config-daemon` reads the machine configuration to see if it needs to do
@@ -36,5 +36,10 @@ configuration changes, or other changes to the operating system or {product-titl
 configuration.
 
 When you perform node management operations, you create or modify a
-KubeletConfig Custom Resource (CR).
-//See https://github.com/openshift/machine-config-operator/blob/master/docs/KubeletConfigDesign.md[KubeletConfigDesign] for details.
\ No newline at end of file
+KubeletConfig custom resource (CR).
+//See https://github.com/openshift/machine-config-operator/blob/master/docs/KubeletConfigDesign.md[KubeletConfigDesign] for details.
+
+[IMPORTANT]
+====
+To prevent control plane nodes from autorebooting after machine config changes are applied, you must pause the autoreboot process by setting the `spec.paused` field to `true` in the machine config pool.
+====
diff --git a/support/troubleshooting/troubleshooting-operator-issues.adoc b/support/troubleshooting/troubleshooting-operator-issues.adoc
index 04510ddf67..335f10d4c6 100644
--- a/support/troubleshooting/troubleshooting-operator-issues.adoc
+++ b/support/troubleshooting/troubleshooting-operator-issues.adoc
@@ -5,7 +5,7 @@ include::modules/common-attributes.adoc[]
 
 toc::[]
 
-Operators are a method of packaging, deploying, and managing an {product-title} application. They act like an extension of the software vendor’s engineering team, watching over an {product-title} environment and using its current state to make decisions in real time. Operators are designed to handle upgrades seamlessly, react to failures automatically, and not take shortcuts, like skipping a software backup process to save time.
+Operators are a method of packaging, deploying, and managing an {product-title} application. They act like an extension of the software vendor’s engineering team, watching over an {product-title} environment and using its current state to make decisions in real time. Operators are designed to handle upgrades seamlessly, react to failures automatically, and not take shortcuts, such as skipping a software backup process to save time.
 
 {product-title} {product-version} includes a default set of Operators that are required for proper functioning of the cluster. These default Operators are managed by the Cluster Version Operator (CVO).
 
@@ -24,3 +24,6 @@ include::modules/querying-operator-pod-status.adoc[leveloffset=+1]
 
 // Gathering Operator logs
 include::modules/gathering-operator-logs.adoc[leveloffset=+1]
+
+// Disabling Machine Config Operator from autorebooting
+include::modules/troubleshooting-disabling-autoreboot-mco.adoc[leveloffset=+1]
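
The pause, verify, and unpause commands that the new module documents could be wrapped in a small shell helper. The following is a minimal sketch and not part of this patch: the function names `paused_patch`, `set_paused`, and `get_paused` and the `OC` variable are hypothetical, and the sketch assumes that `oc` is installed and logged in as a user with the `cluster-admin` role. It reuses only the `oc patch` and `oc get` invocations shown in the module.

```shell
#!/bin/sh
# Sketch only: wraps the oc commands from the new module.
# paused_patch, set_paused, get_paused, and OC are hypothetical names.

# Command used to reach the cluster. Set OC="echo oc" to print the
# commands instead of running them.
OC="${OC:-oc}"

# Build the merge patch body for a given paused value (true or false).
paused_patch() {
  printf '{"spec":{"paused":%s}}' "$1"
}

# Set spec.paused on one machine config pool, for example: set_paused worker true
set_paused() {
  pool="$1"
  value="$2"
  # $OC is intentionally unquoted so that "echo oc" splits into two words.
  $OC patch --type=merge --patch="$(paused_patch "$value")" "machineconfigpool/$pool"
}

# Print the current spec.paused value of one machine config pool.
get_paused() {
  $OC get "machineconfigpool/$1" --template='{{.spec.paused}}'
}
```

For example, `set_paused worker true` followed by `get_paused worker` pauses the worker pool and then prints its `spec.paused` value. Setting `OC="echo oc"` first turns both calls into a dry run that only prints the commands.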