
OCPBUGS-59852: Updated how to add control planes to BM Postinstall docs

dfitzmau
2025-09-02 12:59:32 +01:00
parent 5935ff1591
commit 4700faccda
4 changed files with 404 additions and 5 deletions


@@ -2441,6 +2441,8 @@ Topics:
      File: cpmso-troubleshooting
    - Name: Disabling the control plane machine set
      File: cpmso-disabling
    - Name: Manually scaling control plane machines
      File: cpmso-manually-scaling-control-planes
  - Name: Managing machines with the Cluster API
    Dir: cluster_api_machine_management
    Topics:


@@ -0,0 +1,17 @@
:_mod-docs-content-type: ASSEMBLY
[id="cpmso-manually-scaling-control-planes"]
= Manually scaling control plane machines
include::_attributes/common-attributes.adoc[]
:context: cpmso-manually-scaling-control-planes
toc::[]
After you install a cluster on bare-metal infrastructure, you can manually scale the cluster up to 4 or 5 control plane nodes. Consider this approach when you need to recover the cluster from a degraded state, perform deep-level debugging, or ensure the stability and security of the control plane in complex scenarios.
[IMPORTANT]
====
Red{nbsp}Hat supports a cluster that has 4 or 5 control plane nodes only on bare-metal infrastructure.
====
// Adding a control plane node to your cluster
include::modules/creating-control-plane-node.adoc[leveloffset=+1]


@@ -0,0 +1,380 @@
// Module included in the following assemblies:
//
// * machine_management/control_plane_machines_management/cpmso-manually-scaling-control-planes.adoc
:_mod-docs-content-type: PROCEDURE
[id="creating-control-plane-node_{context}"]
= Adding a control plane node to your cluster
After you install a cluster on bare-metal infrastructure, you can manually scale the cluster up to 4 or 5 control plane nodes. The examples in this procedure use `node-5` as the new control plane node.
.Prerequisites
* You have installed a healthy cluster with at least three control plane nodes.
* You have created a single control plane node that you intend to add to your cluster as a postinstallation task.
.Procedure
. Retrieve pending Certificate Signing Requests (CSRs) for the new control plane node by entering the following command:
+
[source,terminal]
----
$ oc get csr | grep Pending
----
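+
The following output is illustrative only; the CSR names, ages, and requestors vary by cluster:
+
.Example output
[source,terminal]
----
csr-brc2n   4m13s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>   Pending
csr-xj9kp   2m34s   kubernetes.io/kubelet-serving                 system:node:node-5                                                           <none>   Pending
----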
. Approve all pending CSRs for the control plane node by entering the following command:
+
[source,terminal]
----
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
----
+
[IMPORTANT]
====
You must approve the CSRs to complete the installation.
====
. Confirm that the control plane node is in the `Ready` status by entering the following command:
+
[source,terminal]
----
$ oc get nodes
----
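+
The following output is illustrative; the new `node-5` node reports the `Ready` status after you approve its CSRs, and the ages and versions vary by cluster:
+
.Example output
[source,terminal]
----
NAME     STATUS   ROLES                  AGE   VERSION
node-0   Ready    control-plane,master   6h    v1.32.3
node-1   Ready    control-plane,master   6h    v1.32.3
node-2   Ready    control-plane,master   6h    v1.32.3
node-5   Ready    control-plane,master   8m    v1.32.3
----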
+
[NOTE]
====
On installer-provisioned infrastructure, the etcd Operator relies on the Machine API to manage the control plane and ensure etcd quorum. The Machine API then uses `Machine` CRs to represent and manage the underlying control plane nodes.
====
. Create the `BareMetalHost` and `Machine` CRs and link them to the `Node` CR of the control plane node.
+
.. Create the `BareMetalHost` CR with a unique `.metadata.name` value as demonstrated in the following example:
+
[source,yaml]
----
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-5
  namespace: openshift-machine-api
spec:
  automatedCleaningMode: metadata
  bootMACAddress: 00:00:00:00:00:02
  bootMode: UEFI
  customDeploy:
    method: install_coreos
  externallyProvisioned: true
  online: true
  userData:
    name: master-user-data-managed
    namespace: openshift-machine-api
# ...
----
+
.. Apply the `BareMetalHost` CR by entering the following command:
+
[source,terminal]
----
$ oc apply -f <filename> <1>
----
<1> Replace `<filename>` with the name of the file that contains the `BareMetalHost` CR.
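+
.. Optional: Confirm that the cluster created the `BareMetalHost` resource by entering the following command:
+
[source,terminal]
----
$ oc get baremetalhost -n openshift-machine-api node-5
----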
+
.. Create the `Machine` CR by using the unique `.metadata.name` value as demonstrated in the following example:
+
[source,yaml]
----
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    machine.openshift.io/instance-state: externally provisioned
    metal3.io/BareMetalHost: openshift-machine-api/node-5
  finalizers:
  - machine.machine.openshift.io
  labels:
    machine.openshift.io/cluster-api-cluster: <cluster_name> <1>
    machine.openshift.io/cluster-api-machine-role: master
    machine.openshift.io/cluster-api-machine-type: master
  name: node-5
  namespace: openshift-machine-api
spec:
  metadata: {}
  providerSpec:
    value:
      apiVersion: baremetal.cluster.k8s.io/v1alpha1
      customDeploy:
        method: install_coreos
      hostSelector: {}
      image:
        checksum: ""
        url: ""
      kind: BareMetalMachineProviderSpec
      metadata:
        creationTimestamp: null
      userData:
        name: master-user-data-managed
# ...
----
<1> Replace `<cluster_name>` with the name of the specific cluster, for example, `test-day2-1-6qv96`.
+
.. Get the cluster name to use for the `<cluster_name>` value by running the following command:
+
[source,terminal]
----
$ oc get infrastructure cluster -o=jsonpath='{.status.infrastructureName}{"\n"}'
----
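+
The command returns the infrastructure name, for example:
+
.Example output
[source,terminal]
----
test-day2-1-6qv96
----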
+
.. Apply the `Machine` CR by entering the following command:
+
[source,terminal]
----
$ oc apply -f <filename> <1>
----
<1> Replace `<filename>` with the name of the file that contains the `Machine` CR.
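+
.. Optional: Confirm that the cluster created the `Machine` resource by entering the following command:
+
[source,terminal]
----
$ oc get machines.machine.openshift.io -n openshift-machine-api node-5
----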
+
.. Link `BareMetalHost`, `Machine`, and `Node` objects by running the `link-machine-and-node.sh` script:
+
... Copy the following `link-machine-and-node.sh` script to a local machine:
+
[source,text]
----
#!/bin/bash
# Credit goes to
# https://bugzilla.redhat.com/show_bug.cgi?id=1801238.
# This script will link Machine object
# and Node object. This is needed
# in order to have IP address of
# the Node present in the status of the Machine.
set -e

machine="$1"
node="$2"

if [ -z "$machine" ] || [ -z "$node" ]; then
    echo "Usage: $0 MACHINE NODE"
    exit 1
fi

node_name=$(echo "${node}" | cut -f2 -d':')

oc proxy &
proxy_pid=$!
function kill_proxy {
    kill $proxy_pid
}
trap kill_proxy EXIT SIGINT

HOST_PROXY_API_PATH="http://localhost:8001/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts"

function print_nics() {
    local ips
    local eob
    declare -a ips
    readarray -t ips < <(echo "${1}" \
        | jq '.[] | select(. | .type == "InternalIP") | .address' \
        | sed 's/"//g')
    eob=','
    for (( i=0; i<${#ips[@]}; i++ )); do
        if [ $((i+1)) -eq ${#ips[@]} ]; then
            eob=""
        fi
        cat <<- EOF
        {
          "ip": "${ips[$i]}",
          "mac": "00:00:00:00:00:00",
          "model": "unknown",
          "speedGbps": 10,
          "vlanId": 0,
          "pxe": true,
          "name": "eth1"
        }${eob}
EOF
    done
}

function wait_for_json() {
    local name
    local url
    local curl_opts
    local timeout
    local start_time
    local curr_time
    local time_diff
    name="$1"
    url="$2"
    timeout="$3"
    shift 3
    curl_opts="$@"
    echo -n "Waiting for $name to respond"
    start_time=$(date +%s)
    until curl -g -X GET "$url" "${curl_opts[@]}" 2> /dev/null | jq '.' 2> /dev/null > /dev/null; do
        echo -n "."
        curr_time=$(date +%s)
        time_diff=$((curr_time - start_time))
        if [[ $time_diff -gt $timeout ]]; then
            printf '\nTimed out waiting for %s' "${name}"
            return 1
        fi
        sleep 5
    done
    echo " Success!"
    return 0
}

wait_for_json oc_proxy "${HOST_PROXY_API_PATH}" 10 -H "Accept: application/json" -H "Content-Type: application/json"

addresses=$(oc get node -n openshift-machine-api "${node_name}" -o json | jq -c '.status.addresses')

machine_data=$(oc get machines.machine.openshift.io -n openshift-machine-api -o json "${machine}")
host=$(echo "$machine_data" | jq '.metadata.annotations["metal3.io/BareMetalHost"]' | cut -f2 -d/ | sed 's/"//g')

if [ -z "$host" ]; then
    echo "Machine $machine is not linked to a host yet." 1>&2
    exit 1
fi

# The address structure on the host doesn't match the node, so extract
# the values we want into separate variables so we can build the patch
# we need.
hostname=$(echo "${addresses}" | jq '.[] | select(. | .type == "Hostname") | .address' | sed 's/"//g')

set +e
read -r -d '' host_patch << EOF
{
  "status": {
    "hardware": {
      "hostname": "${hostname}",
      "nics": [
$(print_nics "${addresses}")
      ],
      "systemVendor": {
        "manufacturer": "Red Hat",
        "productName": "product name",
        "serialNumber": ""
      },
      "firmware": {
        "bios": {
          "date": "04/01/2014",
          "vendor": "SeaBIOS",
          "version": "1.11.0-2.el7"
        }
      },
      "ramMebibytes": 0,
      "storage": [],
      "cpu": {
        "arch": "x86_64",
        "model": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz",
        "clockMegahertz": 2199.998,
        "count": 4,
        "flags": []
      }
    }
  }
}
EOF
set -e

echo "PATCHING HOST"
echo "${host_patch}" | jq .

curl -s \
    -X PATCH \
    "${HOST_PROXY_API_PATH}/${host}/status" \
    -H "Content-type: application/merge-patch+json" \
    -d "${host_patch}"

oc get baremetalhost -n openshift-machine-api -o yaml "${host}"
----
+
... Make the script executable by entering the following command:
+
[source,terminal]
----
$ chmod +x link-machine-and-node.sh
----
+
... Run the script by entering the following command:
+
[source,terminal]
----
$ bash link-machine-and-node.sh node-5 node-5
----
+
[NOTE]
====
The first `node-5` instance represents the machine, and the second instance represents the node.
====
.Verification
. Confirm the members of etcd from one of the pre-existing control plane nodes:
+
.. Open a remote shell session to the control plane node by entering the following command:
+
[source,terminal]
----
$ oc rsh -n openshift-etcd etcd-node-0
----
+
.. List the etcd members by entering the following command:
+
[source,terminal]
----
# etcdctl member list -w table
----
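+
The following output is illustrative; the member IDs and addresses vary by cluster, and the new `node-5` member appears in the list:
+
.Example output
[source,terminal]
----
+------------------+---------+--------+-----------------------------+-----------------------------+---------+
|        ID        | STATUS  |  NAME  |         PEER ADDRS          |        CLIENT ADDRS         | LEARNER |
+------------------+---------+--------+-----------------------------+-----------------------------+---------+
| 61e2a86084aafa62 | started | node-0 | https://192.168.111.20:2380 | https://192.168.111.20:2379 |  false  |
| 2cbe19c7876f11d5 | started | node-1 | https://192.168.111.21:2380 | https://192.168.111.21:2379 |  false  |
| eff4bd8a1259e72d | started | node-2 | https://192.168.111.22:2380 | https://192.168.111.22:2379 |  false  |
| 8bb1d2a51dbe3e34 | started | node-5 | https://192.168.111.25:2380 | https://192.168.111.25:2379 |  false  |
+------------------+---------+--------+-----------------------------+-----------------------------+---------+
----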
. Monitor the etcd Operator configuration process until it completes by entering the following command. The expected output shows `False` in the `PROGRESSING` column.
+
[source,terminal]
----
$ oc get clusteroperator etcd
----
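+
The following output is illustrative; the key indicator is `False` in the `PROGRESSING` column:
+
.Example output
[source,terminal,subs="attributes+"]
----
NAME   VERSION               AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
etcd   {product-version}.5   True        False         False      5h55m
----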
. Confirm etcd health by running the following commands:
+
.. Open a remote shell session to the control plane node:
+
[source,terminal]
----
$ oc rsh -n openshift-etcd etcd-node-0
----
+
.. Check the endpoint health by entering the following command. The expected output shows `is healthy` for each endpoint.
+
[source,terminal]
----
# etcdctl endpoint health
----
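+
The following output is illustrative; the endpoint address and timing vary by cluster:
+
.Example output
[source,terminal]
----
https://192.168.111.25:2379 is healthy: successfully committed proposal: took = 10.383651ms
----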
. Verify that all nodes are ready by entering the following command. The expected output shows the `Ready` status beside each node entry.
+
[source,terminal]
----
$ oc get nodes
----
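+
The following output is illustrative; node names, ages, and versions vary by cluster:
+
.Example output
[source,terminal]
----
NAME       STATUS   ROLES                  AGE     VERSION
node-0     Ready    control-plane,master   6h20m   v1.32.3
node-1     Ready    control-plane,master   6h20m   v1.32.3
node-2     Ready    control-plane,master   6h20m   v1.32.3
node-5     Ready    control-plane,master   34m     v1.32.3
worker-0   Ready    worker                 5h51m   v1.32.3
worker-1   Ready    worker                 5h51m   v1.32.3
----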
. Verify that all cluster Operators are available by entering the following command. The expected output lists each Operator and shows `True` in the `AVAILABLE` column.
+
[source,terminal]
----
$ oc get ClusterOperators
----
. Verify that the cluster version is correct by entering the following command:
+
[source,terminal]
----
$ oc get ClusterVersion
----
+
.Example output
[source,terminal,subs="attributes+"]
----
NAME      VERSION               AVAILABLE   PROGRESSING   SINCE   STATUS
version   {product-version}.5   True        False         5h57m   Cluster version is {product-version}.5
----


@@ -18,12 +18,9 @@ You complete most of the cluster configuration and customization after you deplo
If you install your cluster on {ibm-z-name}, not all features and functions are available.
====
You modify the configuration resources to configure the major features of the cluster, such as the image registry, networking configuration, image build behavior, and the identity provider.
For current documentation of the settings that you control by using these resources, use the `oc explain` command, for example `oc explain builds --api-version=config.openshift.io/v1`.
[id="configuration-resources_{context}"]
=== Cluster configuration resources
@@ -236,6 +233,9 @@ include::modules/nodes-cluster-worker-latency-profiles-using.adoc[leveloffset=+2
xref:../machine_management/control_plane_machine_management/cpmso-about.adoc#cpmso-about[Control plane machine sets] provide management capabilities for control plane machines that are similar to what compute machine sets provide for compute machines. The availability and initial status of control plane machine sets on your cluster depend on your cloud provider and the version of {product-title} that you installed. For more information, see xref:../machine_management/control_plane_machine_management/cpmso-getting-started.adoc#cpmso-getting-started[Getting started with control plane machine sets].
// Adding a control plane node to your cluster
include::modules/creating-control-plane-node.adoc[leveloffset=+2]
[id="post-install-creating-infrastructure-machinesets-production"]
== Creating infrastructure machine sets for production environments