diff --git a/_attributes/common-attributes.adoc b/_attributes/common-attributes.adoc index 4012fbb050..31af016654 100644 --- a/_attributes/common-attributes.adoc +++ b/_attributes/common-attributes.adoc @@ -133,5 +133,10 @@ endif::[] :ibmzProductName: IBM Z // Red Hat Quay Container Security Operator :rhq-cso: Red Hat Quay Container Security Operator -:sno: single-node OpenShift -:sno-caps: Single-node OpenShift +:sno: single-node Openshift +:sno-caps: Single-node Openshift +//TALO and Redfish events Operators +:cgu-operator-first: Topology Aware Lifecycle Manager (TALM) +:cgu-operator-full: Topology Aware Lifecycle Manager +:cgu-operator: TALM +:redfish-operator: Bare Metal Event Relay diff --git a/_topic_maps/_topic_map.yml b/_topic_maps/_topic_map.yml index 074d9392e5..255a3f5e51 100644 --- a/_topic_maps/_topic_map.yml +++ b/_topic_maps/_topic_map.yml @@ -2228,6 +2228,8 @@ Topics: File: managing-alerts - Name: Reviewing monitoring dashboards File: reviewing-monitoring-dashboards +- Name: Monitoring bare-metal events + File: using-rfhe - Name: Accessing third-party monitoring APIs File: accessing-third-party-monitoring-apis - Name: Troubleshooting monitoring issues @@ -2283,6 +2285,8 @@ Topics: Distros: openshift-origin,openshift-enterprise - Name: Improving cluster stability in high latency environments using worker latency profiles File: scaling-worker-latency-profiles +- Name: Topology Aware Lifecycle Manager for cluster updates + File: cnf-talm-for-cluster-upgrades Distros: openshift-origin,openshift-enterprise - Name: Creating a performance profile File: cnf-create-performance-profiles diff --git a/images/211_OpenShift_Redfish_dataflow_0222.png b/images/211_OpenShift_Redfish_dataflow_0222.png new file mode 100644 index 0000000000..6aa8ce8cbe Binary files /dev/null and b/images/211_OpenShift_Redfish_dataflow_0222.png differ diff --git a/images/217_OpenShift_Zero_Touch_Provisioning_updates_0222_1.png b/images/217_OpenShift_Zero_Touch_Provisioning_updates_0222_1.png new file mode 100644 index 0000000000..112ac405fb Binary files /dev/null and b/images/217_OpenShift_Zero_Touch_Provisioning_updates_0222_1.png differ diff --git a/images/217_OpenShift_Zero_Touch_Provisioning_updates_0222_2.png b/images/217_OpenShift_Zero_Touch_Provisioning_updates_0222_2.png new file mode 100644 index 0000000000..d2f623fb0f Binary files /dev/null and b/images/217_OpenShift_Zero_Touch_Provisioning_updates_0222_2.png differ diff --git a/images/217_OpenShift_Zero_Touch_Provisioning_updates_0222_3.png b/images/217_OpenShift_Zero_Touch_Provisioning_updates_0222_3.png new file mode 100644 index 0000000000..13b98bb110 Binary files /dev/null and b/images/217_OpenShift_Zero_Touch_Provisioning_updates_0222_3.png differ diff --git a/modules/about-ztp-and-distributed-units-on-openshift-clusters.adoc b/modules/about-ztp-and-distributed-units-on-openshift-clusters.adoc new file mode 100644 index 0000000000..250a28977a --- /dev/null +++ b/modules/about-ztp-and-distributed-units-on-openshift-clusters.adoc @@ -0,0 +1,25 @@ +// Module included in the following assemblies: +// +// *scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: CONCEPT +[id="about-ztp-and-distributed-units-on-openshift-clusters_{context}"] += About ZTP and distributed units on OpenShift clusters + +You can install a distributed unit (DU) on {product-title} clusters at scale with {rh-rhacm-first} using the assisted installer (AI) and the policy generator with core-reduction technology enabled. 
The DU installation is done using zero touch provisioning (ZTP) in a disconnected environment.
+
+{rh-rhacm} manages clusters in a hub-and-spoke architecture, where a single hub cluster manages many spoke clusters. {rh-rhacm} applies radio access network (RAN) policies from predefined custom resources (CRs). Hub clusters running ACM provision and deploy the spoke clusters using ZTP and AI. DU installation follows the AI installation of {product-title} on each cluster.
+
+The AI service handles provisioning of {product-title} on single-node clusters, three-node clusters, or standard clusters running on bare metal. ACM ships with and deploys the AI when the `MultiClusterHub` custom resource is installed.
+
+With ZTP and AI, you can provision {product-title} clusters to run your DUs at scale. A high-level overview of ZTP for distributed units in a disconnected environment is as follows:
+
+* A hub cluster running {rh-rhacm-first} manages a disconnected internal registry that mirrors the {product-title} release images. The internal registry is used to provision the spoke clusters.
+
+* You manage the bare metal host machines for your DUs in an inventory file that uses YAML for formatting. You store the inventory file in a Git repository.
+
+* You install the DU bare metal host machines on site, and make the hosts ready for provisioning. To be ready for provisioning, the following is required for each bare metal host:
+
+** Network connectivity - including DNS for your network. Hosts must be reachable from the hub and managed spoke clusters. Ensure there is layer 3 connectivity between the hub and the host where you want to install your spoke cluster.
+
+** Baseboard Management Controller (BMC) details for each host - ZTP uses the BMC URL and credentials to access the BMC of each host. ZTP manages the spoke cluster definition CRs, with the exception of the `BMCSecret` CR, which you create manually. These CRs define the relevant elements for the managed clusters.
diff --git a/modules/baremetal-event-relay.adoc b/modules/baremetal-event-relay.adoc
new file mode 100644
index 0000000000..4f2f2599be
--- /dev/null
+++ b/modules/baremetal-event-relay.adoc
@@ -0,0 +1,52 @@
+// Module included in the following assemblies:
+//
+// * operators/operator-reference.adoc
+[id="baremetal-event-relay_{context}"]
+= {redfish-operator}
+
+[discrete]
+== Purpose
+The OpenShift {redfish-operator} manages the lifecycle of the Bare Metal Event Relay. The Bare Metal Event Relay enables you to configure the types of cluster events that are monitored by using Redfish hardware events.
+
+[discrete]
+== Configuration Objects
+You can edit the configuration after installation: for example, the webhook port. To edit the configuration objects, run the following command:
+
+[source,terminal]
+----
+$ oc -n [namespace] edit cm hw-event-proxy-operator-manager-config
+----
+
+[source,yaml]
+----
+apiVersion: controller-runtime.sigs.k8s.io/v1alpha1
+kind: ControllerManagerConfig
+health:
+  healthProbeBindAddress: :8081
+metrics:
+  bindAddress: 127.0.0.1:8080
+webhook:
+  port: 9443
+leaderElection:
+  leaderElect: true
+  resourceName: 6e7a703c.redhat-cne.org
+----
+
+[discrete]
+== Project
+link:https://github.com/redhat-cne/hw-event-proxy-operator[hw-event-proxy-operator]
+
+[discrete]
+== CRD
+The proxy enables applications running on bare-metal clusters to respond quickly to Redfish hardware changes and failures, such as breaches of temperature thresholds, fan failure, disk loss, power outages, and memory failure, which are reported by using the `HardwareEvent` CR.
+
+`hardwareevents.event.redhat-cne.org`:
+
+* Scope: Namespaced
+* CR: HardwareEvent
+* Validation: Yes
+
+[discrete]
+== Additional Resources
+You can learn more in the topic xref:../monitoring/using-rfhe.adoc[Monitoring bare-metal events].
diff --git a/modules/cnf-about-topology-aware-lifecycle-manager-blocking-crs.adoc b/modules/cnf-about-topology-aware-lifecycle-manager-blocking-crs.adoc
new file mode 100644
index 0000000000..740fc5f3a1
--- /dev/null
+++ b/modules/cnf-about-topology-aware-lifecycle-manager-blocking-crs.adoc
@@ -0,0 +1,381 @@
+// Module included in the following assemblies:
+// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
+// * scalability_and_performance/ztp-deploying-disconnected.adoc
+
+:_content-type: PROCEDURE
+[id="cnf-about-topology-aware-lifecycle-manager-blocking-crs_{context}"]
+= Blocking ClusterGroupUpgrade CRs
+
+You can create multiple `ClusterGroupUpgrade` CRs and control their order of application.
+
+For example, if you create `ClusterGroupUpgrade` CR C that blocks the start of `ClusterGroupUpgrade` CR A, then `ClusterGroupUpgrade` CR A cannot start until the status of `ClusterGroupUpgrade` CR C becomes `UpgradeCompleted`.
+
+One `ClusterGroupUpgrade` CR can have multiple blocking CRs. In this case, all the blocking CRs must complete before the upgrade for the current CR can start.
+
+.Prerequisites
+
+* Install the {cgu-operator-first}.
+* Provision one or more managed clusters.
+* Log in as a user with `cluster-admin` privileges.
+* Create {rh-rhacm} policies in the hub cluster.
+
+.Procedure
+
+. Save the content of the `ClusterGroupUpgrade` CRs in the `cgu-a.yaml`, `cgu-b.yaml`, and `cgu-c.yaml` files.
++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: cgu-a + namespace: default +spec: + blockingCRs: <1> + - name: cgu-c + namespace: default + clusters: + - spoke1 + - spoke2 + - spoke3 + enable: false + managedPolicies: + - policy1-common-cluster-version-policy + - policy2-common-pao-sub-policy + - policy3-common-ptp-sub-policy + remediationStrategy: + canaries: + - spoke1 + maxConcurrency: 2 + timeout: 240 +status: + conditions: + - message: The ClusterGroupUpgrade CR is not enabled + reason: UpgradeNotStarted + status: "False" + type: Ready + copiedPolicies: + - cgu-a-policy1-common-cluster-version-policy + - cgu-a-policy2-common-pao-sub-policy + - cgu-a-policy3-common-ptp-sub-policy + managedPoliciesForUpgrade: + - name: policy1-common-cluster-version-policy + namespace: default + - name: policy2-common-pao-sub-policy + namespace: default + - name: policy3-common-ptp-sub-policy + namespace: default + placementBindings: + - cgu-a-policy1-common-cluster-version-policy + - cgu-a-policy2-common-pao-sub-policy + - cgu-a-policy3-common-ptp-sub-policy + placementRules: + - cgu-a-policy1-common-cluster-version-policy + - cgu-a-policy2-common-pao-sub-policy + - cgu-a-policy3-common-ptp-sub-policy + remediationPlan: + - - spoke1 + - - spoke2 +---- +<1> Defines the blocking CRs. The `cgu-a` update cannot start until `cgu-c` is complete. ++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: cgu-b + namespace: default +spec: + blockingCRs: <1> + - name: cgu-a + namespace: default + clusters: + - spoke4 + - spoke5 + enable: false + managedPolicies: + - policy1-common-cluster-version-policy + - policy2-common-pao-sub-policy + - policy3-common-ptp-sub-policy + - policy4-common-sriov-sub-policy + remediationStrategy: + maxConcurrency: 1 + timeout: 240 +status: + conditions: + - message: The ClusterGroupUpgrade CR is not enabled + reason: UpgradeNotStarted + status: "False" + type: Ready + copiedPolicies: + - cgu-b-policy1-common-cluster-version-policy + - cgu-b-policy2-common-pao-sub-policy + - cgu-b-policy3-common-ptp-sub-policy + - cgu-b-policy4-common-sriov-sub-policy + managedPoliciesForUpgrade: + - name: policy1-common-cluster-version-policy + namespace: default + - name: policy2-common-pao-sub-policy + namespace: default + - name: policy3-common-ptp-sub-policy + namespace: default + - name: policy4-common-sriov-sub-policy + namespace: default + placementBindings: + - cgu-b-policy1-common-cluster-version-policy + - cgu-b-policy2-common-pao-sub-policy + - cgu-b-policy3-common-ptp-sub-policy + - cgu-b-policy4-common-sriov-sub-policy + placementRules: + - cgu-b-policy1-common-cluster-version-policy + - cgu-b-policy2-common-pao-sub-policy + - cgu-b-policy3-common-ptp-sub-policy + - cgu-b-policy4-common-sriov-sub-policy + remediationPlan: + - - spoke4 + - - spoke5 + status: {} +---- +<1> The `cgu-b` update cannot start until `cgu-a` is complete. 
++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: cgu-c + namespace: default +spec: <1> + clusters: + - spoke6 + enable: false + managedPolicies: + - policy1-common-cluster-version-policy + - policy2-common-pao-sub-policy + - policy3-common-ptp-sub-policy + - policy4-common-sriov-sub-policy + remediationStrategy: + maxConcurrency: 1 + timeout: 240 +status: + conditions: + - message: The ClusterGroupUpgrade CR is not enabled + reason: UpgradeNotStarted + status: "False" + type: Ready + copiedPolicies: + - cgu-c-policy1-common-cluster-version-policy + - cgu-c-policy4-common-sriov-sub-policy + managedPoliciesCompliantBeforeUpgrade: + - policy2-common-pao-sub-policy + - policy3-common-ptp-sub-policy + managedPoliciesForUpgrade: + - name: policy1-common-cluster-version-policy + namespace: default + - name: policy4-common-sriov-sub-policy + namespace: default + placementBindings: + - cgu-c-policy1-common-cluster-version-policy + - cgu-c-policy4-common-sriov-sub-policy + placementRules: + - cgu-c-policy1-common-cluster-version-policy + - cgu-c-policy4-common-sriov-sub-policy + remediationPlan: + - - spoke6 + status: {} +---- +<1> The `cgu-c` update does not have any blocking CRs. {cgu-operator} starts the `cgu-c` update when the `enable` field is set to `true`. + +. Create the `ClusterGroupUpgrade` CRs by running the following command for each relevant CR: ++ +[source,terminal] +---- +$ oc apply -f .yaml +---- + +. Start the update process by running the following command for each relevant CR: ++ +[source,terminal] +---- +$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/ \ +--type merge -p '{"spec":{"enable":true}}' +---- ++ +The following examples show `ClusterGroupUpgrade` CRs where the `enable` field is set to `true`: ++ +.Example for `cgu-a` with blocking CRs ++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: cgu-a + namespace: default +spec: + blockingCRs: + - name: cgu-c + namespace: default + clusters: + - spoke1 + - spoke2 + - spoke3 + enable: true + managedPolicies: + - policy1-common-cluster-version-policy + - policy2-common-pao-sub-policy + - policy3-common-ptp-sub-policy + remediationStrategy: + canaries: + - spoke1 + maxConcurrency: 2 + timeout: 240 +status: + conditions: + - message: 'The ClusterGroupUpgrade CR is blocked by other CRs that have not yet + completed: [cgu-c]' <1> + reason: UpgradeCannotStart + status: "False" + type: Ready + copiedPolicies: + - cgu-a-policy1-common-cluster-version-policy + - cgu-a-policy2-common-pao-sub-policy + - cgu-a-policy3-common-ptp-sub-policy + managedPoliciesForUpgrade: + - name: policy1-common-cluster-version-policy + namespace: default + - name: policy2-common-pao-sub-policy + namespace: default + - name: policy3-common-ptp-sub-policy + namespace: default + placementBindings: + - cgu-a-policy1-common-cluster-version-policy + - cgu-a-policy2-common-pao-sub-policy + - cgu-a-policy3-common-ptp-sub-policy + placementRules: + - cgu-a-policy1-common-cluster-version-policy + - cgu-a-policy2-common-pao-sub-policy + - cgu-a-policy3-common-ptp-sub-policy + remediationPlan: + - - spoke1 + - - spoke2 + status: {} +---- +<1> Shows the list of blocking CRs. 
++ +.Example for `cgu-b` with blocking CRs ++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: cgu-b + namespace: default +spec: + blockingCRs: + - name: cgu-a + namespace: default + clusters: + - spoke4 + - spoke5 + enable: true + managedPolicies: + - policy1-common-cluster-version-policy + - policy2-common-pao-sub-policy + - policy3-common-ptp-sub-policy + - policy4-common-sriov-sub-policy + remediationStrategy: + maxConcurrency: 1 + timeout: 240 +status: + conditions: + - message: 'The ClusterGroupUpgrade CR is blocked by other CRs that have not yet + completed: [cgu-a]' <1> + reason: UpgradeCannotStart + status: "False" + type: Ready + copiedPolicies: + - cgu-b-policy1-common-cluster-version-policy + - cgu-b-policy2-common-pao-sub-policy + - cgu-b-policy3-common-ptp-sub-policy + - cgu-b-policy4-common-sriov-sub-policy + managedPoliciesForUpgrade: + - name: policy1-common-cluster-version-policy + namespace: default + - name: policy2-common-pao-sub-policy + namespace: default + - name: policy3-common-ptp-sub-policy + namespace: default + - name: policy4-common-sriov-sub-policy + namespace: default + placementBindings: + - cgu-b-policy1-common-cluster-version-policy + - cgu-b-policy2-common-pao-sub-policy + - cgu-b-policy3-common-ptp-sub-policy + - cgu-b-policy4-common-sriov-sub-policy + placementRules: + - cgu-b-policy1-common-cluster-version-policy + - cgu-b-policy2-common-pao-sub-policy + - cgu-b-policy3-common-ptp-sub-policy + - cgu-b-policy4-common-sriov-sub-policy + remediationPlan: + - - spoke4 + - - spoke5 + status: {} +---- +<1> Shows the list of blocking CRs. ++ +.Example for `cgu-c` with blocking CRs ++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: cgu-c + namespace: default +spec: + clusters: + - spoke6 + enable: true + managedPolicies: + - policy1-common-cluster-version-policy + - policy2-common-pao-sub-policy + - policy3-common-ptp-sub-policy + - policy4-common-sriov-sub-policy + remediationStrategy: + maxConcurrency: 1 + timeout: 240 +status: + conditions: + - message: The ClusterGroupUpgrade CR has upgrade policies that are still non compliant <1> + reason: UpgradeNotCompleted + status: "False" + type: Ready + copiedPolicies: + - cgu-c-policy1-common-cluster-version-policy + - cgu-c-policy4-common-sriov-sub-policy + managedPoliciesCompliantBeforeUpgrade: + - policy2-common-pao-sub-policy + - policy3-common-ptp-sub-policy + managedPoliciesForUpgrade: + - name: policy1-common-cluster-version-policy + namespace: default + - name: policy4-common-sriov-sub-policy + namespace: default + placementBindings: + - cgu-c-policy1-common-cluster-version-policy + - cgu-c-policy4-common-sriov-sub-policy + placementRules: + - cgu-c-policy1-common-cluster-version-policy + - cgu-c-policy4-common-sriov-sub-policy + remediationPlan: + - - spoke6 + status: + currentBatch: 1 + remediationPlanForBatch: + spoke6: 0 +---- +<1> The `cgu-c` update does not have any blocking CRs. 
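+
+To check whether a `ClusterGroupUpgrade` CR is still blocked, you can inspect its `Ready` condition. The following command is a minimal sketch that assumes the `cgu-a` CR and the `default` namespace from the previous examples:
+
+[source,terminal]
+----
+$ oc get cgu -n default cgu-a -ojsonpath='{.status.conditions}' | jq
+----
+
+While `cgu-c` has not completed, the condition shows the `UpgradeCannotStart` reason and lists the blocking CRs in the message. After `cgu-c` completes, the `cgu-a` update starts and the reason changes to `UpgradeNotCompleted`.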
diff --git a/modules/cnf-about-topology-aware-lifecycle-manager-config.adoc b/modules/cnf-about-topology-aware-lifecycle-manager-config.adoc new file mode 100644 index 0000000000..58600921ce --- /dev/null +++ b/modules/cnf-about-topology-aware-lifecycle-manager-config.adoc @@ -0,0 +1,18 @@ +// Module included in the following assemblies: +// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285 +// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc + +:_content-type: CONCEPT +[id="cnf-about-topology-aware-lifecycle-manager-config_{context}"] += About the {cgu-operator-full} configuration + +The {cgu-operator-first} manages the deployment of {rh-rhacm-first} policies for one or more {product-title} clusters. Using {cgu-operator} in a large network of clusters allows the phased rollout of policies to the clusters in limited batches. This helps to minimize possible service disruptions when updating. With {cgu-operator}, you can control the following actions: + +* The timing of the update +* The number of {rh-rhacm}-managed clusters +* The subset of managed clusters to apply the policies to +* The update order of the clusters +* The set of policies remediated to the cluster +* The order of policies remediated to the cluster + +{cgu-operator} supports the orchestration of the {product-title} y-stream and z-stream updates, and day-two operations on y-streams and z-streams. diff --git a/modules/cnf-about-topology-aware-lifecycle-manager-policies.adoc b/modules/cnf-about-topology-aware-lifecycle-manager-policies.adoc new file mode 100644 index 0000000000..55ef99c991 --- /dev/null +++ b/modules/cnf-about-topology-aware-lifecycle-manager-policies.adoc @@ -0,0 +1,21 @@ +// Module included in the following assemblies: +// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285 +// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc + +:_content-type: CONCEPT +[id="cnf-about-topology-aware-lifecycle-manager-about-policies_{context}"] += About managed policies used with {cgu-operator-full} + +The {cgu-operator-first} uses {rh-rhacm} policies for cluster updates. + +{cgu-operator} can be used to manage the rollout of any policy CR where the `remediationAction` field is set to `inform`. +Supported use cases include the following: + +* Manual user creation of policy CRs +* Automatically generated policies from the `PolicyGenTemplate` custom resource definition (CRD) + +For policies that update an Operator subscription with manual approval, {cgu-operator} provides additional functionality that approves the installation of the updated Operator. + +For more information about managed policies, see link:https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html-single/governance/index#policy-overview[Policy Overview] in the {rh-rhacm} documentation. + +For more information about the `PolicyGenTemplate` CRD, see the "About the PolicyGenTemplate" section in "Deploying distributed units at scale in a disconnected environment". 
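+
+As an illustration only, a minimal manually created policy that {cgu-operator} can manage might look like the following sketch. The policy and `ConfigurationPolicy` names and the Operator subscription that the policy checks are assumptions for this example, and the `PlacementRule` and `PlacementBinding` objects that bind the policy to the managed clusters are omitted:
+
+[source,yaml]
+----
+apiVersion: policy.open-cluster-management.io/v1
+kind: Policy
+metadata:
+  name: example-sriov-sub-policy
+  namespace: default
+spec:
+  remediationAction: inform <1>
+  disabled: false
+  policy-templates:
+    - objectDefinition:
+        apiVersion: policy.open-cluster-management.io/v1
+        kind: ConfigurationPolicy
+        metadata:
+          name: example-sriov-sub-policy-config
+        spec:
+          remediationAction: inform
+          severity: low
+          object-templates:
+            - complianceType: musthave
+              objectDefinition:
+                apiVersion: operators.coreos.com/v1alpha1
+                kind: Subscription
+                metadata:
+                  name: sriov-network-operator-subscription
+                  namespace: openshift-sriov-network-operator
+----
+<1> {cgu-operator} only manages policies where `remediationAction` is set to `inform`. When remediating the clusters selected in a `ClusterGroupUpgrade` CR, {cgu-operator} creates copies of the policy with the `enforce` remediation action; the original policy remains in `inform` mode.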
diff --git a/modules/cnf-provisioning-deploying-a-distributed-unit-(du)-manually.adoc b/modules/cnf-provisioning-deploying-a-distributed-unit-(du)-manually.adoc deleted file mode 100644 index 3e716a2adb..0000000000 --- a/modules/cnf-provisioning-deploying-a-distributed-unit-(du)-manually.adoc +++ /dev/null @@ -1,112 +0,0 @@ -// CNF-950 4.7 Provisioning and deploying a Distributed Unit (DU) manually -// Module included in the following assemblies: -// -// *scalability_and_performance/cnf-provisioning-and-deploying-a-distributed-unit.adoc - -[id="cnf-provisioning-deploying-a-distributed-unit-(du)-manually_{context}"] -= Provisioning and deploying a distributed unit (DU) manually - -Radio access network (RAN) is composed of central units (CU), distributed units (DU), and radio units (RU). -RAN from the telecommunications standard perspective is shown below: - -image::135_OpenShift_Distributed_Unit_0121.svg[High level RAN overview] - -From the three components composing RAN, the CU and DU can be virtualized and implemented as cloud-native functions. - -The CU and DU split architecture is driven by real-time computing and networking requirements. A DU can be seen as a real-time part of a -telecommunication baseband unit. -One distributed unit may aggregate several cells. A CU can be seen as a non-realtime part of a baseband unit, aggregating -traffic from one or more distributed units. - -A cell in the context of a DU can be seen as a real-time application performing intensive digital signal processing, data transfer, -and algorithmic tasks. -Cells often use hardware acceleration (FPGA, GPU, eASIC) for DSP processing offload, but there are also software-only implementations -(FlexRAN), based on AVX-512 instructions. - -Running cell application on COTS hardware requires the following features to be enabled: - -* Real-time kernel -* CPU isolation -* NUMA awareness -* Huge pages memory management -* Precision timing synchronization using PTP -* AVX-512 instruction set (for Flexran and / or FPGA implementation) -* Additional features depending on the RAN Operator requirements - -Accessing hardware acceleration devices and high throughput network interface controllers by virtualized software applications -requires use of SR-IOV and Passthrough PCI device virtualization. - -In addition to the compute and acceleration requirements, DUs operate on multiple internal and external networks. - -[id="cnf-manifest-structure_{context}"] -== The manifest structure - -The profile is built from one cluster specific folder and one or more site-specific folders. -This is done to address a deployment that includes remote worker nodes, with several sites belonging to the same cluster. - -The [`cluster-config`](ran-profile/cluster-config) directory contains performance and PTP customizations based upon -Operator deployments in [`deploy`](../feature-configs/deploy) folder. - -The [`site.1.fqdn`](site.1.fqdn) folder contains site-specific network customizations. - -[id="cnf-du-prerequisites_{context}"] -== Prerequisites - -Before installing the Operators and deploying the DU, perform the following steps. - -. Create a machine config pool for the RAN worker nodes. For example: -+ -[source,yaml] ----- -cat < node-role.kubernetes.io/worker-cnf="" ----- - -. 
Label the node as PTP slave (DU only): -+ -[source,terminal] ----- -$ oc label --overwrite node/ ptp/slave="" ----- - -[id="cnf-du-configuration-notes_{context}"] -== SR-IOV configuration notes - -The `SriovNetworkNodePolicy` object must be configured differently for different NIC models and placements. - -|==================== -|*Manufacturer* |*deviceType* |*isRdma* -|Intel |vfio-pci or netdevice |false -|Mellanox |netdevice |structure -|==================== - -In addition, when configuring the `nicSelector`, the `pfNames` value must match the intended interface name on the specific host. - -If there is a mixed cluster where some of the nodes are deployed with Intel NICs and some with Mellanox, several SR-IOV configurations can be -created with the same `resourceName`. The device plug-in will discover only the available ones and will put the capacity on the node accordingly. diff --git a/modules/cnf-rfhe-notifications-api-refererence.adoc b/modules/cnf-rfhe-notifications-api-refererence.adoc new file mode 100644 index 0000000000..51fb19e3e5 --- /dev/null +++ b/modules/cnf-rfhe-notifications-api-refererence.adoc @@ -0,0 +1,161 @@ +// Module included in the following assemblies: +// +// * monitoring/using-rfhe.adoc + +:_content-type: REFERENCE +[id="cnf-rfhe-notifications-api-refererence_{context}"] += Subscribing applications to bare-metal events REST API reference + +Use the bare-metal events REST API to subscribe an application to the bare-metal events that are generated on the parent node. + +Subscribe applications to Redfish events by using the resource address `/cluster/node//redfish/event`, where `` is the cluster node running the application. + +Deploy your `cloud-event-consumer` application container and `cloud-event-proxy` sidecar container in a separate application pod. The `cloud-event-consumer` application subscribes to the `cloud-event-proxy` container in the application pod. + +Use the following API endpoints to subscribe the `cloud-event-consumer` application to Redfish events posted by the `cloud-event-proxy` container at [x-]`http://localhost:8089/api/cloudNotifications/v1/` in the application pod: + +* `/api/cloudNotifications/v1/subscriptions` +- `POST`: Creates a new subscription +- `GET`: Retrieves a list of subscriptions +* `/api/cloudNotifications/v1/subscriptions/` +- `GET`: Returns details for the specified subscription ID +* `api/cloudNotifications/v1/subscriptions/status/` +- `PUT`: Creates a new status ping request for the specified subscription ID +* `/api/cloudNotifications/v1/health` +- `GET`: Returns the health status of `cloudNotifications` API + +[NOTE] +==== +`9089` is the default port for the `cloud-event-consumer` container deployed in the application pod. You can configure a different port for your application as required. +==== + +[discrete] +== api/cloudNotifications/v1/subscriptions + +[discrete] +=== HTTP method + +`GET api/cloudNotifications/v1/subscriptions` + +[discrete] +==== Description + +Returns a list of subscriptions. If subscriptions exist, a `200 OK` status code is returned along with the list of subscriptions. 
+
+.Example API response
+[source,json]
+----
+[
+  {
+    "id": "ca11ab76-86f9-428c-8d3a-666c24e34d32",
+    "endpointUri": "http://localhost:9089/api/cloudNotifications/v1/dummy",
+    "uriLocation": "http://localhost:8089/api/cloudNotifications/v1/subscriptions/ca11ab76-86f9-428c-8d3a-666c24e34d32",
+    "resource": "/cluster/node/openshift-worker-0.openshift.example.com/redfish/event"
+  }
+]
+----
+
+[discrete]
+=== HTTP method
+
+`POST api/cloudNotifications/v1/subscriptions`
+
+[discrete]
+==== Description
+
+Creates a new subscription. If a subscription is successfully created, or if it already exists, a `201 Created` status code is returned.
+
+.Query parameters
+|===
+| Parameter | Type
+
+| subscription
+| data
+|===
+
+.Example payload
+[source,json]
+----
+{
+  "uriLocation": "http://localhost:8089/api/cloudNotifications/v1/subscriptions",
+  "resource": "/cluster/node/openshift-worker-0.openshift.example.com/redfish/event"
+}
+----
+
+[discrete]
+== api/cloudNotifications/v1/subscriptions/<subscription_id>
+
+[discrete]
+=== HTTP method
+
+`GET api/cloudNotifications/v1/subscriptions/<subscription_id>`
+
+[discrete]
+==== Description
+
+Returns details for the subscription with ID `<subscription_id>`.
+
+.Query parameters
+|===
+| Parameter | Type
+
+| `<subscription_id>`
+| string
+|===
+
+.Example API response
+[source,json]
+----
+{
+  "id":"ca11ab76-86f9-428c-8d3a-666c24e34d32",
+  "endpointUri":"http://localhost:9089/api/cloudNotifications/v1/dummy",
+  "uriLocation":"http://localhost:8089/api/cloudNotifications/v1/subscriptions/ca11ab76-86f9-428c-8d3a-666c24e34d32",
+  "resource":"/cluster/node/openshift-worker-0.openshift.example.com/redfish/event"
+}
+----
+
+[discrete]
+== api/cloudNotifications/v1/subscriptions/status/<subscription_id>
+
+[discrete]
+=== HTTP method
+
+`PUT api/cloudNotifications/v1/subscriptions/status/<subscription_id>`
+
+[discrete]
+==== Description
+
+Creates a new status ping request for the subscription with ID `<subscription_id>`. If a subscription is present, the status request is successful and a `202 Accepted` status code is returned.
+
+.Query parameters
+|===
+| Parameter | Type
+
+| `<subscription_id>`
+| string
+|===
+
+.Example API response
+[source,json]
+----
+{"status":"ping sent"}
+----
+
+[discrete]
+== api/cloudNotifications/v1/health/
+
+[discrete]
+=== HTTP method
+
+`GET api/cloudNotifications/v1/health/`
+
+[discrete]
+==== Description
+
+Returns the health status for the `cloudNotifications` REST API.
+
+.Example API response
+[source,terminal]
+----
+OK
+----
diff --git a/modules/cnf-topology-aware-lifecycle-manager-about-cgu-crs.adoc b/modules/cnf-topology-aware-lifecycle-manager-about-cgu-crs.adoc
new file mode 100644
index 0000000000..b20a612d1b
--- /dev/null
+++ b/modules/cnf-topology-aware-lifecycle-manager-about-cgu-crs.adoc
@@ -0,0 +1,250 @@
+// Module included in the following assemblies:
+// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
+// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc
+
+:_content-type: CONCEPT
+[id="talo-about-cgu-crs_{context}"]
+= About the ClusterGroupUpgrade CR
+
+The {cgu-operator-first} builds the remediation plan from the `ClusterGroupUpgrade` CR for a group of clusters.
You can define the following specifications in a `ClusterGroupUpgrade` CR: + +* Clusters in the group +* Blocking `ClusterGroupUpgrade` CRs +* Applicable list of managed policies +* Number of concurrent updates +* Applicable canary updates +* Actions to perform before and after the update +* Update timing + +As {cgu-operator} works through remediation of the policies to the specified clusters, the `ClusterGroupUpgrade` CR can have the following states: + +* `UpgradeNotStarted` +* `UpgradeCannotStart` +* `UpgradeNotComplete` +* `UpgradeTimedOut` +* `UpgradeCompleted` +* `PrecachingRequired` + +[NOTE] +==== +After {cgu-operator} completes a cluster update, the cluster does not update again under the control of the same `ClusterGroupUpgrade` CR. You must create a new `ClusterGroupUpgrade` CR in the following cases: + +* When you need to update the cluster again +* When the cluster changes to non-compliant with the `inform` policy after being updated +==== + +[id="upgrade_not_started"] +== The UpgradeNotStarted state + +The initial state of the `ClusterGroupUpgrade` CR is `UpgradeNotStarted`. + +{cgu-operator} builds a remediation plan based on the following fields: + +* The `clusterSelector` field specifies the labels of the clusters that you want to update. +* The `clusters` field specifies a list of clusters to update. +* The `canaries` field specifies the clusters for canary updates. +* The `maxConcurrency` field specifies the number of clusters to update in a batch. + +You can use the `clusters` and the `clusterSelector` fields together to create a combined list of clusters. + +The remediation plan starts with the clusters listed in the `canaries` field. Each canary cluster forms a single-cluster batch. + +[NOTE] +==== +Any failures during the update of a canary cluster stops the update process. +==== + +The `ClusterGroupUpgrade` CR transitions to the `UpgradeNotCompleted` state after the remediation plan is successfully created and after the `enable` field is set to `true`. At this point, {cgu-operator} starts to update the non-compliant clusters with the specified managed policies. + +[NOTE] +==== +You can only make changes to the `spec` fields if the `ClusterGroupUpgrade` CR is either in the `UpgradeNotStarted` or the `UpgradeCannotStart` state. +==== + +.Sample `ClusterGroupUpgrade` CR in the `UpgradeNotStarted` state + +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: cgu-upgrade-complete + namespace: default +spec: + clusters: <1> + - spoke1 + enable: false + managedPolicies: <2> + - policy1-common-cluster-version-policy + - policy2-common-pao-sub-policy + remediationStrategy: <3> + canaries: <4> + - spoke1 + maxConcurrency: 1 <5> + timeout: 240 +status: <6> + conditions: + - message: The ClusterGroupUpgrade CR is not enabled + reason: UpgradeNotStarted + status: "False" + type: Ready + copiedPolicies: + - cgu-upgrade-complete-policy1-common-cluster-version-policy + - cgu-upgrade-complete-policy2-common-pao-sub-policy + managedPoliciesForUpgrade: + - name: policy1-common-cluster-version-policy + namespace: default + - name: policy2-common-pao-sub-policy + namespace: default + placementBindings: + - cgu-upgrade-complete-policy1-common-cluster-version-policy + - cgu-upgrade-complete-policy2-common-pao-sub-policy + placementRules: + - cgu-upgrade-complete-policy1-common-cluster-version-policy + - cgu-upgrade-complete-policy2-common-pao-sub-policy + remediationPlan: + - - spoke1 +---- +<1> Defines the list of clusters to update. 
+<2> Lists the user-defined set of policies to remediate. +<3> Defines the specifics of the cluster updates. +<4> Defines the clusters for canary updates. +<5> Defines the maximum number of concurrent updates in a batch. The number of remediation batches is the number of canary clusters, plus the number of clusters, except the canary clusters, divided by the `maxConcurrency` value. The clusters that are already compliant with all the managed policies are excluded from the remediation plan. +<6> Displays information about the status of the updates. + +[id="upgrade_cannot_start"] +== The UpgradeCannotStart state + +In the `UpgradeCannotStart` state, the update cannot start because of the following reasons: + +* Blocking CRs are missing from the system +* Blocking CRs have not yet finished + +[id="upgrade_not_completed"] +== The UpgradeNotCompleted state + +In the `UpgradeNotCompleted` state, {cgu-operator} enforces the policies following the remediation plan defined in the `UpgradeNotStarted` state. + +Enforcing the policies for subsequent batches starts immediately after all the clusters of the current batch are compliant with all the managed policies. If the batch times out, {cgu-operator} moves on to the next batch. The timeout value of a batch is the `spec.timeout` field divided by the number of batches in the remediation plan. + +[NOTE] +==== +The managed policies apply in the order that they are listed in the `managedPolicies` field in the `ClusterGroupUpgrade` CR. One managed policy is applied to the specified clusters at a time. After the specified clusters comply with the current policy, the next managed policy is applied to the next non-compliant cluster. +==== + +.Sample `ClusterGroupUpgrade` CR in the `UpgradeNotCompleted` state + +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: cgu-upgrade-complete + namespace: default +spec: + clusters: + - spoke1 + enable: true <1> + managedPolicies: + - policy1-common-cluster-version-policy + - policy2-common-pao-sub-policy + remediationStrategy: + maxConcurrency: 1 + timeout: 240 +status: <2> + conditions: + - message: The ClusterGroupUpgrade CR has upgrade policies that are still non compliant + reason: UpgradeNotCompleted + status: "False" + type: Ready + copiedPolicies: + - cgu-upgrade-complete-policy1-common-cluster-version-policy + - cgu-upgrade-complete-policy2-common-pao-sub-policy + managedPoliciesForUpgrade: + - name: policy1-common-cluster-version-policy + namespace: default + - name: policy2-common-pao-sub-policy + namespace: default + placementBindings: + - cgu-upgrade-complete-policy1-common-cluster-version-policy + - cgu-upgrade-complete-policy2-common-pao-sub-policy + placementRules: + - cgu-upgrade-complete-policy1-common-cluster-version-policy + - cgu-upgrade-complete-policy2-common-pao-sub-policy + remediationPlan: + - - spoke1 + status: + currentBatch: 1 + remediationPlanForBatch: <3> + spoke1: 0 +---- +<1> The update starts when the value of the `spec.enable` field is `true`. +<2> The `status` fields change accordingly when the update begins. +<3> Lists the clusters in the batch and the index of the policy that is being currently applied to each cluster. The index of the policies starts with `0` and the index follows the order of the `status.managedPoliciesForUpgrade` list. + +[id="upgrade_timed_out"] +== The UpgradeTimedOut state + +In the `UpgradeTimedOut` state, {cgu-operator} checks every hour if all the policies for the `ClusterGroupUpgrade` CR are compliant. 
The checks continue until the `ClusterGroupUpgrade` CR is deleted or the updates are completed. +The periodic checks allow the updates to complete if they get prolonged due to network, CPU, or other issues. + +{cgu-operator} transitions to the `UpgradeTimedOut` state in two cases: + +* When the current batch contains canary updates and the cluster in the batch does not comply with all the managed policies within the batch timeout. +* When the clusters do not comply with the managed policies within the `timeout` value specified in the `remediationStrategy` field. + +If the policies are compliant, {cgu-operator} transitions to the `UpgradeCompleted` state. + +[id="upgrade_completed"] +== The UpgradeCompleted state + +In the `UpgradeCompleted` state, the cluster updates are complete. + +.Sample `ClusterGroupUpgrade` CR in the `UpgradeCompleted` state + +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: cgu-upgrade-complete + namespace: default +spec: + actions: + afterCompletion: + deleteObjects: true <1> + clusters: + - spoke1 + enable: true + managedPolicies: + - policy1-common-cluster-version-policy + - policy2-common-pao-sub-policy + remediationStrategy: + maxConcurrency: 1 + timeout: 240 +status: <2> + conditions: + - message: The ClusterGroupUpgrade CR has all clusters compliant with all the managed policies + reason: UpgradeCompleted + status: "True" + type: Ready + managedPoliciesForUpgrade: + - name: policy1-common-cluster-version-policy + namespace: default + - name: policy2-common-pao-sub-policy + namespace: default + remediationPlan: + - - spoke1 + status: + remediationPlanForBatch: + spoke1: -2 <3> +---- +<1> The value of `spec.action.afterCompletion.deleteObjects` field is `true` by default. After the update is completed, {cgu-operator} deletes the underlying {rh-rhacm} objects that were created during the update. This option is to prevent the {rh-rhacm} hub from continuously checking for compliance after a successful update. +<2> The `status` fields show that the updates completed successfully. +<3> Displays that all the policies are applied to the cluster. + +[id="precaching-required"] +[discreet] +== The PrecachingRequired state + +In the `PrecachingRequired` state, the clusters need to have images pre-cached before the update can start. For more information about pre-caching, see the "Using the container image pre-cache feature" section. diff --git a/modules/cnf-topology-aware-lifecycle-manager-apply-policies.adoc b/modules/cnf-topology-aware-lifecycle-manager-apply-policies.adoc new file mode 100644 index 0000000000..b1150216a9 --- /dev/null +++ b/modules/cnf-topology-aware-lifecycle-manager-apply-policies.adoc @@ -0,0 +1,363 @@ +// Module included in the following assemblies: +// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285 +// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc + +:_content-type: PROCEDURE +[id="talo-apply-policies_{context}"] += Applying update policies to managed clusters + +You can update your managed clusters by applying your policies. + +.Prerequisites + +* Install the {cgu-operator-first}. +* Provision one or more managed clusters. +* Log in as a user with `cluster-admin` privileges. +* Create {rh-rhacm} policies in the hub cluster. + +.Procedure + +. Save the contents of the `ClusterGroupUpgrade` CR in the `cgu-1.yaml` file. 
++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: cgu-1 + namespace: default +spec: + managedPolicies: <1> + - policy1-common-cluster-version-policy + - policy2-common-pao-sub-policy + - policy3-common-ptp-sub-policy + - policy4-common-sriov-sub-policy + enable: false + clusters: <2> + - spoke1 + - spoke2 + - spoke5 + - spoke6 + remediationStrategy: + maxConcurrency: 2 <3> + timeout: 240 <4> +---- +<1> The name of the policies to apply. +<2> The list of clusters to update. +<3> The `maxConcurrency` field signifies the number of clusters updated at the same time. +<4> The update timeout in minutes. + +. Create the `ClusterGroupUpgrade` CR by running the following command: ++ +[source,terminal] +---- +$ oc create -f cgu-1.yaml +---- + +.. Check if the `ClusterGroupUpgrade` CR was created in the hub cluster by running the following command: ++ +[source,terminal] +---- +$ oc get cgu --all-namespaces +---- ++ +.Example output ++ +[source,terminal] +---- +NAMESPACE NAME AGE +default cgu-1 8m55s +---- + +.. Check the status of the update by running the following command: ++ +[source,terminal] +---- +$ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq +---- ++ +.Example output ++ +[source,json] +---- +{ + "computedMaxConcurrency": 2, + "conditions": [ + { + "lastTransitionTime": "2022-02-25T15:34:07Z", + "message": "The ClusterGroupUpgrade CR is not enabled", <1> + "reason": "UpgradeNotStarted", + "status": "False", + "type": "Ready" + } + ], + "copiedPolicies": [ + "cgu-policy1-common-cluster-version-policy", + "cgu-policy2-common-pao-sub-policy", + "cgu-policy3-common-ptp-sub-policy", + "cgu-policy4-common-sriov-sub-policy" + ], + "managedPoliciesContent": { + "policy1-common-cluster-version-policy": "null", + "policy2-common-pao-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"performance-addon-operator\",\"namespace\":\"openshift-performance-addon-operator\"}]", + "policy3-common-ptp-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"ptp-operator-subscription\",\"namespace\":\"openshift-ptp\"}]", + "policy4-common-sriov-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"sriov-network-operator-subscription\",\"namespace\":\"openshift-sriov-network-operator\"}]" + }, + "managedPoliciesForUpgrade": [ + { + "name": "policy1-common-cluster-version-policy", + "namespace": "default" + }, + { + "name": "policy2-common-pao-sub-policy", + "namespace": "default" + }, + { + "name": "policy3-common-ptp-sub-policy", + "namespace": "default" + }, + { + "name": "policy4-common-sriov-sub-policy", + "namespace": "default" + } + ], + "managedPoliciesNs": { + "policy1-common-cluster-version-policy": "default", + "policy2-common-pao-sub-policy": "default", + "policy3-common-ptp-sub-policy": "default", + "policy4-common-sriov-sub-policy": "default" + }, + "placementBindings": [ + "cgu-policy1-common-cluster-version-policy", + "cgu-policy2-common-pao-sub-policy", + "cgu-policy3-common-ptp-sub-policy", + "cgu-policy4-common-sriov-sub-policy" + ], + "placementRules": [ + "cgu-policy1-common-cluster-version-policy", + "cgu-policy2-common-pao-sub-policy", + "cgu-policy3-common-ptp-sub-policy", + "cgu-policy4-common-sriov-sub-policy" + ], + "precaching": { + "spec": {} + }, + "remediationPlan": [ + [ + "spoke1", + "spoke2" + ], + [ + "spoke5", + "spoke6" + ] + ], + "status": {} +} +---- +<1> The `spec.enable` field in the `ClusterGroupUpgrade` CR is set to `false`. + +.. 
Check the status of the policies by running the following command: ++ +[source,terminal] +---- +$ oc get policies -A +---- ++ +.Example output +[source,terminal] +---- +NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE +default cgu-policy1-common-cluster-version-policy enforce 17m <1> +default cgu-policy2-common-pao-sub-policy enforce 17m +default cgu-policy3-common-ptp-sub-policy enforce 17m +default cgu-policy4-common-sriov-sub-policy enforce 17m +default policy1-common-cluster-version-policy inform NonCompliant 15h +default policy2-common-pao-sub-policy inform NonCompliant 15h +default policy3-common-ptp-sub-policy inform NonCompliant 18m +default policy4-common-sriov-sub-policy inform NonCompliant 18m +---- +<1> The `spec.remediationAction` field of policies currently applied on the clusters is set to `enforce`. The managed policies in `inform` mode from the `ClusterGroupUpgrade` CR remain in `inform` mode during the update. + +. Change the value of the `spec.enable` field to `true` by running the following command: ++ +[source,terminal] +---- +$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-1 \ +--patch '{"spec":{"enable":true}}' --type=merge +---- + +.Verification + +. Check the status of the update again by running the following command: ++ +[source,terminal] +---- +$ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq +---- ++ +.Example output ++ +[source,json] +---- +{ + "computedMaxConcurrency": 2, + "conditions": [ <1> + { + "lastTransitionTime": "2022-02-25T15:34:07Z", + "message": "The ClusterGroupUpgrade CR has upgrade policies that are still non compliant", + "reason": "UpgradeNotCompleted", + "status": "False", + "type": "Ready" + } + ], + "copiedPolicies": [ + "cgu-policy1-common-cluster-version-policy", + "cgu-policy2-common-pao-sub-policy", + "cgu-policy3-common-ptp-sub-policy", + "cgu-policy4-common-sriov-sub-policy" + ], + "managedPoliciesContent": { + "policy1-common-cluster-version-policy": "null", + "policy2-common-pao-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"performance-addon-operator\",\"namespace\":\"openshift-performance-addon-operator\"}]", + "policy3-common-ptp-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"ptp-operator-subscription\",\"namespace\":\"openshift-ptp\"}]", + "policy4-common-sriov-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"sriov-network-operator-subscription\",\"namespace\":\"openshift-sriov-network-operator\"}]" + }, + "managedPoliciesForUpgrade": [ + { + "name": "policy1-common-cluster-version-policy", + "namespace": "default" + }, + { + "name": "policy2-common-pao-sub-policy", + "namespace": "default" + }, + { + "name": "policy3-common-ptp-sub-policy", + "namespace": "default" + }, + { + "name": "policy4-common-sriov-sub-policy", + "namespace": "default" + } + ], + "managedPoliciesNs": { + "policy1-common-cluster-version-policy": "default", + "policy2-common-pao-sub-policy": "default", + "policy3-common-ptp-sub-policy": "default", + "policy4-common-sriov-sub-policy": "default" + }, + "placementBindings": [ + "cgu-policy1-common-cluster-version-policy", + "cgu-policy2-common-pao-sub-policy", + "cgu-policy3-common-ptp-sub-policy", + "cgu-policy4-common-sriov-sub-policy" + ], + "placementRules": [ + "cgu-policy1-common-cluster-version-policy", + "cgu-policy2-common-pao-sub-policy", + "cgu-policy3-common-ptp-sub-policy", + "cgu-policy4-common-sriov-sub-policy" + ], + "precaching": { + "spec": {} + }, + "remediationPlan": [ + [ + "spoke1", + "spoke2" + ], + [ + "spoke5", + "spoke6" + ] + 
], + "status": { + "currentBatch": 1, + "currentBatchStartedAt": "2022-02-25T15:54:16Z", + "remediationPlanForBatch": { + "spoke1": 0, + "spoke2": 1 + }, + "startedAt": "2022-02-25T15:54:16Z" + } +} +---- +<1> Reflects the update progress of the current batch. Run this command again to receive updated information about the progress. + +. If the policies include Operator subscriptions, you can check the installation progress directly on the single-node cluster. + +.. Export the `KUBECONFIG` file of the single-node cluster you want to check the installation progress for by running the following command: ++ +[source,terminal] +---- +$ export KUBECONFIG= +---- + +.. Check all the subscriptions present on the single-node cluster and look for the one in the policy you are trying to install through the `ClusterGroupUpgrade` CR by running the following command: ++ +[source,terminal] +---- +$ oc get subs -A | grep -i +---- ++ +.Example output for `cluster-logging` policy ++ +[source,terminal] +---- +NAMESPACE NAME PACKAGE SOURCE CHANNEL +openshift-logging cluster-logging cluster-logging redhat-operators stable +---- + +. If one of the managed policies includes a `ClusterVersion` CR, check the status of platform updates in the current batch by running the following command against the spoke cluster: ++ +[source,terminal] +---- +$ oc get clusterversion +---- ++ +.Example output ++ +[source,terminal] +---- +NAME VERSION AVAILABLE PROGRESSING SINCE STATUS +version 4.9.5 True True 43s Working towards 4.9.7: 71 of 735 done (9% complete) +---- + +. Check the Operator subscription by running the following command: ++ +[source,terminal] +---- +$ oc get subs -n -ojsonpath="{.status}" +---- + +. Check the install plans present on the single-node cluster that is associated with the desired subscription by running the following command: ++ +[source,terminal] +---- +$ oc get installplan -n +---- ++ +.Example output for `cluster-logging` Operator ++ +[source,terminal] +---- +NAMESPACE NAME CSV APPROVAL APPROVED +openshift-logging install-6khtw cluster-logging.5.3.3-4 Manual true <1> +---- +<1> The install plans have their `Approval` field set to `Manual` and their `Approved` field changes from `true` to `false` after {cgu-operator} approves the install plan. + +. Check if the cluster service version for the Operator of the policy that the `ClusterGroupUpgrade` is installing reached the `Succeeded` phase by running the following command: ++ +[source,terminal] +---- +$ oc get csv -n +---- ++ +.Example output for OpenShift Logging Operator ++ +[source,terminal] +---- +NAME DISPLAY VERSION REPLACES PHASE +cluster-logging.5.4.2 Red Hat OpenShift Logging 5.4.2 Succeeded +---- diff --git a/modules/cnf-topology-aware-lifecycle-manager-autocreate-cgu-cr-ztp.adoc b/modules/cnf-topology-aware-lifecycle-manager-autocreate-cgu-cr-ztp.adoc new file mode 100644 index 0000000000..8e73116513 --- /dev/null +++ b/modules/cnf-topology-aware-lifecycle-manager-autocreate-cgu-cr-ztp.adoc @@ -0,0 +1,63 @@ +// Module included in the following assemblies: +// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285 +// * scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: PROCEDURE +[id="talo-precache-autocreated-cgu-for-ztp_{context}"] += About the auto-created ClusterGroupUpgrade CR for ZTP + +{cgu-operator} has a controller called `ManagedClusterForCGU` that monitors the `Ready` state of the `ManagedCluster` CRs on the hub cluster and creates the `ClusterGroupUpgrade` CRs for ZTP (zero touch provisioning). 
+ +For any managed cluster in the `Ready` state without a "ztp-done" label applied, the `ManagedClusterForCGU` controller automatically creates a `ClusterGroupUpgrade` CR in the `ztp-install` namespace with its associated {rh-rhacm} policies that are created during the ZTP process. {cgu-operator} then remediates the set of configuration policies that are listed in the auto-created `ClusterGroupUpgrade` CR to push the configuration CRs to the managed cluster. + +[NOTE] +==== +If the managed cluster has no bound policies when the cluster becomes `Ready`, no `ClusterGroupUpgrade` CR is created. +==== + +.Example of an auto-created `ClusterGroupUpgrade` CR for ZTP + +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + generation: 1 + name: spoke1 + namespace: ztp-install + ownerReferences: + - apiVersion: cluster.open-cluster-management.io/v1 + blockOwnerDeletion: true + controller: true + kind: ManagedCluster + name: spoke1 + uid: 98fdb9b2-51ee-4ee7-8f57-a84f7f35b9d5 + resourceVersion: "46666836" + uid: b8be9cd2-764f-4a62-87d6-6b767852c7da +spec: + actions: + afterCompletion: + addClusterLabels: + ztp-done: "" <1> + deleteClusterLabels: + ztp-running: "" + deleteObjects: true + beforeEnable: + addClusterLabels: + ztp-running: "" <2> + clusters: + - spoke1 + enable: true + managedPolicies: + - common-spoke1-config-policy + - common-spoke1-subscriptions-policy + - group-spoke1-config-policy + - spoke1-config-policy + - group-spoke1-validator-du-policy + preCaching: false + remediationStrategy: + maxConcurrency: 1 + timeout: 240 +---- +<1> Applied to the managed cluster when {cgu-operator} completes the cluster configuration. +<2> Applied to the managed cluster when {cgu-operator} starts deploying the configuration policies. diff --git a/modules/cnf-topology-aware-lifecycle-manager-installation-cli.adoc b/modules/cnf-topology-aware-lifecycle-manager-installation-cli.adoc new file mode 100644 index 0000000000..64c5428dcc --- /dev/null +++ b/modules/cnf-topology-aware-lifecycle-manager-installation-cli.adoc @@ -0,0 +1,72 @@ +// Module included in the following assemblies: +// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285 +// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc + +:_content-type: PROCEDURE +[id="installing-topology-aware-lifecycle-manager-using-cli_{context}"] += Installing the {cgu-operator-full} by using the CLI + +You can use the OpenShift CLI (`oc`) to install the {cgu-operator-first}. + +.Prerequisites + +* Install the OpenShift CLI (`oc`). +* Install the latest version of the {rh-rhacm} Operator. +* Set up a hub cluster with disconnected registry. +* Log in as a user with `cluster-admin` privileges. + +.Procedure + +. Create a `Subscription` CR: +.. Define the `Subscription` CR and save the YAML file, for example, `talm-subscription.yaml`: ++ +[source,yaml] +---- +apiVersion: operators.coreos.com/v1alpha1 +kind: Subscription +metadata: + name: openshift-topology-aware-lifecycle-manager-subscription + namespace: openshift-operators +spec: + channel: "stable" + name: topology-aware-lifecycle-manager + source: redhat-operators + sourceNamespace: openshift-marketplace +---- + +.. Create the `Subscription` CR by running the following command: ++ +[source,terminal] +---- +$ oc create -f talm-subscription.yaml +---- + +.Verification + +. 
Verify that the installation succeeded by inspecting the CSV resource: ++ +[source,terminal] +---- +$ oc get csv -n openshift-operators +---- ++ +.Example output +[source,terminal] +---- +NAME DISPLAY VERSION REPLACES PHASE +topology-aware-lifecycle-manager.4.10.0-202206301927 Topology Aware Lifecycle Manager 4.10.0-202206301927 Succeeded +---- + +. Verify that the {cgu-operator} is up and running: ++ +[source,terminal] +---- +$ oc get deploy -n openshift-operators +---- ++ +.Example output +[source,terminal] +---- +NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE +openshift-operators cluster-group-upgrades-controller-manager 1/1 1 1 14s +---- \ No newline at end of file diff --git a/modules/cnf-topology-aware-lifecycle-manager-installation-web-console.adoc b/modules/cnf-topology-aware-lifecycle-manager-installation-web-console.adoc new file mode 100644 index 0000000000..88ea0d4d24 --- /dev/null +++ b/modules/cnf-topology-aware-lifecycle-manager-installation-web-console.adoc @@ -0,0 +1,36 @@ +// Module included in the following assemblies: +// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285 +// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc + +:_content-type: PROCEDURE +[id="installing-topology-aware-lifecycle-manager-using-web-console_{context}"] += Installing the {cgu-operator-full} by using the web console + +You can use the {product-title} web console to install the {cgu-operator-full}. + +.Prerequisites + +// Based on polarion test cases + +* Install the latest version of the {rh-rhacm} Operator. +* Set up a hub cluster with disconnected regitry. +* Log in as a user with `cluster-admin` privileges. + +.Procedure + +. In the {product-title} web console, navigate to *Operators* -> *OperatorHub*. +. Search for the *{cgu-operator-full}* from the list of available Operators, and then click *Install*. +. Keep the default selection of *Installation mode* ["All namespaces on the cluster (default)"] and *Installed Namespace* ("openshift-operators") to ensure that the Operator is installed properly. +. Click *Install*. + +.Verification + +To confirm that the installation is successful: + +. Navigate to the *Operators* -> *Installed Operators* page. +. Check that the Operator is installed in the `All Namespaces` namespace and its status is `Succeeded`. + +If the Operator is not installed successfully: + +. Navigate to the *Operators* -> *Installed Operators* page and inspect the `Status` column for any errors or failures. +. Navigate to the *Workloads* -> *Pods* page and check the logs in any containers in the `cluster-group-upgrades-controller-manager` pod that are reporting issues. diff --git a/modules/cnf-topology-aware-lifecycle-manager-operator-and-platform-update.adoc b/modules/cnf-topology-aware-lifecycle-manager-operator-and-platform-update.adoc new file mode 100644 index 0000000000..1d07e64921 --- /dev/null +++ b/modules/cnf-topology-aware-lifecycle-manager-operator-and-platform-update.adoc @@ -0,0 +1,136 @@ +// Module included in the following assemblies: +// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285 +// * scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: PROCEDURE +[id="talo-operator-and-platform-update_{context}"] += Performing a platform and an Operator update together + +You can perform a platform and an Operator update at the same time. + +.Prerequisites + +* Install the {cgu-operator-first}. +* Update ZTP to the latest version. +* Provision one or more managed clusters with ZTP. 
+* Log in as a user with `cluster-admin` privileges. +* Create {rh-rhacm} policies in the hub cluster. + +.Procedure + +. Create the `PolicyGenTemplate` CR for the updates by following the steps described in the "Performing a platform update" and "Performing an Operator update" sections. + +. Apply the prep work for the platform and the Operator update. + +.. Save the content of the `ClusterGroupUpgrade` CR with the policies for platform update preparation work, catalog source updates, and target clusters to the `cgu-platform-operator-upgrade-prep.yml` file, for example: ++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: cgu-platform-operator-upgrade-prep + namespace: default +spec: + managedPolicies: + - du-upgrade-platform-upgrade-prep + - du-upgrade-operator-catsrc-policy + clusterSelector: + - group-du-sno + remediationStrategy: + maxConcurrency: 10 + enable: true +---- + +.. Apply the `cgu-platform-operator-upgrade-prep.yml` file to the hub cluster by running the following command: ++ +[source,terminal] +---- +$ oc apply -f cgu-platform-operator-upgrade-prep.yml +---- + +.. Monitor the process. Upon completion, ensure that the policy is compliant by running the following command: ++ +[source,terminal] +---- +$ oc get policies --all-namespaces +---- + +. Create the `ClusterGroupUpdate` CR for the platform and the Operator update with the `spec.enable` field set to `false`. +.. Save the contents of the platform and Operator update `ClusterGroupUpdate` CR with the policies and the target clusters to the `cgu-platform-operator-upgrade.yml` file, as shown in the following example: ++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: cgu-du-upgrade + namespace: default +spec: + managedPolicies: + - du-upgrade-platform-upgrade <1> + - du-upgrade-operator-catsrc-policy <2> + - common-subscriptions-policy <3> + preCaching: true + clusterSelector: + - group-du-sno + remediationStrategy: + maxConcurrency: 1 + enable: false +---- +<1> This is the platform update policy. +<2> This is the policy containing the catalog source information for the Operators to be updated. It is needed for the pre-caching feature to determine which Operator images to download to the spoke cluster. +<3> This is the policy to update the Operators. + +.. Apply the `cgu-platform-operator-upgrade.yml` file to the hub cluster by running the following command: ++ +[source,terminal] +---- +$ oc apply -f cgu-platform-operator-upgrade.yml +---- + +. Optional: Pre-cache the images for the platform and the Operator update. +.. Enable pre-caching in the `ClusterGroupUpgrade` CR by running the following command: ++ +[source,terminal] +---- +$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \ +--patch '{"spec":{"preCaching": true}}' --type=merge +---- + +.. Monitor the update process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the spoke cluster: ++ +[source,terminal] +---- +$ oc get jobs,pods -n openshift-talm-pre-cache +---- + +.. Check if the pre-caching is completed before starting the update by running the following command: ++ +[source,terminal] +---- +$ oc get cgu cgu-du-upgrade -ojsonpath='{.status.conditions}' +---- + +. Start the platform and Operator update. +.. 
Enable the `cgu-du-upgrade` `ClusterGroupUpgrade` CR to start the platform and the Operator update by running the following command: ++ +[source,terminal] +---- +$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \ +--patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge +---- + +.. Monitor the process. Upon completion, ensure that the policy is compliant by running the following command: ++ +[source,terminal] +---- +$ oc get policies --all-namespaces +---- ++ +[NOTE] +==== +The CRs for the platform and Operator updates can be created from the beginning by configuring the setting to `spec.enable: true`. In this case, the update starts immediately after pre-caching completes and there is no need to manually enable the CR. + +Both pre-caching and the update create extra resources, such as policies, placement bindings, placement rules, managed cluster actions, and managed cluster view, to help complete the procedures. Setting the `afterCompletion.deleteObjects` field to `true` deletes all these resources after the updates complete. +==== \ No newline at end of file diff --git a/modules/cnf-topology-aware-lifecycle-manager-operator-update.adoc b/modules/cnf-topology-aware-lifecycle-manager-operator-update.adoc new file mode 100644 index 0000000000..d0ada7b05e --- /dev/null +++ b/modules/cnf-topology-aware-lifecycle-manager-operator-update.adoc @@ -0,0 +1,263 @@ +// Module included in the following assemblies: +// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285 +// * scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: PROCEDURE +[id="talo-operator-update_{context}"] += Performing an Operator update + +You can perform an Operator update with the {cgu-operator}. + +.Prerequisites + +* Install the {cgu-operator-first}. +* Update ZTP to the latest version. +* Provision one or more managed clusters with ZTP. +* Mirror the desired index image, bundle images, and all Operator images referenced in the bundle images. +* Log in as a user with `cluster-admin` privileges. +* Create {rh-rhacm} policies in the hub cluster. + +.Procedure + +. Update the `PolicyGenTemplate` CR for the Operator update. +.. Update the `du-upgrade` `PolicyGenTemplate` CR with the following additional contents in the `du-upgrade.yaml` file: ++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1 +kind: PolicyGenTemplate +metadata: + name: "du-upgrade" + namespace: "ztp-group-du-sno" +spec: + bindingRules: + group-du-sno: "" + mcp: "master" + remediationAction: inform + sourceFiles: + - fileName: DefaultCatsrc.yaml + remediationAction: inform + policyName: "operator-catsrc-policy" + metadata: + name: redhat-operators + spec: + displayName: Red Hat Operators Catalog + image: registry.example.com:5000/olm/redhat-operators:v4.10 <1> + updateStrategy: <2> + registryPoll: + interval: 1h +---- +<1> The index image URL contains the desired Operator images. If the index images are always pushed to the same image name and tag, this change is not needed. +<2> Set how frequently the Operator Lifecycle Manager (OLM) polls the index image for new Operator versions with the `registryPoll.interval` field. This change is not needed if a new index image tag is always pushed for y-stream and z-stream Operator updates. The `registryPoll.interval` field can be set to a shorter interval to expedite the update, however shorter intervals increase computational load. To counteract this, you can restore `registryPoll.interval` to the default value once the update is complete. 
+ + +.. This update generates one policy, `du-upgrade-operator-catsrc-policy`, to update the `redhat-operators` catalog source with the new index images that contain the desired Operators images. ++ +[NOTE] +==== +If you want to use the image pre-caching for Operators and there are Operators from a different catalog source other than `redhat-operators`, you must perform the following tasks: + +* Prepare a separate catalog source policy with the new index image or registry poll interval update for the different catalog source. +* Prepare a separate subscription policy for the desired Operators that are from the different catalog source. +==== ++ +For example, the desired SRIOV-FEC Operator is available in the `certified-operators` catalog source. To update the catalog source and the Operator subscription, add the following contents to generate two policies, `du-upgrade-fec-catsrc-policy` and `du-upgrade-subscriptions-fec-policy`: ++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1 +kind: PolicyGenTemplate +metadata: + name: "du-upgrade" + namespace: "ztp-group-du-sno" +spec: + bindingRules: + group-du-sno: "" + mcp: "master" + remediationAction: inform + sourceFiles: + … + - fileName: DefaultCatsrc.yaml + remediationAction: inform + policyName: "fec-catsrc-policy" + metadata: + name: certified-operators + spec: + displayName: Intel SRIOV-FEC Operator + image: registry.example.com:5000/olm/far-edge-sriov-fec:v4.10 + updateStrategy: + registryPoll: + interval: 10m + - fileName: AcceleratorsSubscription.yaml + policyName: "subscriptions-fec-policy" + spec: + channel: "stable" + source: certified-operators +---- + +.. Remove the specified subscriptions channels in the common `PolicyGenTemplate` CR, if they exist. The default subscriptions channels from the ZTP image are used for the update. ++ +[NOTE] +==== +The default channel for the Operators applied through ZTP 4.10 is `stable`, except for the `performance-addon-operator`. The default channel for PAO is `4.10`. You can also specify the default channels in the common `PolicyGenTemplate` CR. +==== + +.. Push the `PolicyGenTemplate` CRs updates to the ZTP Git repository. ++ +ArgoCD pulls the changes from the Git repository and generates the policies on the hub cluster. + +.. Check the created policies by running the following command: ++ +[source,terminal] +---- +$ oc get policies -A | grep -E "catsrc-policy|subscription" +---- + +. Apply the required catalog source updates before starting the Operator update. + +.. Save the content of the `ClusterGroupUpgrade` CR named `operator-upgrade-prep` with the catalog source policies and the target spoke clusters to the `cgu-operator-upgrade-prep.yml` file: ++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: cgu-operator-upgrade-prep + namespace: default +spec: + clusters: + - spoke1 + enable: true + managedPolicies: + - du-upgrade-operator-catsrc-policy + remediationStrategy: + maxConcurrency: 1 +---- + +.. Apply the policy to the hub cluster by running the following command: ++ +[source,terminal] +---- +$ oc apply -f cgu-operator-upgrade-prep.yml +---- + +.. Monitor the update process. Upon completion, ensure that the policy is compliant by running the following command: ++ +[source,terminal] +---- +$ oc get policies -A | grep -E "catsrc-policy" +---- + +. Create the `ClusterGroupUpgrade` CR for the Operator update with the `spec.enable` field set to `false`. +.. 
Save the content of the Operator update `ClusterGroupUpgrade` CR with the `du-upgrade-operator-catsrc-policy` policy and the subscription policies created from the common `PolicyGenTemplate` and the target clusters to the `cgu-operator-upgrade.yml` file, as shown in the following example: ++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: cgu-operator-upgrade + namespace: default +spec: + managedPolicies: + - du-upgrade-operator-catsrc-policy <1> + - common-subscriptions-policy <2> + preCaching: false + clusters: + - spoke1 + remediationStrategy: + maxConcurrency: 1 + enable: false +---- +<1> The policy is needed by the image pre-caching feature to retrieve the operator images from the catalog source. +<2> The policy contains Operator subscriptions. If you have upgraded ZTP from 4.9 to 4.10 by following "Upgrade ZTP from 4.9 to 4.10", all Operator subscriptions are grouped into the `common-subscriptions-policy` policy. ++ +[NOTE] +==== +One `ClusterGroupUpgrade` CR can only pre-cache the images of the desired Operators defined in the subscription policy from one catalog source included in the `ClusterGroupUpgrade` CR. If the desired Operators are from different catalog sources, such as in the example of the SRIOV-FEC Operator, another `ClusterGroupUpgrade` CR must be created with `du-upgrade-fec-catsrc-policy` and `du-upgrade-subscriptions-fec-policy` policies for the SRIOV-FEC Operator images pre-caching and update. +==== + +.. Apply the `ClusterGroupUpgrade` CR to the hub cluster by running the following command: ++ +[source,terminal] +---- +$ oc apply -f cgu-operator-upgrade.yml +---- + +. Optional: Pre-cache the images for the Operator update. + +.. Before starting image pre-caching, verify the subscription policy is `NonCompliant` at this point by running the following command: ++ +[source,terminal] +---- +$ oc get policy common-subscriptions-policy -n +---- ++ +.Example output ++ +[source,terminal] +---- +NAME REMEDIATION ACTION COMPLIANCE STATE AGE +common-subscriptions-policy inform NonCompliant 27d +---- + +.. Enable pre-caching in the `ClusterGroupUpgrade` CR by running the following command: ++ +[source,terminal] +---- +$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-operator-upgrade \ +--patch '{"spec":{"preCaching": true}}' --type=merge +---- + +.. Monitor the process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the spoke cluster: ++ +[source,terminal] +---- +$ oc get cgu cgu-operator-upgrade -o jsonpath='{.status.precaching.status}' +---- + +.. Check if the pre-caching is completed before starting the update by running the following command: ++ +[source,terminal] +---- +$ oc get cgu -n default cgu-operator-upgrade -ojsonpath='{.status.conditions}' | jq +---- ++ +.Example output ++ +[source,json] +---- +[ + { + "lastTransitionTime": "2022-03-08T20:49:08.000Z", + "message": "The ClusterGroupUpgrade CR is not enabled", + "reason": "UpgradeNotStarted", + "status": "False", + "type": "Ready" + }, + { + "lastTransitionTime": "2022-03-08T20:55:30.000Z", + "message": "Precaching is completed", + "reason": "PrecachingCompleted", + "status": "True", + "type": "PrecachingDone" + } +] +---- + +. Start the Operator update. + +.. 
Enable the `cgu-operator-upgrade` `ClusterGroupUpgrade` CR and disable pre-caching to start the Operator update by running the following command: ++ +[source,terminal] +---- +$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-operator-upgrade \ +--patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge +---- + +.. Monitor the process. Upon completion, ensure that the policy is compliant by running the following command: ++ +[source,terminal] +---- +$ oc get policies --all-namespaces +---- \ No newline at end of file diff --git a/modules/cnf-topology-aware-lifecycle-manager-platform-update.adoc b/modules/cnf-topology-aware-lifecycle-manager-platform-update.adoc new file mode 100644 index 0000000000..0500f161d3 --- /dev/null +++ b/modules/cnf-topology-aware-lifecycle-manager-platform-update.adoc @@ -0,0 +1,196 @@ +// Module included in the following assemblies: +// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285 +// * scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: PROCEDURE +[id="talo-platform-update_{context}"] += Performing a platform update + +You can perform a platform update with the {cgu-operator}. + +.Prerequisites + +* Install the {cgu-operator-first}. +* Update ZTP to the latest version. +* Provision one or more managed clusters with ZTP. +* Mirror the desired image repository. +* Log in as a user with `cluster-admin` privileges. +* Create {rh-rhacm} policies in the hub cluster. + +.Procedure + +. Create a `PolicyGenTemplate` CR for the platform update: +.. Save the following contents of the `PolicyGenTemplate` CR in the `du-upgrade.yaml` file. ++ +.Example of `PolicyGenTemplate` for platform update ++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1 +kind: PolicyGenTemplate +metadata: + name: "du-upgrade" + namespace: "ztp-group-du-sno" +spec: + bindingRules: + group-du-sno: "" + mcp: "master" + remediationAction: inform + sourceFiles: + - fileName: ImageSignature.yaml <1> + policyName: "platform-upgrade-prep" + binaryData: + ${DIGEST_ALGO}-${DIGEST_ENCODED}: ${SIGNATURE_BASE64} <2> + - fileName: DisconnectedICSP.yaml + policyName: "platform-upgrade-prep" + metadata: + name: disconnected-internal-icsp-for-ocp + spec: + repositoryDigestMirrors: <3> + - mirrors: + - quay-intern.example.com/ocp4/openshift-release-dev + source: quay.io/openshift-release-dev/ocp-release + - mirrors: + - quay-intern.example.com/ocp4/openshift-release-dev + source: quay.io/openshift-release-dev/ocp-v4.0-art-dev + - fileName: ClusterVersion.yaml <4> + policyName: "platform-upgrade-prep" + metadata: + name: version + annotations: + ran.openshift.io/ztp-deploy-wave: "1" + spec: + channel: "stable-4.10" + upstream: http://upgrade.example.com/images/upgrade-graph_stable-4.10 + - fileName: ClusterVersion.yaml <5> + policyName: "platform-upgrade" + metadata: + name: version + spec: + channel: "stable-4.10" + upstream: http://upgrade.example.com/images/upgrade-graph_stable-4.10 + desiredUpdate: + version: 4.10.4 + status: + history: + - version: 4.10.4 + state: "Completed" +---- +<1> The `ConfigMap` CR contains the signature of the desired release image to update to. +<2> Shows the image signature of the desired {product-title} release. Get the signature from the `checksum-${OCP_RELASE_NUMBER}.yaml` file you saved when following the procedures in the "Setting up the environment" section. +<3> Shows the mirror repository that contains the desired {product-title} image. 
Get the mirrors from the `imageContentSources.yaml` file that you saved when following the procedures in the "Setting up the environment" section. +<4> Shows the `ClusterVersion` CR to update upstream. +<5> Shows the `ClusterVersion` CR to trigger the update. The `channel`, `upstream`, and `desiredVersion` fields are all required for image pre-caching. ++ +The `PolicyGenTemplate` CR generates two policies: + +* The `du-upgrade-platform-upgrade-prep` policy does the preparation work for the platform update. It creates the `ConfigMap` CR for the desired release image signature, creates the image content source of the mirrored release image repository, and updates the cluster version with the desired update channel and the update graph reachable by the spoke cluster in the disconnected environment. + +* The `du-upgrade-platform-upgrade` policy is used to perform platform upgrade. + +.. Add the `du-upgrade.yaml` file contents to the `kustomization.yaml` file located in the ZTP Git repository for the `PolicyGenTemplate` CRs and push the changes to the Git repository. ++ +ArgoCD pulls the changes from the Git repository and generates the policies on the hub cluster. + +.. Check the created policies by running the following command: ++ +[source,terminal] +---- +$ oc get policies -A | grep platform-upgrade +---- + +. Apply the required update resources before starting the platform update with the {cgu-operator}. + +.. Save the content of the `platform-upgrade-prep` `ClusterUpgradeGroup` CR with the `du-upgrade-platform-upgrade-prep` policy and the target spoke clusters to the `cgu-platform-upgrade-prep.yml` file, as shown in the following example: ++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: cgu-platform-upgrade-prep + namespace: default +spec: + managedPolicies: + - du-upgrade-platform-upgrade-prep + clusters: + - spoke1 + remediationStrategy: + maxConcurrency: 1 + enable: true +---- + +.. Apply the policy to the hub cluster by running the following command: ++ +[source,terminal] +---- +$ oc apply -f cgu-platform-upgrade-prep.yml +---- + +.. Monitor the update process. Upon completion, ensure that the policy is compliant by running the following command: ++ +[source,terminal] +---- +$ oc get policies --all-namespaces +---- + +. Create the `ClusterGroupUpdate` CR for the platform update with the `spec.enable` field set to `false`. + +.. Save the content of the platform update `ClusterGroupUpdate` CR with the `du-upgrade-platform-upgrade` policy and the target clusters to the `cgu-platform-upgrade.yml` file, as shown in the following example: ++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: cgu-platform-upgrade + namespace: default +spec: + managedPolicies: + - du-upgrade-platform-upgrade + preCaching: false + clusters: + - spoke1 + remediationStrategy: + maxConcurrency: 1 + enable: false +---- + +.. Apply the `ClusterGroupUpdate` CR to the hub cluster by running the following command: ++ +[source,terminal] +---- +$ oc apply -f cgu-platform-upgrade.yml +---- + +. Optional: Pre-cache the images for the platform update. +.. Enable pre-caching in the `ClusterGroupUpdate` CR by running the following command: ++ +[source,terminal] +---- +$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-platform-upgrade \ +--patch '{"spec":{"preCaching": true}}' --type=merge +---- + +.. Monitor the update process and wait for the pre-caching to complete. 
Check the status of pre-caching by running the following command on the hub cluster: ++ +[source,terminal] +---- +$ oc get cgu cgu-platform-upgrade -o jsonpath='{.status.precaching.status}' +---- + +. Start the platform update: +.. Enable the `cgu-platform-upgrade` policy and disable pre-caching by running the following command: ++ +[source,terminal] +---- +$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-platform-upgrade \ +--patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge +---- + +.. Monitor the process. Upon completion, ensure that the policy is compliant by running the following command: ++ +[source,terminal] +---- +$ oc get policies --all-namespaces +---- \ No newline at end of file diff --git a/modules/cnf-topology-aware-lifecycle-manager-policies-concept.adoc b/modules/cnf-topology-aware-lifecycle-manager-policies-concept.adoc new file mode 100644 index 0000000000..fa940f519b --- /dev/null +++ b/modules/cnf-topology-aware-lifecycle-manager-policies-concept.adoc @@ -0,0 +1,19 @@ +// Module included in the following assemblies: +// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285 +// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc + +:_content-type: CONCEPT +[id="talo-policies-concept_{context}"] += Update policies on managed clusters + +The {cgu-operator-first} remediates a set of `inform` policies for the clusters specified in the `ClusterGroupUpgrade` CR. {cgu-operator} remediates `inform` policies by making `enforce` copies of the managed {rh-rhacm} policies. Each copied policy has its own corresponding {rh-rhacm} placement rule and {rh-rhacm} placement binding. + +One by one, {cgu-operator} adds each cluster from the current batch to the placement rule that corresponds with the applicable managed policy. If a cluster is already compliant with a policy, {cgu-operator} skips applying that policy on the compliant cluster. {cgu-operator} then moves on to applying the next policy to the non-compliant cluster. After {cgu-operator} completes the updates in a batch, all clusters are removed from the placement rules associated with the copied policies. Then, the update of the next batch starts. + +If a spoke cluster does not report any compliant state to {rh-rhacm}, the managed policies on the hub cluster can be missing status information that {cgu-operator} needs. {cgu-operator} handles these cases in the following ways: + +* If a policy's `status.compliant` field is missing, {cgu-operator} ignores the policy and adds a log entry. Then, {cgu-operator} continues looking at the policy's `status.status` field. +* If a policy's `status.status` is missing, {cgu-operator} produces an error. +* If a cluster's compliance status is missing in the policy's `status.status` field, {cgu-operator} considers that cluster to be non-compliant with that policy. + +For more information about {rh-rhacm} policies, see link:https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html-single/governance/index#policy-overview[Policy overview]. 
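+
+For example, to see the compliance data that {cgu-operator} evaluates, you can inspect these fields directly on the hub cluster. The policy name and namespace below are placeholders; use a policy that is listed in your `ClusterGroupUpgrade` CR:
+
+[source,terminal]
+----
+$ oc get policy <policy_name> -n <policy_namespace> -o jsonpath='{.status.compliant}{"\n"}{.status.status}{"\n"}'
+----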
diff --git a/modules/cnf-topology-aware-lifecycle-manager-precache-concept.adoc b/modules/cnf-topology-aware-lifecycle-manager-precache-concept.adoc new file mode 100644 index 0000000000..70dd4b5cfa --- /dev/null +++ b/modules/cnf-topology-aware-lifecycle-manager-precache-concept.adoc @@ -0,0 +1,28 @@ +// Module included in the following assemblies: +// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285 +// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc + +:_content-type: CONCEPT +[id="talo-precache-feature-concept_{context}"] += Using the container image pre-cache feature + +Clusters might have limited bandwidth to access the container image registry, which can cause a timeout before the updates are completed. + +[NOTE] +==== +The time of the update is not set by {cgu-operator}. You can apply the `ClusterGroupUpgrade` CR at the beginning of the update by manual application or by external automation. +==== + +The container image pre-caching starts when the `preCaching` field is set to `true` in the `ClusterGroupUpgrade` CR. After a successful pre-caching process, you can start remediating policies. The remediation actions start when the `enable` field is set to `true`. + +The pre-caching process can be in the following statuses: + +`PrecacheNotStarted`:: This is the initial state all clusters are automatically assigned to on the first reconciliation pass of the `ClusterGroupUpgrade` CR. ++ +In this state, {cgu-operator} deletes any pre-caching namespace and hub view resources of spoke clusters that remain from previous incomplete updates. {cgu-operator} then creates a new `ManagedClusterView` resource for the spoke pre-caching namespace to verify its deletion in the `PrecachePreparing` state. +`PrecachePreparing`:: Cleaning up any remaining resources from previous incomplete updates is in progress. +`PrecacheStarting`:: Pre-caching job prerequisites and the job are created. +`PrecacheActive`:: The job is in "Active" state. +`PrecacheSucceeded`:: The pre-cache job has succeeded. +`PrecacheTimeout`:: The artifact pre-caching has been partially done. +`PrecacheUnrecoverableError`:: The job ends with a non-zero exit code. diff --git a/modules/cnf-topology-aware-lifecycle-manager-precache-feature.adoc b/modules/cnf-topology-aware-lifecycle-manager-precache-feature.adoc new file mode 100644 index 0000000000..7a86ecace8 --- /dev/null +++ b/modules/cnf-topology-aware-lifecycle-manager-precache-feature.adoc @@ -0,0 +1,161 @@ +// Module included in the following assemblies: +// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285 +// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc + +:_content-type: PROCEDURE +[id="talo-precache-start_and_update_{context}"] += Creating a ClusterGroupUpgrade CR with pre-caching + +The pre-cache feature allows the required container images to be present on the spoke cluster before the update starts. + +.Prerequisites + +* Install the {cgu-operator-first}. +* Provision one or more managed clusters. +* Log in as a user with `cluster-admin` privileges. + +.Procedure + +. 
Save the contents of the `ClusterGroupUpgrade` CR with the `preCaching` field set to `true` in the `clustergroupupgrades-group-du.yaml` file: ++ +[source,yaml] +---- +apiVersion: ran.openshift.io/v1alpha1 +kind: ClusterGroupUpgrade +metadata: + name: du-upgrade-4918 + namespace: ztp-group-du-sno +spec: + preCaching: true <1> + clusters: + - cnfdb1 + - cnfdb2 + enable: false + managedPolicies: + - du-upgrade-platform-upgrade + remediationStrategy: + maxConcurrency: 2 + timeout: 240 +---- +<1> The `preCaching` field is set to `true`, which enables {cgu-operator} to pull the container images before starting the update. + +. When you want to start the update, apply the `ClusterGroupUpgrade` CR by running the following command: ++ +[source,terminal] +---- +$ oc apply -f clustergroupupgrades-group-du.yaml +---- + +.Verification + +. Check if the `ClusterGroupUpgrade` CR exists in the hub cluster by running the following command: ++ +[source,terminal] +---- +$ oc get cgu -A +---- ++ +.Example output ++ +[source,terminal] +---- +NAMESPACE NAME AGE +ztp-group-du-sno du-upgrade-4918 10s <1> +---- +<1> The CR is created. + +. Check the status of the pre-caching task by running the following command: ++ +[source,terminal] +---- +$ oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}' +---- ++ +.Example output ++ +[source,json] +---- +{ + "conditions": [ + { + "lastTransitionTime": "2022-01-27T19:07:24Z", + "message": "Precaching is not completed (required)", <1> + "reason": "PrecachingRequired", + "status": "False", + "type": "Ready" + }, + { + "lastTransitionTime": "2022-01-27T19:07:24Z", + "message": "Precaching is required and not done", + "reason": "PrecachingNotDone", + "status": "False", + "type": "PrecachingDone" + }, + { + "lastTransitionTime": "2022-01-27T19:07:34Z", + "message": "Pre-caching spec is valid and consistent", + "reason": "PrecacheSpecIsWellFormed", + "status": "True", + "type": "PrecacheSpecValid" + } + ], + "precaching": { + "clusters": [ + "cnfdb1" <2> + ], + "spec": { + "platformImage": "image.example.io"}, + "status": { + "cnfdb1": "Active"} + } +} +---- +<1> Displays that the update is in progress. +<2> Displays the list of identified clusters. + +. Check the status of the pre-caching job by running the following command on the spoke cluster: ++ +[source,terminal] +---- +$ oc get jobs,pods -n openshift-talm-pre-cache +---- ++ +.Example output ++ +[source,terminal] +---- +NAME COMPLETIONS DURATION AGE +job.batch/pre-cache 0/1 3m10s 3m10s + +NAME READY STATUS RESTARTS AGE +pod/pre-cache--1-9bmlr 1/1 Running 0 3m10s +---- + + . Check the status of the `ClusterGroupUpgrade` CR by running the following command: ++ +[source,terminal] +---- +$ oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}' +---- ++ +.Example output ++ +[source,json] +---- +"conditions": [ + { + "lastTransitionTime": "2022-01-27T19:30:41Z", + "message": "The ClusterGroupUpgrade CR has all clusters compliant with all the managed policies", + "reason": "UpgradeCompleted", + "status": "True", + "type": "Ready" + }, + { + "lastTransitionTime": "2022-01-27T19:28:57Z", + "message": "Precaching is completed", + "reason": "PrecachingCompleted", + "status": "True", + "type": "PrecachingDone" <1> + } +---- +<1> The pre-cache tasks are done. 
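+
+When pre-caching is done and you are ready to start remediating the policies, you can enable the CR. For example, the following patch follows the same pattern that is used in the update procedures in this document, with the namespace and CR name taken from the preceding example:
+
+[source,terminal]
+----
+$ oc --namespace=ztp-group-du-sno patch clustergroupupgrade.ran.openshift.io/du-upgrade-4918 \
+--patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
+----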
diff --git a/modules/cnf-topology-aware-lifecycle-manager-preparing-for-updates.adoc b/modules/cnf-topology-aware-lifecycle-manager-preparing-for-updates.adoc new file mode 100644 index 0000000000..e592414996 --- /dev/null +++ b/modules/cnf-topology-aware-lifecycle-manager-preparing-for-updates.adoc @@ -0,0 +1,107 @@ +// Module included in the following assemblies: +// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285 +// * scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: PROCEDURE +[id="talo-platform-prepare-end-to-end_{context}"] += End-to-end procedures for updating clusters in a disconnected environment + +If you have deployed spoke clusters with distributed unit (DU) profiles using the GitOps ZTP with the {cgu-operator-first} pipeline described in "Deploying distributed units at scale in a disconnected environment", this procedure describes how to upgrade your spoke clusters and Operators. + +[id="talo-platform-prepare-for-update_{context}"] +== Preparing for the updates + +If both the hub and the spoke clusters are running {product-title} 4.9, you must update ZTP from version 4.9 to 4.10. If {product-title} 4.10 is used, you can set up the environment. + +[id="talo-platform-prepare-for-update-env-setup_{context}"] +== Setting up the environment + +{cgu-operator} can perform both platform and Operator updates. + +You must mirror both the platform image and Operator images that you want to update to in your mirror registry before you can use {cgu-operator} to update your disconnected clusters. Complete the following steps to mirror the images: + +* For platform updates, you must perform the following steps: ++ +. Mirror the desired {product-title} image repository. Ensure that the desired platform image is mirrored by following the "Mirroring the {product-title} image repository" procedure linked in the Additional Resources. Save the contents of the `imageContentSources` section in the `imageContentSources.yaml` file: ++ +.Example output +[source,yaml] +---- +imageContentSources: + - mirrors: + - mirror-ocp-registry.ibmcloud.io.cpak:5000/openshift-release-dev/openshift4 + source: quay.io/openshift-release-dev/ocp-release + - mirrors: + - mirror-ocp-registry.ibmcloud.io.cpak:5000/openshift-release-dev/openshift4 + source: quay.io/openshift-release-dev/ocp-v4.0-art-dev +---- + +. Save the image signature of the desired platform image that was mirrored. You must add the image signature to the `PolicyGenTemplate` CR for platform updates. To get the image signature, perform the following steps: + +.. Specify the desired {product-title} tag by running the following command: ++ +[source,terminal] +---- +$ OCP_RELEASE_NUMBER= +---- + +.. Specify the architecture of the server by running the following command: ++ +[source,terminal] +---- +$ ARCHITECTURE= +---- + +.. Get the release image digest from Quay by running the following command ++ +[source,terminal] +---- +$ DIGEST="$(oc adm release info quay.io/openshift-release-dev/ocp-release:${OCP_RELEASE_NUMBER}-${ARCHITECTURE} | sed -n 's/Pull From: .*@//p')" +---- + +.. Set the digest algorithm by running the following command: ++ +[source,terminal] +---- +$ DIGEST_ALGO="${DIGEST%%:*}" +---- + +.. Set the digest signature by running the following command: ++ +[source,terminal] +---- +$ DIGEST_ENCODED="${DIGEST#*:}" +---- + +.. 
Get the image signature from the link:https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/[mirror.openshift.com] website by running the following command: ++ +[source,terminal] +---- +$ SIGNATURE_BASE64=$(curl -s "https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/${DIGEST_ALGO}=${DIGEST_ENCODED}/signature-1" | base64 -w0 && echo) +---- + +.. Save the image signature to the `checksum-.yaml` file by running the following commands: ++ +[source,terminal] +---- +$ cat >checksum-${OCP_RELEASE_NUMBER}.yaml <> +** <> +** <> +** <> + +To ensure that the `ClusterGroupUpgrade` configuration is functional, you can do the following: + +. Create the `ClusterGroupUpgrade` CR with the `spec.enable` field set to `false`. + +. Wait for the status to be updated and go through the troubleshooting questions. + +. If everything looks as expected, set the `spec.enable` field to `true` in the `ClusterGroupUpgrade` CR. + +[WARNING] +==== +After you set the `spec.enable` field to `true` in the `ClusterUpgradeGroup` CR , the update procedure starts and you cannot edit the CR's `spec` fields anymore. +==== + +[id="talo-troubleshooting-modify-cgu_{context}"] +== Cannot modify the ClusterUpgradeGroup CR + +Issue:: You cannot edit the `ClusterUpgradeGroup` CR after enabling the update. + +Resolution:: Restart the procedure by performing the following steps: ++ +. Remove the old `ClusterGroupUpgrade` CR by running the following command: ++ +[source,terminal] +---- +$ oc delete cgu -n +---- ++ +. Check and fix the existing issues with the managed clusters and policies. +.. Ensure that all the clusters are managed clusters and available. +.. Ensure that all the policies exist and have the `spec.remediationAction` field set to `inform`. ++ +. Create a new `ClusterGroupUpgrade` CR with the correct configurations. ++ +[source,terminal] +---- +$ oc apply -f +---- + +[id="talo-troubleshooting-managed-policies_{context}"] +== Managed policies + +[discrete] +== Checking managed policies on the system + +Issue:: You want to check if you have the correct managed policies on the system. + +Resolution:: Run the following command: ++ +[source,terminal] +---- +$ oc get cgu lab-upgrade -ojsonpath='{.spec.managedPolicies}' +---- ++ +.Example output ++ +[source,json] +---- +["group-du-sno-validator-du-validator-policy", "policy2-common-pao-sub-policy", "policy3-common-ptp-sub-policy"] +---- + +[discrete] +== Checking remediationAction mode + +Issue:: You want to check if the `remediationAction` field is set to `inform` in the `spec` of the managed policies. + +Resolution:: Run the following command: ++ +[source,terminal] +---- +$ oc get policies --all-namespaces +---- ++ +.Example output ++ +[source,terminal] +---- +NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE +default policy1-common-cluster-version-policy inform NonCompliant 5d21h +default policy2-common-pao-sub-policy inform Compliant 5d21h +default policy3-common-ptp-sub-policy inform NonCompliant 5d21h +default policy4-common-sriov-sub-policy inform NonCompliant 5d21h +---- + +[discrete] +== Checking policy compliance state + +Issue:: You want to check the compliance state of policies. 
+ +Resolution:: Run the following command: ++ +[source,terminal] +---- +$ oc get policies --all-namespaces +---- ++ +.Example output ++ +[source,terminal] +---- +NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE +default policy1-common-cluster-version-policy inform NonCompliant 5d21h +default policy2-common-pao-sub-policy inform Compliant 5d21h +default policy3-common-ptp-sub-policy inform NonCompliant 5d21h +default policy4-common-sriov-sub-policy inform NonCompliant 5d21h +---- + +[id="talo-troubleshooting-clusters_{context}"] +== Clusters + +[discrete] +=== Checking if managed clusters are present + +Issue:: You want to check if the clusters in the `ClusterGroupUpgrade` CR are managed clusters. + +Resolution:: Run the following command: ++ +[source,terminal] +---- +$ oc get managedclusters +---- ++ +.Example output ++ +[source,terminal] +---- +NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE +local-cluster true https://api.hub.example.com:6443 True Unknown 13d +spoke1 true https://api.spoke1.example.com:6443 True True 13d +spoke3 true https://api.spoke3.example.com:6443 True True 27h +---- + +. Alternatively, check the {cgu-operator} manager logs: + +.. Get the name of the {cgu-operator} manager by running the following command: ++ +[source,terminal] +---- +$ oc get pod -n openshift-operators +---- ++ +.Example output ++ +[source,terminal] +---- +NAME READY STATUS RESTARTS AGE +cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp 2/2 Running 0 45m +---- + +.. Check the {cgu-operator} manager logs by running the following command: ++ +[source,terminal] +---- +$ oc logs -n openshift-operators \ +cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp -c manager +---- ++ +.Example output ++ +[source,terminal] +---- +ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "lab-upgrade", "namespace": "default", "error": "Cluster spoke5555 is not a ManagedCluster"} <1> +sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem +---- +<1> The error message shows that the cluster is not a managed cluster. + +[discrete] +=== Checking if managed clusters are available + +Issue:: You want to check if the managed clusters specified in the `ClusterGroupUpgrade` CR are available. + +Resolution:: Run the following command: ++ +[source,terminal] +---- +$ oc get managedclusters +---- ++ +.Example output ++ +[source,terminal] +---- +NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE +local-cluster true https://api.hub.testlab.com:6443 True Unknown 13d +spoke1 true https://api.spoke1.testlab.com:6443 True True 13d <1> +spoke3 true https://api.spoke3.testlab.com:6443 True True 27h <1> +---- +<1> The value of the `AVAILABLE` field is `True` for the managed clusters. + +[discrete] +=== Checking clusterSelector + +Issue:: You want to check if the `clusterSelector` field is specified in the `ClusterGroupUpgrade` CR in at least one of the managed clusters. + +Resolution:: Run the following command: ++ +[source,terminal] +---- +$ oc get managedcluster --selector=upgrade=true <1> +---- +<1> The label for the clusters you want to update is `upgrade:true`. 
++ +.Example output ++ +[source,terminal] +---- +NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE +spoke1 true https://api.spoke1.testlab.com:6443 True True 13d +spoke3 true https://api.spoke3.testlab.com:6443 True True 27h +---- + +[discrete] +=== Checking if canary clusters are present + +Issue:: You want to check if the canary clusters are present in the list of clusters. ++ +.Example `ClusterGroupUpgrade` CR +[source,yaml] +---- +spec: + clusters: + - spoke1 + - spoke3 + clusterSelector: + - upgrade2=true + remediationStrategy: + canaries: + - spoke3 + maxConcurrency: 2 + timeout: 240 +---- + +Resolution:: Run the following commands: ++ +[source,terminal] +---- +$ oc get cgu lab-upgrade -ojsonpath='{.spec.clusters}' +---- ++ +.Example output ++ +[source,json] +---- +["spoke1", "spoke3"] +---- + +. Check if the canary clusters are present in the list of clusters that match `clusterSelector` labels by running the following command: ++ +[source,terminal] +---- +$ oc get managedcluster --selector=upgrade=true +---- ++ +.Example output ++ +[source,terminal] +---- +NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE +spoke1 true https://api.spoke1.testlab.com:6443 True True 13d +spoke3 true https://api.spoke3.testlab.com:6443 True True 27h +---- + +[NOTE] +==== +A cluster can be present in `spec.clusters` and also be matched by the `spec.clusterSelecter` label. +==== + +[discrete] +=== Checking the pre-caching status on spoke clusters + +. Check the status of pre-caching by running the following command on the spoke cluster: ++ +[source,terminal] +---- +$ oc get jobs,pods -n openshift-talo-pre-cache +---- + +[id="talo-troubleshooting-remediation-strategy_{context}"] +== Remediation Strategy + +[discrete] +=== Checking if remediationStrategy is present in the ClusterGroupUpgrade CR + +Issue:: You want to check if the `remediationStrategy` is present in the `ClusterGroupUpgrade` CR. + +Resolution:: Run the following command: ++ +[source,terminal] +---- +$ oc get cgu lab-upgrade -ojsonpath='{.spec.remediationStrategy}' +---- ++ +.Example output ++ +[source,json] +---- +{"maxConcurrency":2, "timeout":240} +---- + +[discrete] +=== Checking if maxConcurrency is specified in the ClusterGroupUpgrade CR + +Issue:: You want to check if the `maxConcurrency` is specified in the `ClusterGroupUpgrade` CR. + +Resolution:: Run the following command: ++ +[source,terminal] +---- +$ oc get cgu lab-upgrade -ojsonpath='{.spec.remediationStrategy.maxConcurrency}' +---- ++ +.Example output ++ +[source,terminal] +---- +2 +---- + +[id="talo-troubleshooting-remediation-talo_{context}"] +== {cgu-operator-full} + +[discrete] +=== Checking condition message and status in the ClusterGroupUpgrade CR + +Issue:: You want to check the value of the `status.conditions` field in the `ClusterGroupUpgrade` CR. + +Resolution:: Run the following command: ++ +[source,terminal] +---- +$ oc get cgu lab-upgrade -ojsonpath='{.status.conditions}' +---- ++ +.Example output ++ +[source,json] +---- +{"lastTransitionTime":"2022-02-17T22:25:28Z", "message":"The ClusterGroupUpgrade CR has managed policies that are missing:[policyThatDoesntExist]", "reason":"UpgradeCannotStart", "status":"False", "type":"Ready"} +---- + +[discrete] +=== Checking corresponding copied policies + +Issue:: You want to check if every policy from `status.managedPoliciesForUpgrade` has a corresponding policy in `status.copiedPolicies`. 
+ +Resolution:: Run the following command: ++ +[source,terminal] +---- +$ oc get cgu lab-upgrade -oyaml +---- ++ +.Example output ++ +[source,yaml] +---- +status: + … + copiedPolicies: + - lab-upgrade-policy3-common-ptp-sub-policy + managedPoliciesForUpgrade: + - name: policy3-common-ptp-sub-policy + namespace: default +---- + +[discrete] +=== Checking if status.remediationPlan was computed + +Issue:: You want to check if `status.remediationPlan` is computed. + +Resolution:: Run the following command: ++ +[source,terminal] +---- +$ oc get cgu lab-upgrade -ojsonpath='{.status.remediationPlan}' +---- ++ +.Example output ++ +[source,json] +---- +[["spoke2", "spoke3"]] +---- + +[discrete] +=== Errors in the {cgu-operator} manager container + +Issue:: You want to check the logs of the manager container of {cgu-operator}. + +Resolution:: Run the following command: ++ +[source,terminal] +---- +$ oc logs -n openshift-operators \ +cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp -c manager +---- ++ +.Example output ++ +[source,terminal] +---- +ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "lab-upgrade", "namespace": "default", "error": "Cluster spoke5555 is not a ManagedCluster"} <1> +sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem +---- +<1> Displays the error. diff --git a/modules/cnf-topology-aware-lifecycle-manager.adoc b/modules/cnf-topology-aware-lifecycle-manager.adoc new file mode 100644 index 0000000000..73009a7608 --- /dev/null +++ b/modules/cnf-topology-aware-lifecycle-manager.adoc @@ -0,0 +1,15 @@ +// Module included in the following assemblies: +// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285 +// scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: CONCEPT +[id="cnf-topology-aware-lifecycle-manager"] += Updating managed policies with the {cgu-operator-full} +include::../_attributes/common-attributes.adoc[] +//:context: cnf-topology-aware-lifecycle-manager + +You can use the {cgu-operator-first} to manage the software lifecycle of multiple OpenShift clusters. {cgu-operator} uses {rh-rhacm-first} policies to perform changes on the target clusters. + +:FeatureName: The Cluster Group Upgrades Operator + +include::snippets/technology-preview.adoc[] diff --git a/modules/hw-installing-amq-interconnect-messaging-bus.adoc b/modules/hw-installing-amq-interconnect-messaging-bus.adoc new file mode 100644 index 0000000000..4ace875f6c --- /dev/null +++ b/modules/hw-installing-amq-interconnect-messaging-bus.adoc @@ -0,0 +1,52 @@ +// Module included in the following assemblies: +// +// * monitoring/using-rfhe.adoc + +:_content-type: PROCEDURE +[id="hw-installing-amq-interconnect-messaging-bus_{context}"] += Installing the AMQ messaging bus + +To pass Redfish bare-metal event notifications between publisher and subscriber on a node, you must install and configure an AMQ messaging bus to run locally on the node. You do this by installing the AMQ Interconnect Operator for use in the cluster. + +.Prerequisites + +* Install the {product-title} CLI (`oc`). +* Log in as a user with `cluster-admin` privileges. + +.Procedure + +* Install the AMQ Interconnect Operator to its own `amq-interconnect` namespace. See link:https://access.redhat.com/documentation/en-us/red_hat_amq/2021.q1/html/deploying_amq_interconnect_on_openshift/adding-operator-router-ocp[Installing the AMQ Interconnect Operator]. 
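+
+Optionally, before checking the pods, you can confirm that the Operator subscription resolved successfully by listing the cluster service version (CSV) in the `amq-interconnect` namespace, following the same pattern used for the other Operators in this document. The CSV name varies by Operator version:
+
+[source,terminal]
+----
+$ oc get csv -n amq-interconnect
+----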
+ +.Verification + +. Verify that the AMQ Interconnect Operator is available and the required pods are running: ++ +[source,terminal] +---- +$ oc get pods -n amq-interconnect +---- ++ +.Example output +[source,terminal] +---- +NAME READY STATUS RESTARTS AGE +amq-interconnect-645db76c76-k8ghs 1/1 Running 0 23h +interconnect-operator-5cb5fc7cc-4v7qm 1/1 Running 0 23h +---- + +. Verify that the required `bare-metal-event-relay` bare-metal event producer pod is running in the `openshift-bare-metal-events` namespace: ++ +[source,terminal] +---- +$ oc get pods -n openshift-bare-metal-events +---- ++ +.Example output +[source,terminal] +---- +NAME READY STATUS RESTARTS AGE +hw-event-proxy-operator-controller-manager-74d5649b7c-dzgtl 2/2 Running 0 25s +---- + + + diff --git a/modules/installing-sno-requirements-for-installing-single-node-openshift.adoc b/modules/installing-sno-requirements-for-installing-single-node-openshift.adoc index 370dcb672e..fe39a1513f 100644 --- a/modules/installing-sno-requirements-for-installing-single-node-openshift.adoc +++ b/modules/installing-sno-requirements-for-installing-single-node-openshift.adoc @@ -36,6 +36,7 @@ The server must have a Baseboard Management Controller (BMC) when booting with v |Kubernetes API|`api..`| Add a DNS A/AAAA or CNAME record. This record must be resolvable by clients external to the cluster. |Internal API|`api-int..`| Add a DNS A/AAAA or CNAME record when creating the ISO manually. This record must be resolvable by nodes within the cluster. |Ingress route|`*.apps..`| Add a wildcard DNS A/AAAA or CNAME record that targets the node. This record must be resolvable by clients external to the cluster. +|Cluster node|`..`| Add a DNS A/AAAA or CNAME record and DNS PTR record to identify the node. |==== + Without persistent IP addresses, communications between the `apiserver` and `etcd` might fail. diff --git a/modules/nw-rfhe-creating-bmc-event-sub.adoc b/modules/nw-rfhe-creating-bmc-event-sub.adoc new file mode 100644 index 0000000000..94ef6f0ae4 --- /dev/null +++ b/modules/nw-rfhe-creating-bmc-event-sub.adoc @@ -0,0 +1,164 @@ +// Module included in the following assemblies: +// +// * monitoring/using-rfhe.adoc + +:_content-type: PROCEDURE +[id="nw-rfhe-creating-bmc-event-sub_{context}"] += Subscribing to bare-metal events + +You can configure the baseboard management controller (BMC) to send bare-metal events to subscribed applications running in an {product-title} cluster. Example Redfish bare-metal events include an increase in device temperature, or removal of a device. You subscribe applications to bare-metal events using a REST API. + +[IMPORTANT] +==== +You can only create a `BMCEventSubscription` custom resource (CR) for physical hardware that supports Redfish and has a vendor interface set to `redfish` or `idrac-redfish`. +==== + +[NOTE] +==== +Use the `BMCEventSubscription` CR to subscribe to predefined Redfish events. The Redfish standard does not provide an option to create specific alerts and thresholds. For example, to receive an alert event when an enclosure's temperature exceeds 40° Celsius, you must manually configure the event according to the vendor's recommendations. +==== + +Perform the following procedure to subscribe to bare-metal events for the node using a `BMCEventSubscription` CR. + +.Prerequisites +* Install the OpenShift CLI (`oc`). +* Log in as a user with `cluster-admin` privileges. +* Get the user name and password for the BMC. 
+* Deploy a bare-metal node with a Redfish-enabled Baseboard Management Controller (BMC) in your cluster, and enable Redfish events on the BMC. ++ +[NOTE] +==== +Enabling Redfish events on specific hardware is outside the scope of this information. For more information about enabling Redfish events for your specific hardware, consult the BMC manufacturer documentation. +==== + +.Procedure +. Confirm that the node hardware has the Redfish `EventService` enabled by running the following `curl` command: ++ +[source,terminal] +---- +curl https:///redfish/v1/EventService --insecure -H 'Content-Type: application/json' -u ":" +---- ++ +where: ++ +-- +bmc_ip_address:: is the IP address of the BMC where the Redfish events are generated. +-- ++ +.Example output +[source,terminal] +---- +{ + "@odata.context": "/redfish/v1/$metadata#EventService.EventService", + "@odata.id": "/redfish/v1/EventService", + "@odata.type": "#EventService.v1_0_2.EventService", + "Actions": { + "#EventService.SubmitTestEvent": { + "EventType@Redfish.AllowableValues": ["StatusChange", "ResourceUpdated", "ResourceAdded", "ResourceRemoved", "Alert"], + "target": "/redfish/v1/EventService/Actions/EventService.SubmitTestEvent" + } + }, + "DeliveryRetryAttempts": 3, + "DeliveryRetryIntervalSeconds": 30, + "Description": "Event Service represents the properties for the service", + "EventTypesForSubscription": ["StatusChange", "ResourceUpdated", "ResourceAdded", "ResourceRemoved", "Alert"], + "EventTypesForSubscription@odata.count": 5, + "Id": "EventService", + "Name": "Event Service", + "ServiceEnabled": true, + "Status": { + "Health": "OK", + "HealthRollup": "OK", + "State": "Enabled" + }, + "Subscriptions": { + "@odata.id": "/redfish/v1/EventService/Subscriptions" + } +} +---- + +. Get the {redfish-operator} service route for the cluster by running the following command: ++ +[source,terminal] +---- +$ oc get route -n openshift-bare-metal-events +---- ++ +.Example output +[source,terminal] +---- +NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD +hw-event-proxy hw-event-proxy-openshift-bare-metal-events.apps.compute-1.example.com hw-event-proxy-service 9087 edge None +---- + +. Create a `BMCEventSubscription` resource to subscribe to the Redfish events: + +.. Save the following YAML in the `bmc_sub.yaml` file: ++ +[source,yaml] +---- +apiVersion: metal3.io/v1alpha1 +kind: BMCEventSubscription +metadata: + name: sub-01 + namespace: openshift-machine-api +spec: + hostName: <1> + destination: <2> + context: '' +---- +<1> Specifies the name or UUID of the worker node where the Redfish events are generated. +<2> Specifies the bare-metal event proxy service, for example, `https://hw-event-proxy-openshift-bare-metal-events.apps.compute-1.example.com/webhook`. + +.. Create the `BMCEventSubscription` CR: ++ +[source,terminal] +---- +$ oc create -f bmc_sub.yaml +---- + +. Optional: To delete the BMC event subscription, run the following command: ++ +[source,terminal] +---- +$ oc delete -f bmc_sub.yaml +---- + +. Optional: To manually create a Redfish event subscription without creating a `BMCEventSubscription` CR, run the following `curl` command, specifying the BMC username and password. 
++ +[source,terminal] +---- +$ curl -i -k -X POST -H "Content-Type: application/json" -d '{"Destination": "https://", "Protocol" : "Redfish", "EventTypes": ["Alert"], "Context": "root"}' -u : 'https:///redfish/v1/EventService/Subscriptions' –v +---- ++ +where: ++ +-- +proxy_service_url:: is the bare-metal event proxy service, for example, `https://hw-event-proxy-openshift-bare-metal-events.apps.compute-1.example.com/webhook`. +-- ++ +-- +bmc_ip_address:: is the IP address of the BMC where the Redfish events are generated. +-- ++ +.Example output +[source,terminal] +---- +HTTP/1.1 201 Created +Server: AMI MegaRAC Redfish Service +Location: /redfish/v1/EventService/Subscriptions/1 +Allow: GET, POST +Access-Control-Allow-Origin: * +Access-Control-Expose-Headers: X-Auth-Token +Access-Control-Allow-Headers: X-Auth-Token +Access-Control-Allow-Credentials: true +Cache-Control: no-cache, must-revalidate +Link: ; rel=describedby +Link: +Link: ; path= +ETag: "1651135676" +Content-Type: application/json; charset=UTF-8 +OData-Version: 4.0 +Content-Length: 614 +Date: Thu, 28 Apr 2022 08:47:57 GMT +---- diff --git a/modules/nw-rfhe-creating-hardware-event.adoc b/modules/nw-rfhe-creating-hardware-event.adoc new file mode 100644 index 0000000000..25bce15fff --- /dev/null +++ b/modules/nw-rfhe-creating-hardware-event.adoc @@ -0,0 +1,78 @@ +// Module included in the following assemblies: +// +// * monitoring/using-rfhe.adoc + +:_content-type: PROCEDURE +[id="nw-rfhe-creating-hardware-event_{context}"] += Creating the bare-metal event and Secret CRs + +To start using bare-metal events, create the `HardwareEvent` custom resource (CR) for the host where the Redfish hardware is present. Hardware events and faults are reported in the `hw-event-proxy` logs. + +.Prerequisites + +* Install the OpenShift CLI (`oc`). +* Log in as a user with `cluster-admin` privileges. +* Install the {redfish-operator}. +* Create a `BMCEventSubscription` CR for the BMC Redfish hardware. + +[NOTE] +==== +Multiple `HardwareEvent` resources are not permitted. +==== + +.Procedure + +. Create the `HardwareEvent` custom resource (CR): + +.. Save the following YAML in the `hw-event.yaml` file: ++ +[source,yaml] +---- +apiVersion: "event.redhat-cne.org/v1alpha1" +kind: "HardwareEvent" +metadata: + name: "hardware-event" +spec: + nodeSelector: + node-role.kubernetes.io/hw-event: "" <1> + transportHost: "amqp://amq-router-service-name.amq-namespace.svc.cluster.local" <2> + logLevel: "debug" <3> + msgParserTimeout: "10" <4> +---- +<1> Required. Use the `nodeSelector` field to target nodes with the specified label, for example, `node-role.kubernetes.io/hw-event: ""`. +<2> Required. AMQP host that delivers the events at the transport layer using the AMQP protocol. +<3> Optional. The default value is `debug`. Sets the log level in `hw-event-proxy` logs. The following log levels are available: `fatal`, `error`, `warning`, `info`, `debug`, `trace`. +<4> Optional. Sets the timeout value in milliseconds for the Message Parser. If a message parsing request is not responded to within the timeout duration, the original hardware event message is passed to the cloud native event framework. The default value is 10. + +.. Create the `HardwareEvent` CR: ++ +[source,terminal] +---- +$ oc create -f hardware-event.yaml +---- + +. Create a BMC username and password `Secret` CR that enables the hardware events proxy to access the Redfish message registry for the bare-metal host. ++ +.. 
Save the following YAML in the `hw-event-bmc-secret.yaml` file: ++ +[source,yaml] +---- +apiVersion: v1 +kind: Secret +metadata: + name: redfish-basic-auth +type: Opaque +stringData: <1> + username: + password: + # BMC host DNS or IP address + hostaddr: +---- +<1> Enter plain text values for the various items under `stringData`. ++ +.. Create the `Secret` CR: ++ +[source,terminal] +---- +$ oc create -f hw-event-bmc-secret.yaml +---- diff --git a/modules/nw-rfhe-installing-operator-cli.adoc b/modules/nw-rfhe-installing-operator-cli.adoc new file mode 100644 index 0000000000..c21ba94d93 --- /dev/null +++ b/modules/nw-rfhe-installing-operator-cli.adoc @@ -0,0 +1,103 @@ +// Module included in the following assemblies: +// +// * monitoring/using-rfhe.adoc + +:_content-type: PROCEDURE +[id="nw-rfhe-installing-operator-cli_{context}"] += Installing the {redfish-operator} using the CLI + +As a cluster administrator, you can install the {redfish-operator} Operator by using the CLI. + +.Prerequisites + +* A cluster that is installed on bare-metal hardware with nodes that have a RedFish-enabled Baseboard Management Controller (BMC). +* Install the OpenShift CLI (`oc`). +* Log in as a user with `cluster-admin` privileges. + +.Procedure + +. Create a namespace for the {redfish-operator}. + +.. Save the following YAML in the `bare-metal-events-namespace.yaml` file: ++ +[source,yaml] +---- +apiVersion: v1 +kind: Namespace +metadata: + name: openshift-bare-metal-events + labels: + name: openshift-bare-metal-events + openshift.io/cluster-monitoring: "true" +---- + +.. Create the `Namespace` CR: ++ +[source,terminal] +---- +$ oc create -f bare-metal-events-namespace.yaml +---- + +. Create an Operator group for the {redfish-operator} Operator. + +.. Save the following YAML in the `bare-metal-events-operatorgroup.yaml` file: ++ +[source,yaml] +---- +apiVersion: operators.coreos.com/v1 +kind: OperatorGroup +metadata: + name: bare-metal-event-relay-group + namespace: openshift-bare-metal-events +spec: + targetNamespaces: + - openshift-bare-metal-events +---- + +.. Create the `OperatorGroup` CR: ++ +[source,terminal] +---- +$ oc create -f bare-metal-events-operatorgroup.yaml +---- + +. Subscribe to the {redfish-operator}. + +.. Save the following YAML in the `bare-metal-events-sub.yaml` file: ++ +[source,yaml] +---- +apiVersion: operators.coreos.com/v1alpha1 +kind: Subscription +metadata: + name: bare-metal-event-relay-subscription + namespace: openshift-bare-metal-events +spec: + channel: "stable" + name: bare-metal-event-relay + source: redhat-operators + sourceNamespace: openshift-marketplace +---- + +.. 
Create the `Subscription` CR: ++ +[source,terminal] +---- +$ oc create -f bare-metal-events-sub.yaml +---- + +.Verification + +To verify that the {redfish-operator} Operator is installed, run the following command: + +[source,terminal] +---- +$ oc get csv -n openshift-bare-metal-events -o custom-columns=Name:.metadata.name,Phase:.status.phase +---- + +.Example output +[source,terminal] +---- +Name Phase +bare-metal-event-relay.4.10.0-202206301927 Succeeded +---- diff --git a/modules/nw-rfhe-installing-operator-web-console.adoc b/modules/nw-rfhe-installing-operator-web-console.adoc new file mode 100644 index 0000000000..319028d2f4 --- /dev/null +++ b/modules/nw-rfhe-installing-operator-web-console.adoc @@ -0,0 +1,42 @@ +// Module included in the following assemblies: +// +// * monitoring/using-rfhe.adoc + +:_content-type: PROCEDURE +[id="nw-rfhe-installing-operator-web-console_{context}"] += Installing the {redfish-operator} using the web console + +As a cluster administrator, you can install the {redfish-operator} Operator using the web console. + +.Prerequisites + +* A cluster that is installed on bare-metal hardware with nodes that have a RedFish-enabled Baseboard Management Controller (BMC). +* Log in as a user with `cluster-admin` privileges. + +.Procedure + +. Install the {redfish-operator} using the {product-title} web console: + +.. In the {product-title} web console, click *Operators* -> *OperatorHub*. + +.. Choose *{redfish-operator}* from the list of available Operators, and then click *Install*. + +.. On the *Install Operator* page, select or create a *Namespace*, select *openshift-bare-metal-events*, and then click *Install*. + +.Verification + +Optional: You can verify that the Operator installed successfully by performing the following check: + +. Switch to the *Operators* -> *Installed Operators* page. + +. Ensure that *{redfish-operator}* is listed in the project with a *Status* of *InstallSucceeded*. ++ +[NOTE] +==== +During installation an Operator might display a *Failed* status. If the installation later succeeds with an *InstallSucceeded* message, you can ignore the *Failed* message. +==== + +If the operator does not appear as installed, to troubleshoot further: + +* Go to the *Operators* -> *Installed Operators* page and inspect the *Operator Subscriptions* and *Install Plans* tabs for any failure or errors under *Status*. +* Go to the *Workloads* -> *Pods* page and check the logs for pods in the project namespace. diff --git a/modules/nw-rfhe-introduction.adoc b/modules/nw-rfhe-introduction.adoc new file mode 100644 index 0000000000..23157a59cf --- /dev/null +++ b/modules/nw-rfhe-introduction.adoc @@ -0,0 +1,50 @@ +// Module included in the following assemblies: +// +// * monitoring/using-rfhe.adoc + +:_content-type: CONCEPT +[id="nw-rfhe-introduction_{context}"] += How bare-metal events work + +The {redfish-operator} enables applications running on bare-metal clusters to respond quickly to Redfish hardware changes and failures such as breaches of temperature thresholds, fan failure, disk loss, power outages, and memory failure. These hardware events are delivered over a reliable low-latency transport channel based on Advanced Message Queuing Protocol (AMQP). The latency of the messaging service is between 10 to 20 milliseconds. + +The {redfish-operator} provides a publish-subscribe service for the hardware events, where multiple applications can use REST APIs to subscribe and consume the events. 
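+
+For example, once a subscription is in place you can exercise the delivery path end to end by asking the BMC to emit a test event. The following `curl` command is an illustrative sketch only: it assumes that your BMC implements the Redfish `EventService.SubmitTestEvent` action shown in the `EventService` example output earlier in this document, and the request body fields that are accepted, such as `EventType` and `Message`, vary by BMC vendor. Replace `<bmc_ip_address>`, `<bmc_username>`, and `<password>` with values for your environment.
+
+[source,terminal]
+----
+$ curl -k -X POST -H "Content-Type: application/json" \
+  -u "<bmc_username>:<password>" \
+  -d '{"EventType": "Alert", "Message": "Test event"}' \
+  https://<bmc_ip_address>/redfish/v1/EventService/Actions/EventService.SubmitTestEvent
+----
+
+If the test event is delivered, it is reported in the `hw-event-proxy` logs.
+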
The {redfish-operator} supports hardware that complies with Redfish OpenAPI v1.8 or higher. + +[id="rfhe-elements_{context}"] +== {redfish-operator} data flow + +The following figure illustrates an example bare-metal events data flow. vDU is used as an example of an application interacting with bare-metal events: + +.{redfish-operator} data flow +image::211_OpenShift_Redfish_dataflow_0222.png[Bare-metal events data flow] + +=== Operator-managed pod + +The Operator uses custom resources to manage the pod containing the {redfish-operator} and its components using the `HardwareEvent` CR. + +=== {redfish-operator} + +At startup, the {redfish-operator} queries the Redfish API and downloads all the message registries, including custom registries. The {redfish-operator} then begins to receive subscribed events from the Redfish hardware. + +The {redfish-operator} enables applications running on bare-metal clusters to respond quickly to Redfish hardware changes and failures such as breaches of temperature thresholds, fan failure, disk loss, power outages, and memory failure. The events are reported using the `HardwareEvent` CR. + +=== Cloud native event + +Cloud native events (CNE) is a REST API specification for defining the format of event data. + +=== CNCF CloudEvents + +link:https://cloudevents.io/[CloudEvents] is a vendor-neutral specification developed by the Cloud Native Computing Foundation (CNCF) for defining the format of event data. + +=== AMQP dispatch router + +The dispatch router is responsible for the message delivery service between publisher and subscriber. AMQP 1.0 qpid is an open standard that supports reliable, high-performance, fully-symmetrical messaging over the internet. + +=== Cloud event proxy sidecar + +The cloud event proxy sidecar container image is based on the ORAN API specification and provides a publish-subscribe event framework for hardware events. + +[id="rfhe-data-flow_{context}"] +== Redfish message parsing service + +In addition to handling Redfish events, the {redfish-operator} provides message parsing for events without a `Message` property. The proxy downloads all the Redfish message registries including vendor specific registries from the hardware when it starts. If an event does not contain a `Message` property, the proxy uses the Redfish message registries to construct the `Message` and `Resolution` properties and add them to the event before passing the event to the cloud events framework. This service allows Redfish events to have smaller message size and lower transmission latency. diff --git a/modules/nw-rfhe-quering-redfish-hardware-event-subs.adoc b/modules/nw-rfhe-quering-redfish-hardware-event-subs.adoc new file mode 100644 index 0000000000..e71d0deaf9 --- /dev/null +++ b/modules/nw-rfhe-quering-redfish-hardware-event-subs.adoc @@ -0,0 +1,68 @@ +// Module included in the following assemblies: +// +// * monitoring/using-rfhe.adoc + +:_module-type: PROCEDURE +[id="nw-rfhe-querying-redfish-hardware-event-subs_{context}"] += Querying Redfish bare-metal event subscriptions with curl + +Some hardware vendors limit the amount of Redfish hardware event subscriptions. You can query the number of Redfish event subscriptions by using `curl`. + +.Prerequisites +* Get the user name and password for the BMC. +* Deploy a bare-metal node with a Redfish-enabled Baseboard Management Controller (BMC) in your cluster, and enable Redfish hardware events on the BMC. + +.Procedure + +. 
Check the current subscriptions for the BMC by running the following `curl` command: ++ +[source,terminal] +---- +$ curl --globoff -H "Content-Type: application/json" -k -X GET --user : https:///redfish/v1/EventService/Subscriptions +---- ++ +where: ++ +-- +bmc_ip_address:: is the IP address of the BMC where the Redfish events are generated. +-- ++ +.Example output +[source,terminal] +---- +% Total % Received % Xferd Average Speed Time Time Time Current +Dload Upload Total Spent Left Speed +100 435 100 435 0 0 399 0 0:00:01 0:00:01 --:--:-- 399 +{ + "@odata.context": "/redfish/v1/$metadata#EventDestinationCollection.EventDestinationCollection", + "@odata.etag": "" + 1651137375 "", + "@odata.id": "/redfish/v1/EventService/Subscriptions", + "@odata.type": "#EventDestinationCollection.EventDestinationCollection", + "Description": "Collection for Event Subscriptions", + "Members": [ + { + "@odata.id": "/redfish/v1/EventService/Subscriptions/1" + }], + "Members@odata.count": 1, + "Name": "Event Subscriptions Collection" +} +---- ++ +In this example, a single subscription is configured: `/redfish/v1/EventService/Subscriptions/1`. + +. Optional: To remove the `/redfish/v1/EventService/Subscriptions/1` subscription with `curl`, run the following command, specifying the BMC username and password: ++ +[source,terminal] +---- +$ curl --globoff -L -w "%{http_code} %{url_effective}\n" -k -u :-H "Content-Type: application/json" -d '{}' -X DELETE https:///redfish/v1/EventService/Subscriptions/1 +---- ++ +where: ++ +-- +bmc_ip_address:: is the IP address of the BMC where the Redfish events are generated. +-- + + + diff --git a/modules/ztp-about-ztp-and-distributed-units-on-single-node-openshift-clusters.adoc b/modules/ztp-about-ztp-and-distributed-units-on-single-node-openshift-clusters.adoc deleted file mode 100644 index 61061b2c39..0000000000 --- a/modules/ztp-about-ztp-and-distributed-units-on-single-node-openshift-clusters.adoc +++ /dev/null @@ -1,65 +0,0 @@ -// Module included in the following assemblies: -// -// *scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: CONCEPT -[id="about-ztp-and-distributed-units-on-single-node-clusters_{context}"] -= About ZTP and distributed units on single nodes - -You can install a distributed unit (DU) on a single node at scale with {rh-rhacm-first} (ACM) using the assisted installer (AI) and the policy generator with core-reduction technology enabled. The DU installation is done using zero touch provisioning (ZTP) in a disconnected environment. - -ACM manages clusters in a hub and spoke architecture, where a single hub cluster manages many spoke clusters. ACM applies radio access network (RAN) policies from predefined custom resources (CRs). Hub clusters running ACM provision and deploy the spoke clusters using ZTP and AI. DU installation follows the AI installation of {product-title} on a single node. - -The AI service handles provisioning of {product-title} on single nodes running on bare metal. ACM ships with and deploys the assisted installer when the `MultiClusterHub` custom resource is installed. - -With ZTP and AI, you can provision {product-title} single nodes to run your DUs at scale. A high level overview of ZTP for distributed units in a disconnected environment is as follows: - -* A hub cluster running ACM manages a disconnected internal registry that mirrors the {product-title} release images. The internal registry is used to provision the spoke single nodes. 
- -* You manage the bare-metal host machines for your DUs in an inventory file that uses YAML for formatting. You store the inventory file in a Git repository. - - -* You install the DU bare-metal host machines on site, and make the hosts ready for provisioning. To be ready for provisioning, the following is required for each bare-metal host: - -** Network connectivity - including DNS for your network. Hosts should be reachable through the hub and managed spoke clusters. Ensure there is layer 3 connectivity between the hub and the host where you want to install your hub cluster. - -** Baseboard Management Controller (BMC) details for each host - ZTP uses BMC details to connect the URL and credentials for accessing the BMC. -Create spoke cluster definition CRs. These define the relevant elements for the managed clusters. Required -CRs are as follows: -+ -[cols="1,1"] -|=== -| Custom Resource | Description - -|Namespace -|Namespace for the managed single-node cluster. - -|BMCSecret CR -|Credentials for the host BMC. - -|Image Pull Secret CR -|Pull secret for the disconnected registry. - -|AgentClusterInstall -|Specifies the single-node cluster's configuration such as networking, number of supervisor (control plane) nodes, and so on. - -|ClusterDeployment -|Defines the cluster name, domain, and other details. - -|KlusterletAddonConfig -|Manages installation and termination of add-ons on the ManagedCluster for ACM. - -|ManagedCluster -|Describes the managed cluster for ACM. - -|InfraEnv -|Describes the installation ISO to be mounted on the destination node that the assisted installer service creates. -This is the final step of the manifest creation phase. - -|BareMetalHost -|Describes the details of the bare-metal host, including BMC and credentials details. -|=== - -* When a change is detected in the host inventory repository, a host management event is triggered to provision the new or updated host. - -* The host is provisioned. When the host is provisioned and successfully rebooted, the host agent reports `Ready` status to the hub cluster. diff --git a/modules/ztp-acm-adding-images-to-mirror-registry.adoc b/modules/ztp-acm-adding-images-to-mirror-registry.adoc index aa7bdc1164..71b3fdd31c 100644 --- a/modules/ztp-acm-adding-images-to-mirror-registry.adoc +++ b/modules/ztp-acm-adding-images-to-mirror-registry.adoc @@ -4,7 +4,7 @@ // scalability_and_performance/ztp-deploying-disconnected.adoc :_content-type: PROCEDURE [id="ztp-acm-adding-images-to-mirror-registry_{context}"] -= Adding {op-system} ISO and RootFS images to a disconnected mirror host += Adding {op-system} ISO and RootFS images to the disconnected mirror host Before you install a cluster on infrastructure that you provision, you must create {op-system-first} machines for it to use. Use a disconnected mirror to host the {op-system} images you require to provision your distributed unit (DU) bare-metal hosts. diff --git a/modules/ztp-acm-installing-disconnected-rhacm.adoc b/modules/ztp-acm-installing-disconnected-rhacm.adoc index f6a16e2048..64c7e5b465 100644 --- a/modules/ztp-acm-installing-disconnected-rhacm.adoc +++ b/modules/ztp-acm-installing-disconnected-rhacm.adoc @@ -23,4 +23,4 @@ See link:https://docs.openshift.com/container-platform/4.9/operators/admin/olm-r .Procedure -* Install {rh-rhacm} on the hub cluster in the disconnected environment. 
See link:https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html/install/installing#install-on-disconnected-networks[Installing {rh-rhacm} in disconnected networks]. +* Install {rh-rhacm} on the hub cluster in the disconnected environment. See link:https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html/install/installing#install-on-disconnected-networks[Installing {rh-rhacm} in a disconnected environment]. diff --git a/modules/ztp-acm-preparing-to-install-disconnected-acm.adoc b/modules/ztp-acm-preparing-to-install-disconnected-acm.adoc index 3adba0fb22..284782e9b7 100644 --- a/modules/ztp-acm-preparing-to-install-disconnected-acm.adoc +++ b/modules/ztp-acm-preparing-to-install-disconnected-acm.adoc @@ -11,15 +11,3 @@ Before you can provision distributed units (DU) at scale, you must install {rh-r {rh-rhacm} is deployed as an Operator on the {product-title} hub cluster. It controls clusters and applications from a single console with built-in security policies. {rh-rhacm} provisions and manage your DU hosts. To install {rh-rhacm} in a disconnected environment, you create a mirror registry that mirrors the Operator Lifecycle Manager (OLM) catalog that contains the required Operator images. OLM manages, installs, and upgrades Operators and their dependencies in the cluster. You also use a disconnected mirror host to serve the {op-system} ISO and RootFS disk images that provision the DU bare-metal host operating system. - -Before you install a cluster on infrastructure that you provision in a restricted network, you must mirror the required container images into that environment. You can also use this procedure in unrestricted networks to ensure your clusters only use container images that have satisfied your organizational controls on external content. - -[IMPORTANT] -==== -You must have access to the internet to obtain the necessary container images. -In this procedure, you place the mirror registry on a mirror host -that has access to both your network and the internet. If you do not have access -to a mirror host, use the disconnected procedure to copy images to a device that you -can move across network boundaries. -==== - diff --git a/modules/ztp-adding-new-content-to-gitops-ztp.adoc b/modules/ztp-adding-new-content-to-gitops-ztp.adoc new file mode 100644 index 0000000000..018ff59f71 --- /dev/null +++ b/modules/ztp-adding-new-content-to-gitops-ztp.adoc @@ -0,0 +1,74 @@ +// Module included in the following assemblies: +// +// scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: PROCEDURE +[id="ztp-adding-new-content-to-gitops-ztp_{context}"] += Adding new content to the GitOps ZTP pipeline + +The source CRs in the GitOps ZTP site generator container provide a set of critical features and node tuning settings for RAN Distributed Unit (DU) applications. These are applied to the clusters that you deploy with ZTP. To add or modify existing source CRs in the `ztp-site-generate` container, rebuild the `ztp-site-generate` container and make it available to the hub cluster, typically from the disconnected registry associated with the hub cluster. Any valid {product-title} CR can be added. + +Perform the following procedure to add new content to the ZTP pipeline. + +.Procedure + +. 
Create a directory containing a Containerfile and the source CR YAML files that you want to include in the updated `ztp-site-generate` container, for example:
++
+[source,text]
+----
+ztp-update/
+├── example-cr1.yaml
+├── example-cr2.yaml
+└── ztp-update.in
+----
+
+. Add the following content to the `ztp-update.in` Containerfile:
++
+[source,text]
+----
+FROM registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10
+
+ADD example-cr2.yaml /kustomize/plugin/ran.openshift.io/v1/policygentemplate/source-crs/
+ADD example-cr1.yaml /kustomize/plugin/ran.openshift.io/v1/policygentemplate/source-crs/
+----
+
+. Open a terminal in the `ztp-update/` folder and rebuild the container, specifying the current directory as the build context:
++
+[source,terminal]
+----
+$ podman build -t ztp-site-generate-rhel8-custom:v4.10-custom-1 .
+----
+
+. Push the built container image to your disconnected registry, for example:
++
+[source,terminal]
+----
+$ podman push localhost/ztp-site-generate-rhel8-custom:v4.10-custom-1 registry.example.com:5000/ztp-site-generate-rhel8-custom:v4.10-custom-1
+----
+
+. Patch the Argo CD instance on the hub cluster to point to the newly built container image:
++
+[source,terminal]
+----
+$ oc patch -n openshift-gitops argocd openshift-gitops --type=json -p '[{"op": "replace", "path":"/spec/repo/initContainers/0/image", "value": "registry.example.com:5000/ztp-site-generate-rhel8-custom:v4.10-custom-1"} ]'
+----
++
+When the Argo CD instance is patched, the `openshift-gitops-repo-server` pod automatically restarts.
+
+.Verification
+
+. Verify that the new `openshift-gitops-repo-server` pod has completed initialization and that the previous repo pod is terminated:
++
+[source,terminal]
+----
+$ oc get pods -n openshift-gitops | grep openshift-gitops-repo-server
+----
++
+.Example output
++
+[source,terminal]
+----
+openshift-gitops-repo-server-7df86f9774-db682          1/1     Running   1          28s
+----
++
+You must wait until the new `openshift-gitops-repo-server` pod has completed initialization and the previous pod is terminated before the newly added container image content is available.
diff --git a/modules/ztp-ai-install-ocp-clusters-on-bare-metal.adoc b/modules/ztp-ai-install-ocp-clusters-on-bare-metal.adoc
index 11014d007a..456e3ce010 100644
--- a/modules/ztp-ai-install-ocp-clusters-on-bare-metal.adoc
+++ b/modules/ztp-ai-install-ocp-clusters-on-bare-metal.adoc
@@ -8,7 +8,7 @@
 The Assisted Installer Service (AIS) deploys {product-title} clusters. {rh-rhacm-first} ships with AIS. AIS is deployed when you enable the MultiClusterHub Operator on the {rh-rhacm} hub cluster.
 
-For distributed units (DUs), {rh-rhacm} supports {product-title} deployments that run on a single bare-metal host. The single-node cluster acts as both a control plane and a worker node.
+For distributed units (DUs), {rh-rhacm} supports {product-title} deployments that run on a single bare-metal host, three-node clusters, or standard clusters. In the case of single-node clusters or three-node clusters, all nodes act as both control plane and worker nodes.
.Prerequisites diff --git a/modules/ztp-applying-source-custom-resource-policies.adoc b/modules/ztp-applying-source-custom-resource-policies.adoc deleted file mode 100644 index 8703738d95..0000000000 --- a/modules/ztp-applying-source-custom-resource-policies.adoc +++ /dev/null @@ -1,160 +0,0 @@ -// Module included in the following assemblies: -// -// scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: PROCEDURE -[id="ztp-applying-source-custom-resource-policies_{context}"] -= Applying source custom resource policies - -Source custom resource policies include the following: - -* SR-IOV policies -* PTP policies -* Performance Add-on Operator policies -* MachineConfigPool policies -* SCTP policies - -You need to define the source custom resource that generates the ACM policy with consideration of possible overlay to its metadata or spec/data. -For example, a `common-namespace-policy` contains a `Namespace` definition that exists in all managed clusters. -This `namespace` is placed under the Common category and there are no changes for its spec or data across all clusters. - -.Namespace policy example - -The following example shows the source custom resource for this namespace: - -[source,yaml] ----- -apiVersion: v1 -kind: Namespace -metadata: - name: openshift-sriov-network-operator - labels: - openshift.io/run-level: "1" ----- - -.Example output - -The generated policy that applies this `namespace` includes the `namespace` as it is defined above without any change, as shown in this example: - -[source,yaml] ----- -apiVersion: policy.open-cluster-management.io/v1 -kind: Policy -metadata: - name: common-sriov-sub-ns-policy - namespace: common-sub - annotations: - policy.open-cluster-management.io/categories: CM Configuration Management - policy.open-cluster-management.io/controls: CM-2 Baseline Configuration - policy.open-cluster-management.io/standards: NIST SP 800-53 -spec: - remediationAction: enforce - disabled: false - policy-templates: - - objectDefinition: - apiVersion: policy.open-cluster-management.io/v1 - kind: ConfigurationPolicy - metadata: - name: common-sriov-sub-ns-policy-config - spec: - remediationAction: enforce - severity: low - namespaceselector: - exclude: - - kube-* - include: - - '*' - object-templates: - - complianceType: musthave - objectDefinition: - apiVersion: v1 - kind: Namespace - metadata: - labels: - openshift.io/run-level: "1" - name: openshift-sriov-network-operator ----- - -.SRIOV policy example - -The following example shows a `SriovNetworkNodePolicy` definition that exists in different clusters with a different specification for each cluster. -The example also shows the source custom resource for the `SriovNetworkNodePolicy`: - -[source,yaml] ----- -apiVersion: sriovnetwork.openshift.io/v1 -kind: SriovNetworkNodePolicy -metadata: - name: sriov-nnp - namespace: openshift-sriov-network-operator -spec: - # The $ tells the policy generator to overlay/remove the spec.item in the generated policy. - deviceType: $deviceType - isRdma: false - nicSelector: - pfNames: [$pfNames] - nodeSelector: - node-role.kubernetes.io/worker: "" - numVfs: $numVfs - priority: $priority - resourceName: $resourceName ----- - -.Example output - -The `SriovNetworkNodePolicy` name and `namespace` are the same for all clusters, so both are defined in the source `SriovNetworkNodePolicy`. -However, the generated policy requires the `$deviceType`, `$numVfs`, as input parameters in order to adjust the policy for each cluster. 
-The generated policy is shown in this example: - -[source,yaml] ----- -apiVersion: policy.open-cluster-management.io/v1 -kind: Policy -metadata: - name: site-du-sno-1-sriov-nnp-mh-policy - namespace: sites-sub - annotations: - policy.open-cluster-management.io/categories: CM Configuration Management - policy.open-cluster-management.io/controls: CM-2 Baseline Configuration - policy.open-cluster-management.io/standards: NIST SP 800-53 -spec: - remediationAction: enforce - disabled: false - policy-templates: - - objectDefinition: - apiVersion: policy.open-cluster-management.io/v1 - kind: ConfigurationPolicy - metadata: - name: site-du-sno-1-sriov-nnp-mh-policy-config - spec: - remediationAction: enforce - severity: low - namespaceselector: - exclude: - - kube-* - include: - - '*' - object-templates: - - complianceType: musthave - objectDefinition: - apiVersion: sriovnetwork.openshift.io/v1 - kind: SriovNetworkNodePolicy - metadata: - name: sriov-nnp-du-mh - namespace: openshift-sriov-network-operator - spec: - deviceType: vfio-pci - isRdma: false - nicSelector: - pfNames: - - ens7f0 - nodeSelector: - node-role.kubernetes.io/worker: "" - numVfs: 8 - resourceName: du_mh ----- - -[NOTE] -==== -Defining the required input parameters as `$value`, for example `$deviceType`, is not mandatory. The `$` tells the policy generator to overlay or remove the item from the generated policy. Otherwise, the value does not change. -==== diff --git a/modules/ztp-applying-the-ran-policies-for-monitoring-cluster-activity.adoc b/modules/ztp-applying-the-ran-policies-for-monitoring-cluster-activity.adoc deleted file mode 100644 index 6c00455564..0000000000 --- a/modules/ztp-applying-the-ran-policies-for-monitoring-cluster-activity.adoc +++ /dev/null @@ -1,31 +0,0 @@ -// Module included in the following assemblies: -// -// scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: CONCEPT -[id="ztp-applying-the-ran-policies-for-monitoring-cluster-activity_{context}"] -= Applying the RAN policies for monitoring cluster activity - -Zero touch provisioning (ZTP) uses {rh-rhacm-first} to apply the radio access network (RAN) policies using a policy-based governance approach to automatically monitor cluster activity. - -The policy generator (PolicyGen) is a Kustomize plug-in that facilitates creating ACM policies from predefined custom resources. -There are three main items: Policy Categorization, Source CR policy, and PolicyGenTemplate. PolicyGen relies on these to generate the policies and -their placement bindings and rules. - -The following diagram shows how the RAN policy generator interacts with GitOps and ACM. - -image::175_OpenShift_ACM_0821_1.png[RAN policy generator] - -RAN policies are categorized into three main groups: - -Common:: A policy that exists in the `Common` category is applied to all clusters to be represented by the site plan. - -Groups:: A policy that exists in the `Groups` category is applied to a group of clusters. Every group of clusters could have their own policies that exist under the -Groups category. For example, `Groups/group1` could have its own policies that are applied to the clusters belonging to `group1`. - -Sites:: A policy that exists in the `Sites` category is applied to a specific cluster. Any cluster could have its own policies that exist in the `Sites` category. -For example, `Sites/cluster1` will have its own policies applied to `cluster1`. - -The following diagram shows how policies are generated. 
- -image::175_OpenShift_ACM_0821_2.png[Generating policies] diff --git a/modules/ztp-checking-the-installation-status.adoc b/modules/ztp-checking-the-installation-status.adoc deleted file mode 100644 index e5a3533d21..0000000000 --- a/modules/ztp-checking-the-installation-status.adoc +++ /dev/null @@ -1,30 +0,0 @@ -// Module included in the following assemblies: -// -// *scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: PROCEDURE -[id="ztp-checking-the-installation-status_{context}"] -= Checking the installation status - -The ArgoCD pipeline detects the `SiteConfig` and `PolicyGenTemplate` custom resources (CRs) in the Git repository and syncs them to the hub cluster. In the process, it generates installation and policy CRs and applies them to the hub cluster. You can monitor the progress of this synchronization in the ArgoCD dashboard. - -.Procedure - -. Monitor the progress of cluster installation using the following commands: -+ -[source,terminal] ----- -$ export CLUSTER= ----- -+ -[source,terminal] ----- -$ oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Completed")]}' | jq ----- -+ -[source,terminal] ----- -$ curl -sk $(oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.debugInfo.eventsURL}') | jq '.[-2,-1]' ----- - -. Use the {rh-rhacm-first} (ACM) dashboard to monitor the progress of policy reconciliation. diff --git a/modules/ztp-cluster-provisioning.adoc b/modules/ztp-cluster-provisioning.adoc deleted file mode 100644 index 92f7092111..0000000000 --- a/modules/ztp-cluster-provisioning.adoc +++ /dev/null @@ -1,23 +0,0 @@ -// Module included in the following assemblies: -// -// scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: CONCEPT -[id="ztp-cluster-provisioning_{context}"] -= Cluster provisioning - -Zero touch provisioning (ZTP) provisions clusters using a layered approach. The base components consist of {op-system-first}, the basic operating system -for the cluster, and {product-title}. After these components are installed, the worker node can join the existing cluster. When the node has joined the existing cluster, the 5G RAN profile Operators are applied. - -The following diagram illustrates this architecture. - -image::177_OpenShift_cluster_provisioning_0821.png[Cluster provisioning] - -The following RAN Operators are deployed on every cluster: - -* Machine Config -* Precision Time Protocol (PTP) -* Performance Addon Operator -* SR-IOV -* Local Storage Operator -* Logging Operator diff --git a/modules/ztp-configuring-a-static-ip.adoc b/modules/ztp-configuring-a-static-ip.adoc index 3e470dcfa1..70c9999a08 100644 --- a/modules/ztp-configuring-a-static-ip.adoc +++ b/modules/ztp-configuring-a-static-ip.adoc @@ -79,12 +79,12 @@ spec: name: namespace: sshAuthorizedKey: - agentLabels: <1> - location: "" + agentLabelSelector: + matchLabels: + cluster-name: pullSecretRef: name: assisted-deployment-pull-secret nmStateConfigLabelSelector: matchLabels: sno-cluster-: # Match this label ---- -<1> Sets a label to match. The labels apply when the agents boot. 
\ No newline at end of file diff --git a/modules/ztp-configuring-ptp-fast-events.adoc b/modules/ztp-configuring-ptp-fast-events.adoc index 92f21828ab..6b7dc4dae4 100644 --- a/modules/ztp-configuring-ptp-fast-events.adoc +++ b/modules/ztp-configuring-ptp-fast-events.adoc @@ -4,37 +4,13 @@ :_module-type: PROCEDURE [id="ztp-configuring-ptp-fast-events_{context}"] -= Configuring PTP fast events using PolicyGenTemplate custom resources and GitOps ZTP += Configuring PTP fast events using PolicyGenTemplate CRs You can configure PTP fast events for vRAN clusters that are deployed using the GitOps Zero Touch Provisioning (ZTP) pipeline. Use `PolicyGenTemplate` custom resources (CRs) as the basis to create a hierarchy of configuration files tailored to your specific site requirements. -The `PolicyGenTemplate` CRs that are relevant to PTP events can be found in the `/home/ztp/argocd/example` folder in the `quay.io/redhat_emp1/ztp-site-generator:latest` reference architecture container image. The reference architecture has a `/policygentemplates` and `/siteconfig` folder. The `/policygentemplates` folder has common, group, and site-specific configuration CRs. Each `PolicyGenTemplate` CR refers to other CRs that are in the `/source-crs` folder of the reference architecture. - -The `PolicyGenTemplate` CRs required to deploy PTP fast events are described below. - -.PolicyGenTemplate CRs for vRAN deployments -[cols=2*, options="header"] -|==== -|PolicyGenTemplate CR -|Description - -|`common-ranGen.yaml` -|Contains the common RAN policies that get applied to all clusters. To deploy the PTP Operator to your clusters, you configure `Namespace`, `Subscription`, and `OperatorGroup` CRs. - -|`group-du-3node-ranGen.yaml` -|Contains the RAN policies for three-node clusters only, including PTP fast events configuration. - -|`group-du-sno-ranGen.yaml` -|Contains the RAN policies for single-node clusters only, including PTP fast events configuration. - -|`group-du-standard-ranGen.yaml` -|Contains the RAN policies for standard three control-plane clusters, including PTP fast events configuration. -|==== - .Prerequisites * Create a Git repository where you manage your custom site configuration data. -* Extract the contents of the `/home/ztp` folder from the `quay.io/redhat_emp1/ztp-site-generator:latest` reference architecture container image, and review the changes. .Procedure @@ -44,16 +20,16 @@ The `PolicyGenTemplate` CRs required to deploy PTP fast events are described bel ---- #AMQ interconnect operator for fast events - fileName: AmqSubscriptionNS.yaml - policyName: "amq-sub-policy" + policyName: "subscriptions-policy" - fileName: AmqSubscriptionOperGroup.yaml - policyName: "amq-sub-policy" + policyName: "subscriptions-policy" - fileName: AmqSubscription.yaml - policyName: "amq-sub-policy" + policyName: "subscriptions-policy" ---- . Apply the following `PolicyGenTemplate` changes to `group-du-3node-ranGen.yaml`, `group-du-sno-ranGen.yaml`, or `group-du-standard-ranGen.yaml` files according to your requirements: -.. In `.sourceFiles`, add the `PtpOperatorConfig` CR that configures the AMQ transport host to the `config-policy`: +.. 
In `.sourceFiles`, add the `PtpOperatorConfig` CR file that configures the AMQ transport host to the `config-policy`: + [source,yaml] ---- @@ -80,13 +56,21 @@ The `PolicyGenTemplate` CRs required to deploy PTP fast events are described bel maxOffsetThreshold: 100 #nano secs minOffsetThreshold: -100 #nano secs ---- -<1> Can be one `PtpConfigMaster.yaml`, `PtpConfigSlave.yaml`, or `PtpConfigSlaveCvl.yaml` depending on your requirements. `PtpConfigSlaveCvl.yaml` configes `linuxptp` services for an Intel E810 Columbiaville NIC. +<1> Can be one `PtpConfigMaster.yaml`, `PtpConfigSlave.yaml`, or `PtpConfigSlaveCvl.yaml` depending on your requirements. `PtpConfigSlaveCvl.yaml` configures `linuxptp` services for an Intel E810 Columbiaville NIC. For configurations based on `group-du-sno-ranGen.yaml` or `group-du-3node-ranGen.yaml`, use `PtpConfigSlave.yaml`. <2> Device specific interface name. <3> You must append the `--summary_interval -4` value to `ptp4lOpts` in `.spec.sourceFiles.spec.profile` to enable PTP fast events. <4> `ptpClockThreshold` configues how long the clock stays in clock holdover state. Holdover state is the period between local and master clock synchronizations. Offset is the time difference between the local and master clock. +. Apply the following `PolicyGenTemplate` changes to your specific site YAML files, for example, `example-sno-site.yaml`: + +.. In `.sourceFiles`, add the `Interconnect` CR file that configures the AMQ router to the `config-policy`: ++ +[source,yaml] +---- +- fileName: AmqInstance.yaml + policyName: "config-policy" +---- + . Merge any other required changes and files with your custom site repository. . Push the changes to your site configuration repository to deploy PTP fast events to new sites using GitOps ZTP. - -//. Optional: Use the Topology-Aware Lifecycle Operator to deploy PTP events to existing sites. diff --git a/modules/ztp-configuring-uefi-secure-boot.adoc b/modules/ztp-configuring-uefi-secure-boot.adoc new file mode 100644 index 0000000000..1ea9457e9c --- /dev/null +++ b/modules/ztp-configuring-uefi-secure-boot.adoc @@ -0,0 +1,89 @@ +// Module included in the following assemblies: +// +// scalability_and_performance/ztp-deploying-disconnected.adoc + +:_module-type: PROCEDURE +[id="ztp-configuring-uefi-secure-boot_{context}"] += Configuring UEFI secure boot for clusters using PolicyGenTemplate CRs + +You can configure UEFI secure boot for vRAN clusters that are deployed using the +GitOps zero touch provisioning (ZTP) pipeline. + +.Prerequisites + +* Create a Git repository where you manage your custom site configuration data. + +.Procedure + +. Create the following `MachineConfig` resource and save it in the `uefi-secure-boot.yaml` file: ++ +[source,yaml] +---- +apiVersion: machineconfiguration.openshift.io/v1 +kind: MachineConfig +metadata: + labels: + machineconfiguration.openshift.io/role: master + name: uefi-secure-boot +spec: + config: + ignition: + version: 3.1.0 + kernelArguments: + - efi=runtime +---- + +. In your Git repository custom `/siteconfig` directory, create a `/sno-extra-manifest` folder and add the `uefi-secure-boot.yaml` file, for example: ++ +[source,text] +---- +siteconfig +├── site1-sno-du.yaml +├── site2-standard-du.yaml +└── sno-extra-manifest + └── uefi-secure-boot.yaml +---- + +. In your cluster `SiteConfig` CR, specify the required values for `extraManifestPath` and `bootMode`: + +.. 
Enter the directory name in the `.spec.clusters.extraManifestPath` field, for example: ++ +[source,yaml] +---- +clusters: + - clusterName: "example-cluster" + extraManifestPath: sno-extra-manifest/ +---- + +.. Set the value for `.spec.clusters.nodes.bootMode` to `UEFISecureBoot`, for example: ++ +[source,yaml] +---- +nodes: + - hostName: "ran.example.lab" + bootMode: "UEFISecureBoot" +---- + +. Deploy the cluster using the GitOps ZTP pipeline. + +.Verification + +. Open a remote shell to the deployed cluster, for example: ++ +[source,terminal] +---- +$ oc debug node/node-1.example.com +---- + +. Verify that the `SecureBoot` feature is enabled: ++ +[source,terminal] +---- +sh-4.4# mokutil --sb-state +---- ++ +.Example output +[source,terminal] +---- +SecureBoot enabled +---- diff --git a/modules/ztp-creating-a-validator-inform-policy.adoc b/modules/ztp-creating-a-validator-inform-policy.adoc new file mode 100644 index 0000000000..01fb8302e8 --- /dev/null +++ b/modules/ztp-creating-a-validator-inform-policy.adoc @@ -0,0 +1,94 @@ +// Module included in the following assemblies: +// +// * scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: PROCEDURE +[id="ztp-creating-a-validator-inform-policy_{context}"] += Creating a validator inform policy + +Use the following procedure to create a validator inform policy that provides an indication of +when the zero touch provisioning (ZTP) installation and configuration of the deployed cluster is complete. This policy +can be used for deployments of single node clusters, three-node clusters, and standard clusters. + +.Procedure + +. Create a stand-alone `PolicyGenTemplate` custom resource (CR) that contains the source file +`validatorCRs/informDuValidator.yaml`. +You only need one stand-alone `PolicyGenTemplate` CR for each cluster type. ++ +.Single node clusters ++ +[source,yaml] +---- +group-du-sno-validator-ranGen.yaml +apiVersion: ran.openshift.io/v1 +kind: PolicyGenTemplate +metadata: + name: "group-du-sno-validator" <1> + namespace: "ztp-group" <2> +spec: + bindingRules: + group-du-sno: "" <3> + bindingExcludedRules: + ztp-done: "" <4> + mcp: "master" <5> + sourceFiles: + - fileName: validatorCRs/informDuValidator.yaml + remediationAction: inform <6> + policyName: "du-policy" <7> +---- ++ +.Three-node clusters ++ +[source,yaml] +---- +group-du-3node-validator-ranGen.yaml +apiVersion: ran.openshift.io/v1 +kind: PolicyGenTemplate +metadata: + name: "group-du-3node-validator" <1> + namespace: "ztp-group" <2> +spec: + bindingRules: + group-du-3node: "" <3> + bindingExcludedRules: + ztp-done: "" <4> + mcp: "master" <5> + sourceFiles: + - fileName: validatorCRs/informDuValidator.yaml + remediationAction: inform <6> + policyName: "du-policy" <7> +---- ++ +.Standard clusters ++ +[source,yaml] +---- +group-du-standard-validator-ranGen.yaml +apiVersion: ran.openshift.io/v1 +kind: PolicyGenTemplate +metadata: + name: "group-du-standard-validator" <1> + namespace: "ztp-group" <2> +spec: + bindingRules: + group-du-standard: "" <3> + bindingExcludedRules: + ztp-done: "" <4> + mcp: "worker" <5> + sourceFiles: + - fileName: validatorCRs/informDuValidator.yaml + remediationAction: inform <6> + policyName: "du-policy" <7> +---- +<1> The name of `PolicyGenTemplates` object. This name is also used as part of the names +for the `placementBinding`, `placementRule`, and `policy` that are created in the requested `namespace`. +<2> This value should match the `namespace` used in the group `PolicyGenTemplates`. 
+<3> The `group-du-*` label defined in `bindingRules` must exist in the `SiteConfig` files. +<4> The label defined in `bindingExcludedRules` must be`ztp-done:`. The `ztp-done` label is used in coordination with the {cgu-operator-full}. +<5> `mcp` defines the `MachineConfigPool` object that is used in the source file `validatorCRs/informDuValidator.yaml`. It should be `master` for single node and three-node cluster deployments and `worker` for standard cluster deployments. +<6> Optional. The default value is `inform`. +<7> This value is used as part of the name for the generated {rh-rhacm} policy. +The generated validator policy for the single node example is named `group-du-sno-validator-du-policy`. + +. Push the files to the ZTP Git repository. diff --git a/modules/ztp-creating-the-policygentemplate-cr.adoc b/modules/ztp-creating-the-policygentemplate-cr.adoc new file mode 100644 index 0000000000..7cb257e7dd --- /dev/null +++ b/modules/ztp-creating-the-policygentemplate-cr.adoc @@ -0,0 +1,32 @@ +// Module included in the following assemblies: +// +// *scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: CONCEPT +[id="ztp-creating-the-policygentemplate-cr_{context}"] += Creating the PolicyGenTemplate CR + +Use this procedure to create the `PolicyGenTemplate` custom resource (CR) for your site in your local clone of the Git repository. + +.Procedure + +. Choose an appropriate example from `out/argocd/example/policygentemplates`. This directory demonstrates a three-level policy framework that represents a well-supported low-latency profile tuned for the needs of 5G Telco DU deployments: ++ +* A single `common-ranGen.yaml` file that should apply to all types of sites. +* A set of shared `group-du-*-ranGen.yaml` files, each of which should be common across a set of similar clusters. +* An example `example-*-site.yaml` that can be copied and updated for each individual site. + +. Ensure that the labels defined in your `PolicyGenTemplate` `bindingRules` section correspond to the labels that are defined in the `SiteConfig` files of the clusters you are managing. + +. Ensure that the content of the overlaid spec files matches your desired end state. As a reference, the `out/source-crs` directory contains the full list of `source-crs` available to be included and overlaid by your `PolicyGenTemplate` templates. ++ +[NOTE] +==== +Depending on the specific requirements of your clusters, you might need more than a single group policy per cluster type, especially considering that the example group policies each have a single `PerformancePolicy.yaml` file that can only be shared across a set of clusters if those clusters consist of identical hardware configurations. +==== + +. Define all the policy namespaces in a YAML file similar to the example `out/argocd/example/policygentemplates/ns.yaml` file. + +. Add all the `PolicyGenTemplate` files and `ns.yaml` file to the `kustomization.yaml` file, similar to the example `out/argocd/example/policygentemplates/kustomization.yaml` file. + +. Commit the `PolicyGenTemplate` CRs, `ns.yaml` file, and the associated `kustomization.yaml` file in the Git repository. 
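+
+The `kustomization.yaml` file referenced in the previous steps might look like the following. This is a minimal sketch only: the file names are taken from the examples in `out/argocd/example/policygentemplates`, and the split between the `generators` and `resources` entries is an assumption based on that example, so compare it against the reference `kustomization.yaml` file in the same directory.
+
+[source,yaml]
+----
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+
+generators: # PolicyGenTemplate CRs, rendered by the policy generator plugin
+- common-ranGen.yaml
+- group-du-sno-ranGen.yaml
+- example-sno-site.yaml
+
+resources: # plain CRs, such as the policy namespaces defined in ns.yaml
+- ns.yaml
+----
+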
diff --git a/modules/ztp-creating-the-policygentemplates.adoc b/modules/ztp-creating-the-policygentemplates.adoc deleted file mode 100644 index cc25c4719d..0000000000 --- a/modules/ztp-creating-the-policygentemplates.adoc +++ /dev/null @@ -1,19 +0,0 @@ -// Module included in the following assemblies: -// -// *scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: PROCEDURE -[id="ztp-creating-the-policygentemplates_{context}"] -= Creating the PolicyGenTemplates - -Use the following procedure to create the `PolicyGenTemplates` you will need for generating policies in your Git repository for the hub cluster. - -.Procedure - -. Create the `PolicyGenTemplates` and save them to the zero touch provisioning (ZTP) Git repository accessible from the hub cluster and defined as a source repository of the ArgoCD application. - -. ArgoCD detects that the application is out of sync. Upon sync, either automatic or manual, ArgoCD applies the new `PolicyGenTemplate` to the hub cluster and launches the associated resource hooks. These hooks are responsible for generating the policy wrapped configuration CRs that apply to the spoke cluster and perform the following actions: -.. Create the {rh-rhacm-first} (ACM) policies according to the basic distributed unit (DU) profile and required customizations. -.. Apply the generated policies to the hub cluster. - -The ZTP process creates policies that direct ACM to apply the desired configuration to the cluster nodes. diff --git a/modules/ztp-creating-the-site-secrets.adoc b/modules/ztp-creating-the-site-secrets.adoc index 8c35e1a999..6fec240c8a 100644 --- a/modules/ztp-creating-the-site-secrets.adoc +++ b/modules/ztp-creating-the-site-secrets.adoc @@ -10,8 +10,8 @@ Add the required secrets for the site to the hub cluster. These resources must b .Procedure -. Create a secret for authenticating to the site Baseboard Management Controller (BMC). Ensure the secret name matches the name used in the `SiteConfig`. -In this example, the secret name is `test-sno-bmh-secret`: +. Create a secret for authenticating to the site Baseboard Management Controller +(BMC). Ensure that the secret name matches the name used in the `SiteConfig`. In this example, the secret name is `test-sno-bmh-secret`: + [source,yaml] ---- @@ -26,7 +26,9 @@ data: type: Opaque ---- -. Create the pull secret for the site. The pull secret must contain all credentials necessary for installing OpenShift and all add-on Operators. In this example, the secret name is `assisted-deployment-pull-secret`: +. Create the pull secret for the site. The pull secret must contain all credentials necessary +for installing OpenShift and all add-on Operators. In this example, the secret name is +`assisted-deployment-pull-secret`: + [source,yaml] ---- @@ -42,5 +44,6 @@ data: [NOTE] ==== -The secrets are referenced from the `SiteConfig` custom resource (CR) by name. The namespace must match the `SiteConfig` namespace. +The secrets are referenced from the `SiteConfig` custom resource (CR) by name. The namespace +must match the `SiteConfig` namespace. 
==== diff --git a/modules/ztp-creating-the-siteconfig-custom-resources.adoc b/modules/ztp-creating-the-siteconfig-custom-resources.adoc deleted file mode 100644 index 9a965c58f0..0000000000 --- a/modules/ztp-creating-the-siteconfig-custom-resources.adoc +++ /dev/null @@ -1,103 +0,0 @@ -// Module included in the following assemblies: -// -// *scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: PROCEDURE -[id="ztp-creating-the-siteconfig-custom-resources_{context}"] -= Creating the SiteConfig custom resources - -ArgoCD acts as the engine for the GitOps method of site deployment. After completing a site plan that contains the required custom resources for the site installation, a policy generator creates the manifests and applies them to the hub cluster. - -.Procedure - -. Create one or more `SiteConfig` custom resources, `site-config.yaml` files, that contains the site-plan data for the -clusters. For example: -+ -[source,yaml] ----- -apiVersion: ran.openshift.io/v1 -kind: SiteConfig -metadata: - name: "test-sno" - namespace: "test-sno" -spec: - baseDomain: "clus2.t5g.lab.eng.bos.redhat.com" - pullSecretRef: - name: "assisted-deployment-pull-secret" - clusterImageSetNameRef: "openshift-4.11" - sshPublicKey: "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDB3dwhI5X0ZxGBb9VK7wclcPHLc8n7WAyKjTNInFjYNP9J+Zoc/ii+l3YbGUTuqilDwZN5rVIwBux2nUyVXDfaM5kPd9kACmxWtfEWTyVRootbrNWwRfKuC2h6cOd1IlcRBM1q6IzJ4d7+JVoltAxsabqLoCbK3svxaZoKAaK7jdGG030yvJzZaNM4PiTy39VQXXkCiMDmicxEBwZx1UsA8yWQsiOQ5brod9KQRXWAAST779gbvtgXR2L+MnVNROEHf1nEjZJwjwaHxoDQYHYKERxKRHlWFtmy5dNT6BbvOpJ2e5osDFPMEd41d2mUJTfxXiC1nvyjk9Irf8YJYnqJgBIxi0IxEllUKH7mTdKykHiPrDH5D2pRlp+Donl4n+sw6qoDc/3571O93+RQ6kUSAgAsvWiXrEfB/7kGgAa/BD5FeipkFrbSEpKPVu+gue1AQeJcz9BuLqdyPUQj2VUySkSg0FuGbG7fxkKeF1h3Sga7nuDOzRxck4I/8Z7FxMF/e8DmaBpgHAUIfxXnRqAImY9TyAZUEMT5ZPSvBRZNNmLbfex1n3NLcov/GEpQOqEYcjG5y57gJ60/av4oqjcVmgtaSOOAS0kZ3y9YDhjsaOcpmRYYijJn8URAH7NrW8EZsvAoF6GUt6xHq5T258c6xSYUm5L0iKvBqrOW9EjbLw== root@cnfdc2.clus2.t5g.lab.eng.bos.redhat.com" - clusters: - - clusterName: "test-sno" - clusterType: "sno" - clusterProfile: "du" - clusterLabels: - group-du-sno: "" - common: true - sites : "test-sno" - clusterNetwork: - - cidr: 1001:db9::/48 - hostPrefix: 64 - machineNetwork: - - cidr: 2620:52:0:10e7::/64 - serviceNetwork: - - 1001:db7::/112 - additionalNTPSources: - - 2620:52:0:1310::1f6 - nodes: - - hostName: "test-sno.clus2.t5g.lab.eng.bos.redhat.com" - bmcAddress: "idrac-virtualmedia+https://[2620:52::10e7:f602:70ff:fee4:f4e2]/redfish/v1/Systems/System.Embedded.1" - bmcCredentialsName: - name: "test-sno-bmh-secret" - bmcDisableCertificateVerification: true <1> - bootMACAddress: "0C:42:A1:8A:74:EC" - bootMode: "UEFI" - rootDeviceHints: - hctl: '0:1:0' - cpuset: "0-1,52-53" - nodeNetwork: - interfaces: - - name: eno1 - macAddress: "0C:42:A1:8A:74:EC" - config: - interfaces: - - name: eno1 - type: ethernet - state: up - macAddress: "0C:42:A1:8A:74:EC" - ipv4: - enabled: false - ipv6: - enabled: true - address: - - ip: 2620:52::10e7:e42:a1ff:fe8a:900 - prefix-length: 64 - dns-resolver: - config: - search: - - clus2.t5g.lab.eng.bos.redhat.com - server: - - 2620:52:0:1310::1f6 - routes: - config: - - destination: ::/0 - next-hop-interface: eno1 - next-hop-address: 2620:52:0:10e7::fc - table-id: 254 ----- -<1> If you are using `UEFI SecureBoot`, add this line to prevent failures due to invalid or local certificates. - -. 
Save the files and push them to the zero touch provisioning (ZTP) Git repository accessible from the hub cluster and defined as a source repository of the ArgoCD application. - -ArgoCD detects that the application is out of sync. Upon sync, either automatic or manual, ArgoCD synchronizes the `PolicyGenTemplate` to the hub cluster and launches the associated resource hooks. These hooks are responsible for generating the policy wrapped configuration CRs that apply to the spoke cluster. The resource hooks convert the site definitions to installation custom resources and applies them to the hub cluster: - -* `Namespace` - Unique per site -* `AgentClusterInstall` -* `BareMetalHost` -* `ClusterDeployment` -* `InfraEnv` -* `NMStateConfig` -* `ExtraManifestsConfigMap` - Extra manifests. The additional manifests include workload partitioning, chronyd, mountpoint hiding, sctp enablement, and more. -* `ManagedCluster` -* `KlusterletAddonConfig` - -{rh-rhacm-first} (ACM) deploys the hub cluster. diff --git a/modules/ztp-creating-ztp-custom-resources-for-multiple-managed-clusters.adoc b/modules/ztp-creating-ztp-custom-resources-for-multiple-managed-clusters.adoc index c97839518d..2100d6d857 100644 --- a/modules/ztp-creating-ztp-custom-resources-for-multiple-managed-clusters.adoc +++ b/modules/ztp-creating-ztp-custom-resources-for-multiple-managed-clusters.adoc @@ -6,8 +6,8 @@ [id="ztp-creating-ztp-custom-resources-for-multiple-managed-clusters_{context}"] = Creating ZTP custom resources for multiple managed clusters -If you are installing multiple managed clusters, zero touch provisioning (ZTP) uses ArgoCD and `SiteConfig` to manage the processes that create the custom resources (CR) and generate and apply the policies for multiple clusters, in batches of no more than 100, using the GitOps approach. +If you are installing multiple managed clusters, zero touch provisioning (ZTP) uses ArgoCD and `SiteConfig` files to manage the processes that create the CRs and generate and apply the policies for multiple clusters, in batches of no more than 100, using the GitOps approach. Installing and deploying the clusters is a two stage process, as shown here: -image::183_OpenShift_ZTP_0921.png[GitOps approach for Installing and deploying the clusters] +image::217_OpenShift_Zero_Touch_Provisioning_updates_0222_2.png[GitOps approach for Installing and deploying the clusters] diff --git a/modules/ztp-customizing-the-install-extra-manifests.adoc b/modules/ztp-customizing-the-install-extra-manifests.adoc new file mode 100644 index 0000000000..406cb5341e --- /dev/null +++ b/modules/ztp-customizing-the-install-extra-manifests.adoc @@ -0,0 +1,44 @@ +// Module included in the following assemblies: +// +// * scalability_and_performance/ztp-deploying-disconnected.adoc + +:_module-type: PROCEDURE +[id="ztp-customizing-the-install-extra-manifests_{context}"] += Customizing extra installation manifests in the ZTP GitOps pipeline + +You can define a set of extra manifests for inclusion in the installation phase of the zero touch provisioning (ZTP) GitOps pipeline. These manifests are linked to the `SiteConfig` custom resources (CRs) and are applied to the cluster during installation. Including `MachineConfig` CRs at install time makes the installation process more efficient. + +.Prerequisites + +* Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application. + +.Procedure + +. 
Create a set of extra manifest CRs that the ZTP pipeline uses to customize the cluster installs.
+
+. In your custom `/siteconfig` directory, create an `/extra-manifest` folder for your extra manifests. The following example illustrates a sample `/siteconfig` directory with an `/extra-manifest` folder:
++
+[source,text]
+----
+siteconfig
+├── site1-sno-du.yaml
+├── site2-standard-du.yaml
+└── extra-manifest
+    └── 01-example-machine-config.yaml
+----
+
+. Add your custom extra manifest CRs to the `siteconfig/extra-manifest` directory.
+
+. In your `SiteConfig` CR, enter the directory name in the `extraManifestPath` field, for example:
++
+[source,yaml]
+----
+clusters:
+- clusterName: "example-sno"
+  networkType: "OVNKubernetes"
+  extraManifestPath: extra-manifest
+----
+
+. Save the `SiteConfig` CRs and `/extra-manifest` CRs and push them to the site configuration repository.
+
+The ZTP pipeline appends the CRs in the `/extra-manifest` directory to the default set of extra manifests during cluster provisioning.
diff --git a/modules/ztp-definition-of-done-for-ztp-installations.adoc b/modules/ztp-definition-of-done-for-ztp-installations.adoc
new file mode 100644
index 0000000000..1934748d11
--- /dev/null
+++ b/modules/ztp-definition-of-done-for-ztp-installations.adoc
@@ -0,0 +1,33 @@
+// Module included in the following assemblies:
+//
+// * scalability_and_performance/ztp-deploying-disconnected.adoc
+
+:_content-type: CONCEPT
+[id="ztp-definition-of-done-for-ztp-installations_{context}"]
+= Indication of done for ZTP installations
+
+Zero touch provisioning (ZTP) simplifies the process of checking the ZTP installation status for a cluster. The ZTP status moves through three phases: cluster installation, cluster configuration, and ZTP done.
+
+Cluster installation phase::
+The cluster installation phase is shown by the `ManagedCluster` CR `ManagedClusterJoined` condition. If the `ManagedCluster` CR does not have this condition, or the condition is set to `False`, the cluster is still in the installation phase. Additional details about installation are available from the `AgentClusterInstall` and `ClusterDeployment` CRs. For more information, see "Troubleshooting GitOps ZTP".
+
+Cluster configuration phase::
+The cluster configuration phase is shown by a `ztp-running` label applied to the `ManagedCluster` CR for the cluster.
+
+ZTP done::
+Cluster installation and configuration is complete in the ZTP done phase. This is shown by the removal of the `ztp-running` label and the addition of the `ztp-done` label to the `ManagedCluster` CR. The `ztp-done` label shows that the configuration has been applied and the baseline DU configuration has completed cluster tuning.
++
+The transition to the ZTP done state is conditional on the compliant state of a {rh-rhacm-first} static validator inform policy. This policy captures the existing criteria for a completed installation and validates that it moves to a compliant state only when ZTP provisioning of the spoke cluster is complete.
++
+The validator inform policy ensures the configuration of the distributed unit (DU) cluster is fully applied and
+Operators have completed their initialization. The policy validates the following:
++
+* The target `MachineConfigPool` contains the expected entries and has finished
+updating. All nodes are available and not degraded.
+* The SR-IOV Operator has completed initialization as indicated by at least one `SriovNetworkNodeState` with `syncStatus: Succeeded`.
+* The PTP Operator daemon set exists.
++ +The policy captures the existing criteria for a completed installation and validates that it moves +to a compliant state only when ZTP provisioning of the spoke cluster is complete. ++ +The validator inform policy is included in the reference group `PolicyGenTemplate` CRs. For reliable indication of the ZTP done state, this validator inform policy must be included in the ZTP pipeline. diff --git a/modules/ztp-deploying-a-site.adoc b/modules/ztp-deploying-a-site.adoc new file mode 100644 index 0000000000..f218983800 --- /dev/null +++ b/modules/ztp-deploying-a-site.adoc @@ -0,0 +1,187 @@ +// Module included in the following assemblies: +// +// *scalability_and_performance/ztp-support-for-deployment-of-multi-node-clusters.adoc + +:_content-type: PROCEDURE +[id="ztp-deploying-a-site_{context}"] += Deploying a site + +Use the following procedure to prepare the hub cluster for site deployment and initiate zero touch provisioning (ZTP) by pushing custom resources (CRs) to your Git repository. + +.Procedure + +. Create the required secrets for the site. These resources must be in a namespace with a name matching the cluster name. In `out/argocd/example/siteconfig/example-sno.yaml`, the cluster name and namespace is `example-sno`. ++ +Create the namespace for the cluster using the following commands: ++ +[source,terminal] +---- +$ export CLUSTERNS=example-sno +---- ++ +[source,terminal] +---- +$ oc create namespace $CLUSTERNS +---- + +. Create a pull secret for the cluster. The pull secret must contain all the credentials necessary for installing {product-title} and all required Operators. In all of the example `SiteConfig` CRs, the pull secret is named `assisted-deployment-pull-secret`, as shown below: ++ +[source,terminal] +---- +$ oc apply -f - < + sshPublicKey: "ssh-rsa AAAA..." 
+ clusters: + - clusterName: "example-sno" + networkType: "OVNKubernetes" + clusterLabels: <2> + # These example cluster labels correspond to the bindingRules in the PolicyGenTemplate examples in ../policygentemplates: + # ../policygentemplates/common-ranGen.yaml will apply to all clusters with 'common: true' + common: true + # ../policygentemplates/group-du-sno-ranGen.yaml will apply to all clusters with 'group-du-sno: ""' + group-du-sno: "" + # ../policygentemplates/example-sno-site.yaml will apply to all clusters with 'sites: "example-sno"' + # Normally this should match or contain the cluster name so it only applies to a single cluster + sites : "example-sno" + clusterNetwork: + - cidr: 1001:1::/48 + hostPrefix: 64 + machineNetwork: <3> + - cidr: 1111:2222:3333:4444::/64 + # For 3-node and standard clusters with static IPs, the API and Ingress IPs must be configured here + apiVIP: 1111:2222:3333:4444::1:1 <4> + ingressVIP: 1111:2222:3333:4444::1:2 <5> + + serviceNetwork: + - 1001:2::/112 + additionalNTPSources: + - 1111:2222:3333:4444::2 + nodes: + - hostName: "example-node1.example.com" <6> + role: "master" + bmcAddress: "idrac-virtualmedia+https://[1111:2222:3333:4444::bbbb:1]/redfish/v1/Systems/System.Embedded.1" <7> + bmcCredentialsName: + name: "example-node1-bmh-secret" <8> + bootMACAddress: "AA:BB:CC:DD:EE:11" + bootMode: "UEFI" + rootDeviceHints: + hctl: '0:1:0' + cpuset: "0-1,52-53" + nodeNetwork: <9> + interfaces: + - name: eno1 + macAddress: "AA:BB:CC:DD:EE:11" + config: + interfaces: + - name: eno1 + type: ethernet + state: up + macAddress: "AA:BB:CC:DD:EE:11" + ipv4: + enabled: false + ipv6: + enabled: true + address: + - ip: 1111:2222:3333:4444::1:1 + prefix-length: 64 + dns-resolver: + config: + search: + - example.com + server: + - 1111:2222:3333:4444::2 + routes: + config: + - destination: ::/0 + next-hop-interface: eno1 + next-hop-address: 1111:2222:3333:4444::1 + table-id: 254 +---- +<1> Applies to all cluster types. The value must match an image set available on the hub cluster. To see the list of supported versions on your hub, run `oc get clusterimagesets`. +<2> Applies to all cluster types. These values must correspond to the `PolicyGenTemplate` labels that you define in a later step. +<3> Applies to single node clusters. The value defines the cluster network sections for a single node deployment. +<4> Applies to three-node and standard clusters. The value defines the cluster network sections. +<5> Applies to three-node and standard clusters. The value defines the cluster network sections. +<6> Applies to all cluster types. For single node deployments, define one host. For three-node deployments, define three hosts. For standard deployments, define three hosts with `role: master` and two or more hosts defined with `role: worker`. +<7> Applies to all cluster types. Specifies the BMC address. +<8> Applies to all cluster types. Specifies the BMC credentials. +<9> Applies to all cluster types. Specifies the network settings for the node. + +.. You can inspect the default set of extra-manifest `MachineConfig` CRs in `out/argocd/extra-manifest`. It is automatically applied to the cluster when it is installed. ++ +Optional: To provision additional install-time manifests on the provisioned cluster, create a directory in your Git repository, for example, `sno-extra-manifest/`, and add your custom manifest CRs to this directory. 
If your `SiteConfig.yaml` refers to this directory in the `extraManifestPath` field, any CRs in this referenced directory are appended to the default set of extra manifests.
+
+. Add the `SiteConfig` CR to the `kustomization.yaml` file in the `generators` section, similar to the example shown in `out/argocd/example/siteconfig/kustomization.yaml`.
+
+. Commit your `SiteConfig` CR and associated `kustomization.yaml` in your Git repository.
+
+. Push your changes to the Git repository. The ArgoCD pipeline detects the changes and begins the site deployment. You can push the changes to the `SiteConfig` CR and the `PolicyGenTemplate` CR simultaneously.
++
+The `SiteConfig` CR creates the following CRs on the hub cluster:
++
+* `Namespace` - Unique per site
+* `AgentClusterInstall`
+* `BareMetalHost` - One per node
+* `ClusterDeployment`
+* `InfraEnv`
+* `NMStateConfig` - One per node
+* `ExtraManifestsConfigMap` - Extra manifests. The additional manifests include workload partitioning, chronyd, mountpoint hiding, sctp enablement, and more.
+* `ManagedCluster`
+* `KlusterletAddonConfig`
diff --git a/modules/ztp-deploying-additional-changes-to-clusters.adoc b/modules/ztp-deploying-additional-changes-to-clusters.adoc
new file mode 100644
index 0000000000..c18f29f6d3
--- /dev/null
+++ b/modules/ztp-deploying-additional-changes-to-clusters.adoc
@@ -0,0 +1,32 @@
+// Module included in the following assemblies:
+//
+// * scalability_and_performance/ztp-deploying-disconnected.adoc
+
+:_module-type: CONCEPT
+[id="ztp-deploying-additional-changes-to-clusters_{context}"]
+= Deploying additional changes to clusters
+
+Custom resources (CRs) that are deployed through the GitOps zero touch provisioning (ZTP) pipeline support two goals:
+
+. Deploying additional Operators to spoke clusters that are required by typical RAN DU applications running at the network far-edge.
+
+. Customizing the {product-title} installation to provide a high performance platform capable of meeting the strict timing requirements in a minimal CPU budget.
+
+If you require cluster configuration changes outside of the base GitOps ZTP pipeline configuration, there are three options:
+
+Apply the additional configuration after the ZTP pipeline is complete::
+
+When the GitOps ZTP pipeline deployment is complete, the deployed cluster is ready for application workloads. At this point, you can install additional Operators and apply configurations specific to your requirements. Ensure that additional configurations do not negatively affect the performance of the platform or allocated CPU budget.
+
+Add content to the ZTP library::
+
+The base source CRs that you deploy with the GitOps ZTP pipeline can be augmented with custom content as required.
+
+Create extra manifests for the cluster installation::
+
+Extra manifests are applied during installation and make the installation process more efficient.
+
+[IMPORTANT]
+====
+Providing additional source CRs or modifying existing source CRs can significantly impact the performance or CPU profile of {product-title}.
+==== diff --git a/modules/ztp-disconnected-environment-prereqs.adoc b/modules/ztp-disconnected-environment-prereqs.adoc deleted file mode 100644 index fe563034d0..0000000000 --- a/modules/ztp-disconnected-environment-prereqs.adoc +++ /dev/null @@ -1,21 +0,0 @@ -// Module included in the following assemblies: -// -// scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: CONCEPT -[id="ztp-disconnected-environment-prereqs_{context}"] -= Disconnected environment prerequisites - -You must have a container image registry that supports link:https://docs.docker.com/registry/spec/manifest-v2-2/[Docker v2-2] in the location that will host the {product-title} cluster, such as one of the following registries: - -* link:https://www.redhat.com/en/technologies/cloud-computing/quay[Red Hat Quay] -* link:https://jfrog.com/artifactory/[JFrog Artifactory] -* link:https://www.sonatype.com/products/repository-oss?topnav=true[Sonatype Nexus Repository] -* link:https://goharbor.io/[Harbor] - -If you have an entitlement to Red Hat Quay, see the documentation on deploying Red Hat Quay link:https://access.redhat.com/documentation/en-us/red_hat_quay/3.5/html/deploy_red_hat_quay_for_proof-of-concept_non-production_purposes/[for proof-of-concept purposes] or link:https://access.redhat.com/documentation/en-us/red_hat_quay/3.5/html/deploy_red_hat_quay_on_openshift_with_the_quay_operator/[by using the Quay Operator]. If you need additional assistance selecting and installing a registry, contact your sales representative or Red Hat support. - -[NOTE] -==== -Red Hat does not test third party registries with {product-title}. -==== diff --git a/modules/ztp-du-host-bios-requirements.adoc b/modules/ztp-du-host-bios-requirements.adoc index 8383e0bb57..42f4f01561 100644 --- a/modules/ztp-du-host-bios-requirements.adoc +++ b/modules/ztp-du-host-bios-requirements.adoc @@ -9,11 +9,6 @@ Distributed unit (DU) hosts require the BIOS to be configured before the host can be provisioned. The BIOS configuration is dependent on the specific hardware that runs your DUs and the particular requirements of your installation. -[IMPORTANT] -==== -In this Developer Preview release, configuration and tuning of BIOS for DU bare-metal host machines is the responsibility of the customer. Automatic setting of BIOS is not handled by the zero touch provisioning workflow. -==== - .Procedure . Set the *UEFI/BIOS Boot Mode* to `UEFI`. diff --git a/modules/ztp-how-to-plan-your-ran-policies.adoc b/modules/ztp-how-to-plan-your-ran-policies.adoc new file mode 100644 index 0000000000..636e17d25e --- /dev/null +++ b/modules/ztp-how-to-plan-your-ran-policies.adoc @@ -0,0 +1,28 @@ +// Module included in the following assemblies: +// +// scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: CONCEPT +[id="ztp-how-to-plan-your-ran-policies_{context}"] += How to plan your RAN policies + +Zero touch provisioning (ZTP) uses {rh-rhacm-first} to apply the radio access network (RAN) configuration using a policy-based governance approach to apply the configuration. + +The policy generator or `PolicyGen` is a part of the GitOps ZTP tooling that facilitates creating {rh-rhacm} policies from a set of predefined custom resources. There are three main items: policy categorization, source CR policy, and the `PolicyGenTemplate` CR. `PolicyGen` uses these to generate the policies and their placement bindings and rules. + +The following diagram shows how the RAN policy generator interacts with GitOps and {rh-rhacm}. 
+ +image::217_OpenShift_Zero_Touch_Provisioning_updates_0222_3.png[RAN policy generator] + +RAN policies are categorized into three main groups: + +Common:: A policy that exists in the `Common` category is applied to all clusters to be represented by the site plan. Cluster types include single node, three-node, and standard clusters. + +Groups:: A policy that exists in the `Groups` category is applied to a group of clusters. Every group of clusters could have their own policies that exist under the +`Groups` category. For example, `Groups/group1` can have its own policies that are applied to the clusters belonging to `group1`. +You can also define a group for each cluster type: single node, three-node, and standard clusters. + +Sites:: A policy that exists in the `Sites` category is applied to a specific cluster. Any cluster +could have its own policies that exist in the `Sites` category. +For example, `Sites/cluster1` has its own policies applied to `cluster1`. +You can also define an example site-specific configuration for each cluster type: single node, three-node, and standard clusters. diff --git a/modules/ztp-installing-preparing-mirror.adoc b/modules/ztp-installing-preparing-mirror.adoc deleted file mode 100644 index db8021d640..0000000000 --- a/modules/ztp-installing-preparing-mirror.adoc +++ /dev/null @@ -1,9 +0,0 @@ -// Module included in the following assemblies: -// -// scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: CONCEPT -[id="ztp-installing-preparing-mirror_{context}"] -= Preparing your mirror host - -Before you perform the mirror procedure, you must prepare the host to retrieve content and push it to the remote location. diff --git a/modules/ztp-installing-the-new-gitops-ztp-applications.adoc b/modules/ztp-installing-the-new-gitops-ztp-applications.adoc new file mode 100644 index 0000000000..8938ed5d62 --- /dev/null +++ b/modules/ztp-installing-the-new-gitops-ztp-applications.adoc @@ -0,0 +1,18 @@ +// Module included in the following assemblies: +// +// *scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: PROCEDURE +[id="ztp-installing-the-new-gitops-ztp-applications_{context}"] += Installing the new GitOps ZTP applications + +Using the extracted `argocd/deployment` directory, and after ensuring that the applications point to your Git repository, apply the full contents of the deployment directory. Applying the full contents of the directory ensures that all necessary resources for the applications are correctly configured. + +.Procedure + +* Apply the contents of the `argocd/deployment` directory using the following command: ++ +[source,terminal] +---- +$ oc apply -k out/argocd/deployment +---- diff --git a/modules/ztp-labeling-the-existing-clusters.adoc b/modules/ztp-labeling-the-existing-clusters.adoc new file mode 100644 index 0000000000..ed89b22a14 --- /dev/null +++ b/modules/ztp-labeling-the-existing-clusters.adoc @@ -0,0 +1,25 @@ +// Module included in the following assemblies: +// +// *scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: PROCEDURE +[id="ztp-labeling-the-existing-clusters_{context}"] += Labeling the existing clusters + +To ensure that existing clusters remain untouched by the tooling updates, all existing managed clusters must be labeled with the `ztp-done` label. + +.Procedure + +. 
Find a label selector that lists the managed clusters that were deployed with zero touch provisioning (ZTP), such as `local-cluster!=true`: ++ +[source,terminal] +---- +$ oc get managedcluster -l 'local-cluster!=true' +---- + +. Ensure that the resulting list contains all the managed clusters that were deployed with ZTP, and then use that selector to add the `ztp-done` label: ++ +[source,terminal] +---- +$ oc label managedcluster -l 'local-cluster!=true' ztp-done= +---- diff --git a/modules/ztp-low-latency-for-distributed-units-dus.adoc b/modules/ztp-low-latency-for-distributed-units-dus.adoc index b2ab98821a..3d3bdb447c 100644 --- a/modules/ztp-low-latency-for-distributed-units-dus.adoc +++ b/modules/ztp-low-latency-for-distributed-units-dus.adoc @@ -1,6 +1,6 @@ // Module included in the following assemblies: // -// *scalability_and_performance/ztp-deploying-disconnected.adoc +// * scalability_and_performance/ztp-deploying-disconnected.adoc :_content-type: CONCEPT [id="ztp-low-latency-for-distributed-units-dus_{context}"] @@ -9,8 +9,7 @@ Low latency is an integral part of the development of 5G networks. Telecommunications networks require as little signal delay as possible to ensure quality of service in a variety of critical use cases. -Low latency processing is essential for any communication with timing constraints that affect functionality and -security. For example, 5G Telco applications require a guaranteed one millisecond one-way latency to meet Internet of Things (IoT) requirements. Low latency is also critical for the future development of autonomous vehicles, smart factories, and online gaming. Networks in these environments require almost a real-time flow of data. +Low latency processing is essential for any communication with timing constraints that affect functionality and security. For example, 5G Telco applications require a guaranteed one millisecond one-way latency to meet Internet of Things (IoT) requirements. Low latency is also critical for the future development of autonomous vehicles, smart factories, and online gaming. Networks in these environments require almost a real-time flow of data. Low latency systems are about guarantees with regards to response and processing times. This includes keeping a communication protocol running smoothly, ensuring device security with fast responses to error conditions, or just making sure a system is not lagging behind when receiving a lot of data. Low latency is key for optimal synchronization of radio transmissions. diff --git a/modules/ztp-machine-config-operator.adoc b/modules/ztp-machine-config-operator.adoc deleted file mode 100644 index e751f8c696..0000000000 --- a/modules/ztp-machine-config-operator.adoc +++ /dev/null @@ -1,11 +0,0 @@ -// Module included in the following assemblies: -// -// scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: CONCEPT -[id="ztp-machine-config-operator_{context}"] -= Machine Config Operator - -The Machine Config Operator enables system definitions and low-level system settings such as workload partitioning, NTP, and SCTP. This Operator is installed with {product-title}. - -A performance profile and its created products are applied to a node according to an associated machine config pool (MCP). The MCP holds valuable information about the progress of applying the machine configurations created by performance addons that encompass kernel args, kube config, huge pages allocation, and deployment of the realtime kernel (rt-kernel). 
The performance addons controller monitors changes in the MCP and updates the performance profile status accordingly. diff --git a/modules/ztp-creating-siteconfig-custom-resources.adoc b/modules/ztp-manually-install-a-single-managed-cluster.adoc similarity index 82% rename from modules/ztp-creating-siteconfig-custom-resources.adoc rename to modules/ztp-manually-install-a-single-managed-cluster.adoc index 133d4b2b35..f4c0b26d12 100644 --- a/modules/ztp-creating-siteconfig-custom-resources.adoc +++ b/modules/ztp-manually-install-a-single-managed-cluster.adoc @@ -3,21 +3,19 @@ // *scalability_and_performance/ztp-deploying-disconnected.adoc :_content-type: PROCEDURE -[id="ztp-creating-siteconfig-custom-resources_{context}"] -= Creating custom resources to install a single managed cluster +[id="ztp-manually-install-a-single-managed-cluster_{context}"] += Manually install a single managed cluster -This procedure tells you how to manually create and deploy a single managed cluster. If you are creating multiple clusters, perhaps hundreds, use the `SiteConfig` method described in -“Creating ZTP custom resources for multiple managed clusters”. +This procedure tells you how to manually create and deploy a single managed cluster. If you are creating multiple clusters, perhaps hundreds, use the `SiteConfig` method described in “Creating ZTP custom resources for multiple managed clusters”. .Prerequisites -* Enable Assisted Installer Service. +* Enable the Assisted Installer service. * Ensure network connectivity: ** The container within the hub must be able to reach the Baseboard Management Controller (BMC) address of the target bare-metal host. -** The managed cluster must be able to resolve and reach the hub’s API `hostname` and `{asterisk}.app` hostname. -Example of the hub’s API and `{asterisk}.app` hostname: +** The managed cluster must be able to resolve and reach the hub’s API `hostname` and `{asterisk}.app` hostname. Here is an example of the hub’s API and `{asterisk}.app` hostname: + [source,terminal] ---- @@ -25,8 +23,7 @@ console-openshift-console.apps.hub-cluster.internal.domain.com api.hub-cluster.internal.domain.com ---- -** The hub must be able to resolve and reach the API and `{asterisk}.app` hostname of the managed cluster. -Here is an example of the managed cluster’s API and `{asterisk}.app` hostname: +** The hub must be able to resolve and reach the API and `{asterisk}.app` hostname of the managed cluster. Here is an example of the managed cluster’s API and `{asterisk}.app` hostname: + [source,terminal] ---- @@ -34,19 +31,19 @@ console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com api.sno-managed-cluster-1.internal.domain.com ---- -** A DNS Server that is IP reachable from the target bare-metal host. +** A DNS server that is IP reachable from the target bare-metal host. * A target bare-metal host for the managed cluster with the following hardware minimums: ** 4 CPU or 8 vCPU ** 32 GiB RAM -** 120 GiB Disk for root filesystem +** 120 GiB disk for root file system -* When working in a disconnected environment, the release image needs to be mirrored. Use this command to mirror the release image: +* When working in a disconnected environment, the release image must be mirrored. 
Use this command to mirror the release image: + [source,terminal] ---- -oc adm release mirror -a +$ oc adm release mirror -a --from=quay.io/openshift-release-dev/ocp-release:{{ mirror_version_spoke_release }} --to={{ provisioner_cluster_registry }}/ocp4 --to-release-image={{ provisioner_cluster_registry }}/ocp4:{{ mirror_version_spoke_release }} @@ -54,7 +51,8 @@ provisioner_cluster_registry }}/ocp4:{{ mirror_version_spoke_release }} * You mirrored the ISO and `rootfs` used to generate the spoke cluster ISO to an HTTP server and configured the settings to pull images from there. + -The images must match the version of the `ClusterImageSet`. To deploy a 4.11.0 version, the `rootfs` and ISO need to be set at 4.11.0. +The images must match the version of the `ClusterImageSet`. To deploy a 4.9.0 version, the `rootfs` and +ISO must be set at 4.9.0. .Procedure @@ -66,9 +64,9 @@ The images must match the version of the `ClusterImageSet`. To deploy a 4.11.0 v apiVersion: hive.openshift.io/v1 kind: ClusterImageSet metadata: - name: openshift-4.11.0-rc.0 <1> + name: openshift-4.9.0-rc.0 <1> spec: - releaseImage: quay.io/openshift-release-dev/ocp-release:4.11.0-x86_64 <2> + releaseImage: quay.io/openshift-release-dev/ocp-release:4.9.0-x86_64 <2> ---- <1> The descriptive version that you want to deploy. <2> Specifies the `releaseImage` to deploy and determines the OS Image version. The discovery ISO is based on an OS image version as the `releaseImage`, or latest if the exact version is unavailable. @@ -149,15 +147,15 @@ spec: sshPublicKey: <5> ---- + -<1> The name of the ClusterImageSet custom resource used to install {product-title} on the bare-metal host. +<1> The name of the `ClusterImageSet` custom resource used to install {product-title} on the bare-metal host. <2> A block of IPv4 or IPv6 addresses in CIDR notation used for communication among cluster nodes. <3> A block of IPv4 or IPv6 addresses in CIDR notation used for the target bare-metal host external communication. Also used to determine the API and Ingress VIP addresses when provisioning DU single-node clusters. <4> A block of IPv4 or IPv6 addresses in CIDR notation used for cluster services internal communication. -<5> Entered as plain text. You can use the public key to SSH into the node after it has finished installing. +<5> A plain text string. You can use the public key to SSH into the node after it has finished installing. + [NOTE] ==== -If you want to configure a static IP for the managed cluster at this point, see the procedure in this document for configuring static IP addresses for managed clusters. +If you want to configure a static IP address for the managed cluster at this point, see the procedure in this document for configuring static IP addresses for managed clusters. ==== @@ -216,7 +214,7 @@ spec: enabled: false <1> ---- + -<1> Set to `true` to enable KlusterletAddonConfig or `false` to disable the KlusterletAddonConfig. Keep `searchCollector` disabled. +<1> Keep `searchCollector` disabled. Set to `true` to enable the `KlusterletAddonConfig` CR or `false` to disable the `KlusterletAddonConfig` CR. . Create the `ManagedCluster` custom resource: + @@ -244,13 +242,13 @@ spec: name: namespace: sshAuthorizedKey: <1> - agentLabels: <2> - location: "" + agentLabelSelector: + matchLabels: + cluster-name: pullSecretRef: name: assisted-deployment-pull-secret ---- <1> Entered as plain text. You can use the public key to SSH into the target bare-metal host when it boots from the ISO. -<2> Sets a label to match. 
The labels apply when the agents boot.

. Create the `BareMetalHost` custom resource:
+
@@ -282,6 +280,6 @@
Optionally, you can add `bmac.agent-install.openshift.io/hostname: `

. After you have created the custom resources, push the entire directory of generated custom resources to the Git repository you created for storing the custom resources.

-.Next step
+.Next steps

To provision additional clusters, repeat this procedure for each cluster.
diff --git a/modules/ztp-monitoring-deployment-progress.adoc b/modules/ztp-monitoring-deployment-progress.adoc
new file mode 100644
index 0000000000..23d674a237
--- /dev/null
+++ b/modules/ztp-monitoring-deployment-progress.adoc
@@ -0,0 +1,55 @@
+// Module included in the following assemblies:
+//
+// *scalability_and_performance/ztp-deploying-disconnected.adoc
+
+:_content-type: PROCEDURE
+[id="ztp-monitoring-deployment-progress_{context}"]
+= Monitoring deployment progress
+
+The ArgoCD pipeline uses the `SiteConfig` and `PolicyGenTemplate` CRs in Git to generate the cluster configuration CRs and {rh-rhacm} policies and then sync them to the hub. You can monitor the progress of this synchronization in the ArgoCD dashboard.
+
+.Procedure
+
+When the synchronization is complete, the installation generally proceeds as follows:
+
+. The Assisted Service Operator installs {product-title} on the cluster. You can monitor the progress of cluster installation from the {rh-rhacm} dashboard or from the command line:
++
+[source,terminal]
+----
+$ export CLUSTER=
+----
++
+[source,terminal]
+----
+$ oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Completed")]}' | jq
+----
++
+[source,terminal]
+----
+$ curl -sk $(oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.debugInfo.eventsURL}') | jq '.[-2,-1]'
+----
+
+. The {cgu-operator-first} applies the configuration policies that are bound to the cluster.
++
+After the cluster installation is complete and the cluster becomes `Ready`, a `ClusterGroupUpgrade` CR corresponding to this cluster, with a list of ordered policies defined by the `ran.openshift.io/ztp-deploy-wave` annotations, is automatically created by the {cgu-operator}. The cluster's policies are applied in the order listed in the `ClusterGroupUpgrade` CR. You can monitor the high-level progress of configuration policy reconciliation using the following commands:
++
+[source,terminal]
+----
+$ export CLUSTER=
+----
++
+[source,terminal]
+----
+$ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
+----
+
+. You can monitor the detailed policy compliance status using the {rh-rhacm} dashboard or the command line:
++
+[source,terminal]
+----
+$ oc get policies -n $CLUSTER
+----
+
+The final policy that becomes compliant is the one defined in the `*-du-validator-policy` policies. This policy, when compliant on a cluster, ensures that all cluster configuration, Operator installation, and Operator configuration are complete.
+
+After all policies become compliant, the `ztp-done` label is added to the cluster, indicating the entire ZTP pipeline is complete for the cluster.
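+
+For reference, the `ClusterGroupUpgrade` CR that the {cgu-operator} creates automatically in the `ztp-install` namespace generally resembles the following minimal sketch. This is illustrative only: the cluster and policy names shown are placeholders, and the actual `managedPolicies` list is generated from the `ran.openshift.io/ztp-deploy-wave` annotations and depends on your `PolicyGenTemplate` configuration:
+
+[source,yaml]
+----
+apiVersion: ran.openshift.io/v1alpha1
+kind: ClusterGroupUpgrade
+metadata:
+  name: example-sno        # placeholder; the CR is named after the managed cluster
+  namespace: ztp-install
+spec:
+  clusters:
+  - example-sno            # placeholder cluster name
+  enable: true
+  managedPolicies:         # placeholder policy names, ordered by deploy wave
+  - example-common-config-policy
+  - example-common-subscriptions-policy
+  - example-group-du-sno-config-policy
+  - example-site-config-policy
+  remediationStrategy:
+    maxConcurrency: 1
+    timeout: 240
+----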
diff --git a/modules/ztp-pgt-config-best-practices.adoc b/modules/ztp-pgt-config-best-practices.adoc
new file mode 100644
index 0000000000..35faa17c70
--- /dev/null
+++ b/modules/ztp-pgt-config-best-practices.adoc
@@ -0,0 +1,17 @@
+// Module included in the following assemblies:
+//
+// scalability_and_performance/ztp-deploying-disconnected.adoc
+
+:_module-type: CONCEPT
+[id="ztp-pgt-config-best-practices_{context}"]
+= Best practices when customizing PolicyGenTemplate CRs
+
+Consider the following best practices when customizing site configuration `PolicyGenTemplate` CRs:
+
+* Use as few policies as necessary. Using fewer policies means using fewer resources. Each additional policy creates overhead for the hub cluster and the deployed spoke cluster. CRs are combined into policies based on the `policyName` field in the `PolicyGenTemplate` CR. CRs in the same `PolicyGenTemplate` that have the same value for `policyName` are managed under a single policy.
+
+* Use a single catalog source for all Operators. In disconnected environments, configure the registry as a single index containing all Operators. Each additional `CatalogSource` on the spoke clusters increases CPU usage.
+
+* `MachineConfig` CRs should be included as `extraManifests` in the `SiteConfig` CR so that they are applied during installation. This can reduce the overall time taken until the cluster is ready to deploy applications.
+
+* `PolicyGenTemplates` should override the `channel` field to explicitly identify the desired version. This ensures that changes in the source CR during upgrades do not update the generated subscription.
diff --git a/modules/ztp-policygentemplates-for-ran.adoc b/modules/ztp-policygentemplates-for-ran.adoc
new file mode 100644
index 0000000000..d63db28199
--- /dev/null
+++ b/modules/ztp-policygentemplates-for-ran.adoc
@@ -0,0 +1,34 @@
+// Module included in the following assemblies:
+//
+// scalability_and_performance/ztp-deploying-disconnected.adoc
+
+:_module-type: CONCEPT
+[id="ztp-policygentemplates-for-ran_{context}"]
+= PolicyGenTemplate CRs for RAN deployments
+
+You use `PolicyGenTemplate` custom resources (CRs) to customize the configuration applied to the cluster using the GitOps zero touch provisioning (ZTP) pipeline. The baseline configuration, obtained from the GitOps ZTP container, is designed to provide a set of critical features and node tuning settings that ensure the cluster can support the stringent performance and resource utilization constraints typical of RAN Distributed Unit (DU) applications. Changes or omissions from the baseline configuration can affect feature availability, performance, and resource utilization. Use `PolicyGenTemplate` CRs as the basis to create a hierarchy of configuration files tailored to your specific site requirements.
+
+The baseline `PolicyGenTemplate` CRs that are defined for RAN DU cluster configuration can be extracted from the GitOps ZTP `ztp-site-generator`. See "Preparing the ZTP Git repository" for further details.
+
+The `PolicyGenTemplate` CRs can be found in the `./out/argocd/example/policygentemplates` folder. The reference architecture has common, group, and site-specific configuration CRs. Each `PolicyGenTemplate` CR refers to other CRs that can be found in the `./out/source-crs` folder.
+
+The `PolicyGenTemplate` CRs relevant to RAN cluster configuration are described below. Variants are provided for the group `PolicyGenTemplate` CRs to account for differences in single-node, three-node compact, and standard cluster configurations.
Similarly, site-specific configuration variants are provided for single-node clusters and multi-node (compact or standard) clusters. Use the group and site-specific configuration variants that are relevant for your deployment. + +.PolicyGenTemplate CRs for RAN deployments +[cols=2*, options="header"] +|==== +|PolicyGenTemplate CR +|Description + +|`common-ranGen.yaml` +|Contains a set of common RAN CRs that get applied to all clusters. These CRs subscribe to a set of operators providing cluster features typical for RAN as well as baseline cluster tuning. + +|`group-du-3node-ranGen.yaml` +|Contains the RAN policies for three-node clusters only. + +|`group-du-sno-ranGen.yaml` +|Contains the RAN policies for single-node clusters only. + +|`group-du-standard-ranGen.yaml` +|Contains the RAN policies for standard three control-plane clusters. +|==== diff --git a/modules/ztp-precision-time-protocol-operator.adoc b/modules/ztp-precision-time-protocol-operator.adoc deleted file mode 100644 index 881158144f..0000000000 --- a/modules/ztp-precision-time-protocol-operator.adoc +++ /dev/null @@ -1,9 +0,0 @@ -// Module included in the following assemblies: -// -// scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: CONCEPT -[id="ztp-precision-time-protocol-operator_{context}"] -= Precision Time Protocol Operator - -Precision Time Protocol (PTP) is used to synchronize clocks in a network. The PTP Operator discovers PTP-capable devices in the cluster and creates and manages `linuxptp` services for those devices. The PTP Operator also deploys a PTP fast events infrastructure. vDU applications use PTP fast events notifications to report on clock events that can negatively affect the performance and reliability of the application. PTP fast events are distributed over an Advanced Message Queuing Protocol (AMQP) event notification bus. diff --git a/modules/ztp-preparing-for-the-gitops-ztp-upgrade.adoc b/modules/ztp-preparing-for-the-gitops-ztp-upgrade.adoc new file mode 100644 index 0000000000..70d429f611 --- /dev/null +++ b/modules/ztp-preparing-for-the-gitops-ztp-upgrade.adoc @@ -0,0 +1,36 @@ +// Module included in the following assemblies: +// +// *scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: PROCEDURE +[id="ztp-preparing-for-the-gitops-ztp-upgrade_{context}"] += Preparing for the upgrade + +Use the following procedure to prepare your site for the GitOps zero touch provisioning (ZTP) upgrade. + +.Procedure + +. Obtain the latest version of the GitOps ZTP container from which you can extract a set of custom resources (CRs) used to configure the GitOps operator on the hub cluster for use in the GitOps ZTP solution. + +. Extract the `argocd/deployment` directory using the following commands: ++ +[source,terminal] +---- +$ mkdir -p ./out +---- ++ +[source,terminal] +---- +$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10 extract /home/ztp --tar | tar x -C ./out +---- ++ +The `/out` directory contains the following subdirectories: ++ +* `out/extra-manifest`: contains the source CR files that the `SiteConfig` CR uses to generate the extra manifest `configMap`. +* `out/source-crs`: contains the source CR files that the `PolicyGenTemplate` CR uses to generate the {rh-rhacm-first} policies. +* `out/argocd/deployment`: contains patches and YAML files to apply on the hub cluster for use in the next step of this procedure. 
+* `out/argocd/example`: contains example `SiteConfig` and `PolicyGenTemplate` files that represent the recommended configuration. + +. Update the `clusters-app.yaml` and `policies-app.yaml` files to reflect the name of your applications and the URL, branch, and path for your Git repository. + +If the upgrade includes changes to policies that may result in obsolete policies, these policies should be removed prior to performing the upgrade. diff --git a/modules/ztp-preparing-the-hub-cluster-for-ztp.adoc b/modules/ztp-preparing-the-hub-cluster-for-ztp.adoc index 85c0bfcf28..0c2c8f1205 100644 --- a/modules/ztp-preparing-the-hub-cluster-for-ztp.adoc +++ b/modules/ztp-preparing-the-hub-cluster-for-ztp.adoc @@ -8,103 +8,37 @@ You can configure your hub cluster with a set of ArgoCD applications that generate the required installation and policy custom resources (CR) for each site based on a zero touch provisioning (ZTP) GitOps flow. +.Prerequisites + +* Openshift Cluster 4.8 or 4.9 as the hub cluster +* {rh-rhacm-first} Operator 2.3 or 2.4 installed on the hub cluster +* Red Hat OpenShift GitOps Operator 1.3 on the hub cluster + .Procedure -. Install the Red Hat OpenShift GitOps Operator on your hub cluster. - -. Extract the administrator password for ArgoCD: -+ -[source,terminal] ----- -$ oc get secret openshift-gitops-cluster -n openshift-gitops -o jsonpath='{.data.admin\.password}' | base64 -d ----- +. Install the {cgu-operator-first}, which coordinates with any new sites added by ZTP and manages application of the `PolicyGenTemplate`-generated policies. . Prepare the ArgoCD pipeline configuration: -.. Extract the ArgoCD deployment CRs from the ZTP site generator container using the latest container image version: ++ +.. Create a Git repository with the directory structure similar to the example directory. For more information, see "Preparing the ZTP Git repository". + +.. Configure access to the repository using the ArgoCD UI. Under *Settings* configure the following: ++ +* *Repositories* - Add the connection information. The URL must end in `.git`, for example, `https://repo.example.com/repo.git` and credentials. + +* *Certificates* - Add the public certificate for the repository, if needed. + +.. Modify the two ArgoCD Applications, `out/argocd/deployment/clusters-app.yaml` and `out/argocd/deployment/policies-app.yaml`, based on your Git repository: ++ +* Update the URL to point to the Git repository. The URL must end with `.git`, for example, `https://repo.example.com/repo.git`. + +* The `targetRevision` must indicate which Git repository branch to monitor. + +* The path should specify the path to the `SiteConfig` or `PolicyGenTemplate` CRs, respectively. + +. Apply the pipeline configuration to your hub cluster using the following command: + [source,terminal] ---- -$ mkdir ztp -$ podman run --rm -v `pwd`/ztp:/mnt/ztp:Z registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10.0-1 /bin/bash -c "cp -ar /usr/src/hook/ztp/* /mnt/ztp/" ----- -+ -The remaining steps in this section relate to the `ztp/gitops-subscriptions/argocd/` directory. - -.. Modify the source values of the two ArgoCD applications, `deployment/clusters-app.yaml` and `deployment/policies-app.yaml` with appropriate URL, `targetRevision` branch, and path values. The path values must match those used in your Git repository. 
-+ -Modify `deployment/clusters-app.yaml`: -+ -[source,yaml] ----- -apiVersion: v1 -kind: Namespace -metadata: - name: clusters-sub ---- -apiVersion: argoproj.io/v1alpha1 -kind: Application -metadata: - name: clusters - namespace: openshift-gitops -spec: - destination: - server: https://kubernetes.default.svc - namespace: clusters-sub - project: default - source: - path: ztp/gitops-subscriptions/argocd/resource-hook-example/siteconfig <1> - repoURL: https://github.com/openshift-kni/cnf-features-deploy <2> - targetRevision: master <3> - syncPolicy: - automated: - prune: true - selfHeal: true - syncOptions: - - CreateNamespace=true ----- -<1> The `ztp/gitops-subscriptions/argocd/` file path that contains the `siteconfig` CRs for the clusters. -<2> The URL of the Git repository that contains the `siteconfig` custom resources that define site configuration for installing clusters. -<3> The branch on the Git repository that contains the relevant site configuration data. - -.. Modify `deployment/policies-app.yaml`: -+ -[source,yaml] ----- -apiVersion: v1 -kind: Namespace -metadata: - name: policies-sub ---- -apiVersion: argoproj.io/v1alpha1 -kind: Application -metadata: - name: policies - namespace: openshift-gitops -spec: - destination: - server: https://kubernetes.default.svc - namespace: policies-sub - project: default - source: - directory: - recurse: true - path: ztp/gitops-subscriptions/argocd/resource-hook-example/policygentemplates <1> - repoURL: https://github.com/openshift-kni/cnf-features-deploy <2> - targetRevision: master <3> - syncPolicy: - automated: - prune: true - selfHeal: true - syncOptions: - - CreateNamespace=true ----- -<1> The `ztp/gitops-subscriptions/argocd/` file path that contains the `policygentemplates` CRs for the clusters. -<2> The URL of the Git repository that contains the `policygentemplates` custom resources that specify configuration data for the site. -<3> The branch on the Git repository that contains the relevant configuration data. - -. To apply the pipeline configuration to your hub cluster, enter this command: -+ -[source,terminal] ----- -$ oc apply -k ./deployment +$ oc apply -k out/argocd/deployment ---- diff --git a/modules/ztp-preparing-the-ztp-git-repository.adoc b/modules/ztp-preparing-the-ztp-git-repository.adoc index c5876a945e..cdd62638d2 100644 --- a/modules/ztp-preparing-the-ztp-git-repository.adoc +++ b/modules/ztp-preparing-the-ztp-git-repository.adoc @@ -12,13 +12,50 @@ Create a Git repository for hosting site configuration data. The zero touch prov . Create a directory structure with separate paths for the `SiteConfig` and `PolicyGenTemplate` custom resources (CR). -. Add `pre-sync.yaml` and `post-sync.yaml` from `resource-hook-example//` to the path for the `PolicyGenTemplate` CRs. - -. Add `pre-sync.yaml` and `post-sync.yaml` from `resource-hook-example//` to the path for the `SiteConfig` CRs. +. Export the `argocd` directory from the `ztp-site-generate` container image using the following commands: + -[NOTE] -==== -If your hub cluster operates in a disconnected environment, you must update the `image` for all four pre and post sync hook CRs. -==== +[source,terminal] +---- +$ podman pull registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10 +---- ++ +[source,terminal] +---- +$ mkdir -p ./out +---- ++ +[source,terminal] +---- +$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10 extract /home/ztp --tar | tar x -C ./out +---- -. 
Apply the `policygentemplates.ran.openshift.io` and `siteconfigs.ran.openshift.io` CR definitions. +. Check that the `out` directory contains the following subdirectories: ++ +* `out/extra-manifest` contains the source CR files that `SiteConfig` uses to generate extra manifest `configMap`. +* `out/source-crs` contains the source CR files that `PolicyGenTemplate` uses to generate the {rh-rhacm-first} policies. +* `out/argocd/deployment` contains patches and YAML files to apply on the hub cluster for use in the next step of this procedure. +* `out/argocd/example` contains the examples for `SiteConfig` and `PolicyGenTemplate` files that represent the recommended configuration. + +The directory structure under `out/argocd/example` serves as a reference for the structure and content of your Git repository. The example includes `SiteConfig` and `PolicyGenTemplate` reference CRs for single-node, three-node, and standard clusters. Remove references to cluster types that you are not using. The following example describes a set of CRs for a network of single-node clusters: + +[source,terminal] +---- +example/ +├── policygentemplates +│ ├── common-ranGen.yaml +│ ├── example-sno-site.yaml +│ ├── group-du-sno-ranGen.yaml +│ ├── group-du-sno-validator-ranGen.yaml +│ ├── kustomization.yaml +│ └── ns.yaml +└── siteconfig + ├── example-sno.yaml + ├── KlusterletAddonConfigOverride.yaml + └── kustomization.yaml +---- + +Keep `SiteConfig` and `PolicyGenTemplate` CRs in separate directories. Both the `SiteConfig` and `PolicyGenTemplate` directories must contain a `kustomization.yaml` file that explicitly includes the files in that directory. + +This directory structure and the `kustomization.yaml` files must be committed and pushed to your Git repository. The initial push to Git should include the `kustomization.yaml` files. The `SiteConfig` (`example-sno.yaml`) and `PolicyGenTemplate` (`common-ranGen.yaml`, `group-du-sno*.yaml`, and `example-sno-site.yaml`) files can be omitted and pushed at a later time as required when deploying a site. + +The `KlusterletAddonConfigOverride.yaml` file is only required if one or more `SiteConfig` CRs which make reference to it are committed and pushed to Git. See `example-sno.yaml` for an example of how this is used. diff --git a/modules/ztp-prerequisites-for-deploying-the-ztp-pipeline.adoc b/modules/ztp-prerequisites-for-deploying-the-ztp-pipeline.adoc deleted file mode 100644 index bdfa6271ec..0000000000 --- a/modules/ztp-prerequisites-for-deploying-the-ztp-pipeline.adoc +++ /dev/null @@ -1,29 +0,0 @@ -// Module included in the following assemblies: -// -// *scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: CONCEPT -[id="ztp-prerequisites-for-deploying-the-ztp-pipeline_{context}"] -= Prerequisites for deploying the ZTP pipeline - -* {product-title} cluster version 4.8 or higher and Red Hat GitOps Operator is installed. -* {rh-rhacm-first} version 2.3 or above is installed. -* For disconnected environments, make sure your source data Git repository and `ztp-site-generator` container image are accessible from the hub cluster. -* If you want additional custom content, such as extra install manifests or custom resources (CR) for policies, add them to the `/usr/src/hook/ztp/source-crs/extra-manifest/` directory. Similarly, you can add additional configuration CRs, as referenced from a `PolicyGenTemplate`, to the `/usr/src/hook/ztp/source-crs/` directory. 
-** Create a `Containerfile` that adds your additional manifests to the Red Hat provided image, for example: -+ -[source,yaml] ----- -FROM /ztp-site-generator:latest <1> -COPY myInstallManifest.yaml /usr/src/hook/ztp/source-crs/extra-manifest/ -COPY mySourceCR.yaml /usr/src/hook/ztp/source-crs/ ----- -+ -<1> must point to a registry containing the `ztp-site-generator` container image provided by Red Hat. - -** Build a new container image that includes these additional files: -+ -[source,terminal] ----- -$> podman build Containerfile.example ----- diff --git a/modules/ztp-provisioning-edge-sites-at-scale.adoc b/modules/ztp-provisioning-edge-sites-at-scale.adoc index 191ad67493..ae58ef6cd4 100644 --- a/modules/ztp-provisioning-edge-sites-at-scale.adoc +++ b/modules/ztp-provisioning-edge-sites-at-scale.adoc @@ -3,7 +3,7 @@ // scalability_and_performance/ztp-deploying-disconnected.adoc :_content-type: CONCEPT -[id="provisioning-edge-sites-at-scale_{context}"] +[id="ztp-provisioning-edge-sites-at-scale_{context}"] = Provisioning edge sites at scale Telco edge computing presents extraordinary challenges with managing hundreds to tens of thousands of clusters in hundreds of thousands of locations. These challenges require fully-automated management solutions with, as closely as possible, zero human interaction. @@ -16,4 +16,4 @@ Service providers are deploying a more distributed mobile network architecture a The following diagram shows how ZTP works within a far edge framework. -image::176_OpenShift_zero_touch_provisioning_0821.png[ZTP in a far edge framework] +image::217_OpenShift_Zero_Touch_Provisioning_updates_0222_1.png[ZTP in a far edge framework] diff --git a/modules/ztp-querying-the-policy-compliance-status-for-each-cluster.adoc b/modules/ztp-querying-the-policy-compliance-status-for-each-cluster.adoc new file mode 100644 index 0000000000..61455f96af --- /dev/null +++ b/modules/ztp-querying-the-policy-compliance-status-for-each-cluster.adoc @@ -0,0 +1,62 @@ +// Module included in the following assemblies: +// +// * scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: PROCEDURE +[id="ztp-querying-the-policy-compliance-status-for-each-cluster_{context}"] += Querying the policy compliance status for each cluster + +After you have created the validator inform policies for your clusters and pushed them to the +the zero touch provisioning (ZTP) Git repository, you can check the status of each cluster for policy compliance. + +.Procedure + +. To query the status of the spoke clusters, use either the {rh-rhacm-first} web console or the CLI: ++ +* To query status from the {rh-rhacm} web console, perform the following actions: ++ +.. Click *Governance* -> *Find policies*. +.. Search for *du-validator-policy*. +.. Click into the policy. + +* To query status using the CLI, run the following command: ++ +[source,terminal] +---- +$ oc get policies du-validator-policy -n -o jsonpath={'.status.status'} | jq +---- ++ +When all of the policies including the validator inform policy applied to +the cluster become compliant, ZTP installation and configuration for this cluster is complete. + +. To query the cluster violation/compliant status from the ACM web console, click +*Governance* -> *Cluster violations*. + +. Check the validator policy compliant status for a cluster using the following commands: ++ +-- +.. Export the cluster name: ++ +[source,terminal] +---- +$ export CLUSTER= +---- + +.. 
Get the policy: ++ +[source,terminal] +---- +$ oc get policies -n $CLUSTER | grep +---- +-- ++ +Alternatively, you can use the following command: ++ +[source,terminal] +---- +$ oc get policies -n -o jsonpath="{.status.status[?(@.clustername=='$CLUSTER')]}" | jq +---- ++ +After the `*-validator-du-policy` {rh-rhacm} policy becomes compliant for the cluster, the +validator policy is unbound for this cluster and the `ztp-done` label is added to the cluster. +This acts as a persistent indicator that the whole ZTP pipeline has completed for the cluster. diff --git a/modules/ztp-removing-obsolete-content.adoc b/modules/ztp-removing-obsolete-content.adoc new file mode 100644 index 0000000000..dc1d0ae405 --- /dev/null +++ b/modules/ztp-removing-obsolete-content.adoc @@ -0,0 +1,26 @@ +// Module included in the following assemblies: +// +// *scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: PROCEDURE +[id="ztp-removing-obsolete-content_{context}"] += Removing obsolete content + +If a change to the `PolicyGenTemplate` file configuration results in obsolete policies, for example, policies are renamed, use the following procedure to remove those policies in an automated way. + +.Procedure + +. Remove the affected `PolicyGenTemplate` files from the Git repository, commit and push to the remote repository. + +. Wait for the changes to synchronize through the application and the affected policies to be removed from the hub cluster. + +. Add the updated `PolicyGenTemplate` files back to the Git repository, and then commit and push to the remote repository. + +Note that removing the zero touch provisioning (ZTP) distributed unit (DU) profile policies from the Git repository, and as a result also removing them from the hub cluster, does not affect any configuration of the managed spoke clusters. Removing a policy from the hub cluster does not delete it from the spoke cluster and the CRs managed by that policy. + +As an alternative, after making changes to `PolicyGenTemplate` files that result in obsolete policies, you can remove these policies from the hub cluster manually. You can delete policies from the {rh-rhacm} console using the *Governance* tab or by using the following command: + +[source,terminal] +---- +$ oc delete policy -n +---- diff --git a/modules/ztp-removing-the-argocd-pipeline.adoc b/modules/ztp-removing-the-argocd-pipeline.adoc deleted file mode 100644 index 6678e6bd72..0000000000 --- a/modules/ztp-removing-the-argocd-pipeline.adoc +++ /dev/null @@ -1,34 +0,0 @@ -// Module included in the following assemblies: -// -// *scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: PROCEDURE -[id="ztp-removing-the-argocd-pipeline_{context}"] -= Removing the ArgoCD pipeline - -Use the following procedure if you want to remove the ArgoCD pipeline and all generated artifacts. - -.Procedure - -. Detach all clusters from ACM. - -. Delete all `SiteConfig` and `PolicyGenTemplate` custom resources (CRs) from your Git repository. - -. Delete the following namespaces: -+ -* All policy namespaces: -+ -[source,terminal] ----- - $ oc get policy -A ----- -+ -* `clusters-sub` -* `policies-sub` - -. 
Process the directory using the Kustomize tool:
-+
-[source,terminal]
-----
- $ oc delete -k cnf-features-deploy/ztp/gitops-subscriptions/argocd/deployment
----
diff --git a/modules/ztp-required-changes-to-the-git-repository.adoc b/modules/ztp-required-changes-to-the-git-repository.adoc
new file mode 100644
index 0000000000..7154e75d8e
--- /dev/null
+++ b/modules/ztp-required-changes-to-the-git-repository.adoc
@@ -0,0 +1,82 @@
+// Module included in the following assemblies:
+//
+// *scalability_and_performance/ztp-deploying-disconnected.adoc
+
+:_content-type: CONCEPT
+[id="ztp-required-changes-to-the-git-repository_{context}"]
+= Required changes to the Git repository
+
+When upgrading from an earlier release to {product-title} 4.10, additional requirements are placed on the contents of the Git repository. Existing content in the repository must be updated to reflect these changes.
+
+* Changes to `PolicyGenTemplate` files:
++
+All `PolicyGenTemplate` files must be created in a `Namespace` prefixed with `ztp`. This ensures that the GitOps zero touch provisioning (ZTP) application is able to manage the policy CRs generated by GitOps ZTP without conflicting with the way {rh-rhacm-first} manages the policies internally.
+
+* Remove the `pre-sync.yaml` and `post-sync.yaml` files:
++
+This step is optional but recommended. When the `kustomization.yaml` files are added, the `pre-sync.yaml` and `post-sync.yaml` files are no longer used. They must be removed to avoid confusion, because they can potentially cause errors if the `kustomization.yaml` files are inadvertently removed. Note that there is a set of `pre-sync.yaml` and `post-sync.yaml` files under both the `SiteConfig` and `PolicyGenTemplate` trees.
+
+* Add the `kustomization.yaml` file to the repository:
++
+All `SiteConfig` and `PolicyGenTemplate` CRs must be included in a `kustomization.yaml` file under their respective directory trees. For example:
++
+[source,terminal]
+----
+├── policygentemplates
+│   ├── site1-ns.yaml
+│   ├── site1.yaml
+│   ├── site2-ns.yaml
+│   ├── site2.yaml
+│   ├── common-ns.yaml
+│   ├── common-ranGen.yaml
+│   ├── group-du-sno-ranGen-ns.yaml
+│   ├── group-du-sno-ranGen.yaml
+│   └── kustomization.yaml
+└── siteconfig
+    ├── site1.yaml
+    ├── site2.yaml
+    └── kustomization.yaml
+----
++
+[NOTE]
+====
+The files listed in the `generator` sections must contain either `SiteConfig` or `PolicyGenTemplate` CRs only. If your existing YAML files contain other CRs, for example, `Namespace`, these other CRs must be pulled out into separate files and listed in the `resources` section.
+====
++
+The `PolicyGenTemplate` kustomization file must contain all `PolicyGenTemplate` YAML files in the `generator` section and `Namespace` CRs in the `resources` section. For example:
++
+[source,yaml]
+----
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+
+generators:
+- common-ranGen.yaml
+- group-du-sno-ranGen.yaml
+- site1.yaml
+- site2.yaml
+
+resources:
+- common-ns.yaml
+- group-du-sno-ranGen-ns.yaml
+- site1-ns.yaml
+- site2-ns.yaml
+----
++
+The `SiteConfig` kustomization file must contain all `SiteConfig` YAML files in the `generator` section and any other CRs in the `resources` section:
++
+[source,yaml]
+----
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+
+generators:
+- site1.yaml
+- site2.yaml
+----
+
+* Review and incorporate recommended changes:
++
+Each release may include additional recommended changes to the configuration applied to deployed clusters.
Typically these changes result in lower CPU use by the OpenShift platform, additional features, or improved tuning of the platform.
++
+Review the reference `SiteConfig` and `PolicyGenTemplate` CRs applicable to the types of cluster in your network. These examples can be found in the `argocd/example` directory extracted from the GitOps ZTP container.
diff --git a/modules/ztp-restarting-policies-reconciliation.adoc b/modules/ztp-restarting-policies-reconciliation.adoc
new file mode 100644
index 0000000000..5bd0a28012
--- /dev/null
+++ b/modules/ztp-restarting-policies-reconciliation.adoc
@@ -0,0 +1,41 @@
+// Module included in the following assemblies:
+//
+// *scalability_and_performance/ztp-deploying-disconnected.adoc
+
+:_content-type: PROCEDURE
+[id="ztp-restarting-policies-reconciliation_{context}"]
+= Restarting policies reconciliation
+
+Use the following procedure to restart policies reconciliation in the event of unexpected compliance issues. This procedure is required when the `ClusterGroupUpgrade` CR has timed out.
+
+.Procedure
+
+. A `ClusterGroupUpgrade` CR is generated in the namespace `ztp-install` by the {cgu-operator-full} after the managed spoke cluster becomes `Ready`:
++
+[source,terminal]
+----
+$ export CLUSTER=
+----
++
+[source,terminal]
+----
+$ oc get clustergroupupgrades -n ztp-install $CLUSTER
+----
+
+. If there are unexpected issues and the policies fail to become compliant within the configured timeout (the default is 4 hours), the status of the `ClusterGroupUpgrade` CR shows `UpgradeTimedOut`:
++
+[source,terminal]
+----
+$ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
+----
+
+. A `ClusterGroupUpgrade` CR in the `UpgradeTimedOut` state automatically restarts its policy reconciliation every hour. If you have changed your policies, you can start a retry immediately by deleting the existing `ClusterGroupUpgrade` CR. This triggers the automatic creation of a new `ClusterGroupUpgrade` CR that begins reconciling the policies immediately:
++
+[source,terminal]
+----
+$ oc delete clustergroupupgrades -n ztp-install $CLUSTER
+----
+
+Note that when the `ClusterGroupUpgrade` CR completes with status `UpgradeCompleted` and the managed spoke cluster has the label `ztp-done` applied, you can make additional configuration changes using `PolicyGenTemplate`. Deleting the existing `ClusterGroupUpgrade` CR will not make the {cgu-operator} generate a new CR.
+
+At this point, ZTP has completed its interaction with the cluster and any further interactions should be treated as an upgrade.
diff --git a/modules/ztp-roll-out-the-configuration-changes.adoc b/modules/ztp-roll-out-the-configuration-changes.adoc
new file mode 100644
index 0000000000..a973587635
--- /dev/null
+++ b/modules/ztp-roll-out-the-configuration-changes.adoc
@@ -0,0 +1,11 @@
+// Module included in the following assemblies:
+//
+// *scalability_and_performance/ztp-deploying-disconnected.adoc
+
+:_content-type: CONCEPT
+[id="ztp-roll-out-the-configuration-changes_{context}"]
+= Roll out the configuration changes
+
+If any configuration changes were included in the upgrade due to implementing recommended changes, the upgrade process results in a set of policy CRs on the hub cluster in the `Non-Compliant` state. As of the {product-title} 4.10 release, these policies are set to `inform` mode and are not pushed to the spoke clusters without an additional step by the user.
This ensures that potentially disruptive changes to the clusters can be managed in terms of when the changes are made, for example, during a maintenance window, and how many clusters are updated concurrently. + +To roll out the changes, create one or more `ClusterGroupUpgrade` CRs as detailed in the {cgu-operator} documentation. The CR must contain the list of `Non-Compliant` policies that you want to push out to the spoke clusters as well as a list or selector of which clusters should be included in the update. diff --git a/modules/ztp-single-node-clusters.adoc b/modules/ztp-single-node-clusters.adoc index 56f90eeadd..61b799674f 100644 --- a/modules/ztp-single-node-clusters.adoc +++ b/modules/ztp-single-node-clusters.adoc @@ -4,7 +4,6 @@ :_content-type: CONCEPT [id="ztp-single-node-clusters_{context}"] - = Single-node clusters You use zero touch provisioning (ZTP) to deploy {sno} clusters to run distributed units (DUs) on small hardware footprints at disconnected diff --git a/modules/ztp-site-cleanup.adoc b/modules/ztp-site-cleanup.adoc index 7145845762..4b5693f859 100644 --- a/modules/ztp-site-cleanup.adoc +++ b/modules/ztp-site-cleanup.adoc @@ -2,13 +2,13 @@ // // *scalability_and_performance/ztp-deploying-disconnected.adoc -:_content-type: CONCEPT +:_content-type: PROCEDURE [id="ztp-site-cleanup_{context}"] = Site cleanup -To remove a site and the associated installation and policy custom resources (CRs), remove the `SiteConfig` and site-specific `PolicyGenTemplate` CRs from the Git repository. The pipeline hooks remove the generated CRs. +Remove a site and the associated installation and configuration policy CRs by removing the `SiteConfig` and `PolicyGenTemplate` file names from the `kustomization.yaml` file. When you run the ZTP pipeline again, the generated CRs are removed. If you want to permanently remove a site, you should also remove the `SiteConfig` and site-specific `PolicyGenTemplate` files from the Git repository. If you want to remove a site temporarily, for example when redeploying a site, you can leave the `SiteConfig` and site-specific `PolicyGenTemplate` CRs in the Git repository. [NOTE] ==== -Before removing a `SiteConfig` CR you must detach the cluster from ACM. +After removing the `SiteConfig` file, if the corresponding clusters remain in the detach process, check {rh-rhacm-first} for information about cleaning up the detached managed cluster. ==== diff --git a/modules/ztp-site-planning-for-du-deployments.adoc b/modules/ztp-site-planning-for-du-deployments.adoc deleted file mode 100644 index 757200e33f..0000000000 --- a/modules/ztp-site-planning-for-du-deployments.adoc +++ /dev/null @@ -1,21 +0,0 @@ -// Module included in the following assemblies: -// -// scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: CONCEPT -[id="ztp-site-planning-for-du-deployments_{context}"] -= Site planning considerations for distributed unit deployments - -Site planning for distributed units (DU) deployments is complex. The following is an overview of the tasks that you complete before the DU hosts are brought online in the production environment. - -* Develop a network model. The network model depends on various factors such as the size of the area of coverage, number of hosts, projected traffic load, DNS, and DHCP requirements. -* Decide how many DU radio nodes are required to provide sufficient coverage and redundancy for your network. -* Develop mechanical and electrical specifications for the DU host hardware. 
-* Develop a construction plan for individual DU site installations. -* Tune host BIOS settings for production, and deploy the BIOS configuration to the hosts. -* Install the equipment on-site, connect hosts to the network, and apply power. -* Configure on-site switches and routers. -* Perform basic connectivity tests for the host machines. -* Establish production network connectivity, and verify host connections to the network. -* Provision and deploy on-site DU hosts at scale. -* Test and verify on-site operations, performing load and scale testing of the DU hosts before finally bringing the DU infrastructure online in the live production environment. diff --git a/modules/ztp-sriov-operator.adoc b/modules/ztp-sriov-operator.adoc deleted file mode 100644 index d053203c56..0000000000 --- a/modules/ztp-sriov-operator.adoc +++ /dev/null @@ -1,13 +0,0 @@ -// Module included in the following assemblies: -// -// scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: CONCEPT -[id="ztp-sriov-operator_{context}"] -= SR-IOV Operator - -The Single Root I/O Virtualization (SR-IOV) Network Operator manages the SR-IOV network devices and network attachments in your cluster. - -The SR-IOV Operator allows network interfaces to be virtual and shared at a device level with networking functions running within the cluster. - -The SR-IOV Network Operator adds the `SriovOperatorConfig.sriovnetwork.openshift.io` CustomResourceDefinition resource. The Operator automatically creates a SriovOperatorConfig custom resource named `default` in the `openshift-sriov-network-operator` namespace. The `default` custom resource contains the SR-IOV Network Operator configuration for your cluster. diff --git a/modules/ztp-stopping-the-existing-gitops-ztp-applications.adoc b/modules/ztp-stopping-the-existing-gitops-ztp-applications.adoc new file mode 100644 index 0000000000..6ea404eb3c --- /dev/null +++ b/modules/ztp-stopping-the-existing-gitops-ztp-applications.adoc @@ -0,0 +1,32 @@ +// Module included in the following assemblies: +// +// *scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: PROCEDURE +[id="ztp-stopping-the-existing-gitops-ztp-applications_{context}"] += Stopping the existing GitOps ZTP applications + +Removing the existing applications ensures that any changes to existing content in the Git repository are not rolled out until the new version of the tooling is available. + +Use the application files from the `deployment` directory. If you used custom names for the applications, update the names in these files first. + +.Procedure + +. Perform a non-cascaded delete on the `clusters` application to leave all generated resources in place: ++ +[source,terminal] +---- +$ oc delete -f out/argocd/deployment/clusters-app.yaml +---- + +. 
Perform a cascaded delete on the `policies` application to remove all previous policies: ++ +[source,terminal] +---- +$ oc patch -f policies-app.yaml -p '{"metadata": {"finalizers": ["resources-finalizer.argocd.argoproj.io"]}}' --type merge +---- ++ +[source,terminal] +---- +$ oc delete -f out/argocd/deployment/policies-app.yaml +---- diff --git a/modules/ztp-support-for-deployment-of-multi-node-clusters.adoc b/modules/ztp-support-for-deployment-of-multi-node-clusters.adoc new file mode 100644 index 0000000000..6ef87c0d08 --- /dev/null +++ b/modules/ztp-support-for-deployment-of-multi-node-clusters.adoc @@ -0,0 +1,36 @@ +// Module included in the following assemblies: +// +// scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: CONCEPT +[id="ztp-support-for-deployment-of-multi-node-clusters_{context}"] += ZTP support for deployment of multi-node clusters + +The Telco 5G zero touch provisioning (ZTP) flow uses the Assisted Service, which is part of {rh-rhacm-first} on the hub cluster, to install clusters. This is done by generating all of the custom resources (CRs) required by the Assisted Service, including: + +* `AgentClusterInstall` +* `ClusterDeployment` +* `NMStateConfig` +* `ManagedCluster` and `KlusterletAddonConfig` (integration with {rh-rhacm}) +* `InfraEnv` +* `BareMetalHost` +* `ConfigMap` for extra install manifests + +Extending ZTP to support three-node clusters and standard clusters requires updates to these CRs, including multiple instantiations of some. + +ZTP provides support for deploying single node clusters, three-node clusters, and standard OpenShift clusters. This includes the installation of OpenShift and deployment of the distributed units (DUs) at scale. + +The overall flow is identical to the ZTP support for single node clusters, with some differences in configuration depending on the type of cluster: + +`SiteConfig`: + +* For single node clusters, the `SiteConfig` file must have exactly one entry in the `nodes` section. +* For three-node clusters, the `SiteConfig` file must have exactly three entries defined in the `nodes` section. +* For standard clusters, the `SiteConfig` file must have exactly three entries in the `nodes` section with `role: master` and two or more additional entries with `role: worker`. + +`PolicyGenTemplate`: + +* The example common `PolicyGenTemplate` is common across all types of clusters. +* There are example group `PolicyGenTemplate` files for single node, three-node, +and standard clusters. +* Site-specific `PolicyGenTemplate` files are still specific to each site. diff --git a/modules/ztp-talo-integration.adoc b/modules/ztp-talo-integration.adoc new file mode 100644 index 0000000000..a393d0df5b --- /dev/null +++ b/modules/ztp-talo-integration.adoc @@ -0,0 +1,53 @@ +// Module included in the following assemblies: +// +// * scalability_and_performance/ztp-deploying-disconnected.adoc + +:_module-type: CONCEPT +[id="ztp-talo-integration_{context}"] += GitOps ZTP and {cgu-operator-full} + +GitOps zero touch provisioning (ZTP) generates installation and configuration CRs from manifests stored in Git. These artifacts are applied to a centralized hub cluster where {rh-rhacm-first}, the assisted installer service, and the {cgu-operator-first} use the CRs to install and configure the spoke cluster. The configuration phase of the ZTP pipeline uses the {cgu-operator} to orchestrate the application of the configuration CRs to the cluster. There are several key integration points between GitOps ZTP and the {cgu-operator}.
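The {cgu-operator} drives this orchestration through `ClusterGroupUpgrade` CRs. The following is a minimal sketch only, not a definitive reference: the API version, the cluster name `spoke1`, and the policy names are illustrative assumptions, while the `ztp-install` namespace and the 240-minute timeout mirror the characteristics of the auto-created CR described in the next section. Verify the exact schema against the {cgu-operator} documentation for your release.

[source,yaml]
----
# Hypothetical ClusterGroupUpgrade sketch. Cluster and policy names are
# placeholders; confirm the API version and fields against the TALM docs.
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: spoke1
  namespace: ztp-install
spec:
  clusters:
  - spoke1
  managedPolicies:
  - common-config-policy
  - group-du-sno-config-policy
  enable: true
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
----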
+ +Inform policies:: +By default, GitOps ZTP creates all policies with a remediation action of `inform`. These policies cause {rh-rhacm} to report on the compliance status of the clusters relevant to the policies but do not apply the desired configuration. During the ZTP installation, the {cgu-operator} steps through the created `inform` policies, creates a copy for the target spoke cluster or clusters, and changes the remediation action of the copy to `enforce`. This pushes the configuration to the spoke cluster. Outside of the ZTP phase of the cluster lifecycle, this setup allows changes to be made to policies without the risk of immediately rolling those changes out to all affected spoke clusters in the network. You can control the timing and the set of clusters that are remediated by using {cgu-operator}. + +Automatic creation of ClusterGroupUpgrade CRs:: +The {cgu-operator} monitors the state of all `ManagedCluster` CRs on the hub cluster. Any `ManagedCluster` CR that does not have a `ztp-done` label applied, including newly created `ManagedCluster` CRs, causes the {cgu-operator} to automatically create a `ClusterGroupUpgrade` CR with the following characteristics: + +* The `ClusterGroupUpgrade` CR is created and enabled in the `ztp-install` namespace. +* The `ClusterGroupUpgrade` CR has the same name as the `ManagedCluster` CR. +* The cluster selector includes only the cluster associated with that `ManagedCluster` CR. +* The set of managed policies includes all policies that {rh-rhacm} has bound to the cluster at the time the `ClusterGroupUpgrade` is created. +* Pre-caching is disabled. +* The timeout is set to 4 hours (240 minutes). ++ +The automatic creation of an enabled `ClusterGroupUpgrade` ensures that initial zero-touch deployment of clusters proceeds without the need for user intervention. Additionally, the automatic creation of a `ClusterGroupUpgrade` CR for any `ManagedCluster` without the `ztp-done` label allows a failed ZTP installation to be restarted by deleting the `ClusterGroupUpgrade` CR for the cluster. + +Waves:: +Each policy generated from a `PolicyGenTemplate` CR includes a `ztp-deploy-wave` annotation. This annotation is based on the same annotation from each CR that is included in that policy. The wave annotation is used to order the policies in the auto-generated `ClusterGroupUpgrade` CR. ++ +[NOTE] +==== +All CRs in the same policy must have the same setting for the `ztp-deploy-wave` annotation. The default value of this annotation for each CR can be overridden in the `PolicyGenTemplate`. The wave annotation in the source CR is used for determining and setting the policy wave annotation. This annotation is removed from each built CR that is included in the generated policy at runtime. +==== ++ +The {cgu-operator} applies the configuration policies in the order specified by the wave annotations. The {cgu-operator} waits for each policy to be compliant before moving to the next policy. It is important to ensure that the wave annotation for each CR takes into account any prerequisites for those CRs to be applied to the cluster. For example, an Operator must be installed before or concurrently with the configuration for the Operator. Similarly, the `CatalogSource` for an Operator must be installed in a wave before or concurrently with the Operator Subscription. The default wave value for each CR takes these prerequisites into account. ++ +Multiple CRs and policies can share the same wave number. Having fewer policies can result in faster deployments and lower CPU usage.
It is a best practice to group many CRs into relatively few waves. + +To check the default wave value in each source CR, run the following command against the `out/source-crs` directory that is extracted from the `ztp-site-generator` container image: ++ +[source,terminal] +---- +$ grep -r "ztp-deploy-wave" out/source-crs +---- + +Phase labels:: +The `ClusterGroupUpgrade` CR is automatically created and includes directives to annotate the `ManagedCluster` CR with labels at the start and end of the ZTP process. ++ +When ZTP configuration post-installation commences, the `ManagedCluster` has the `ztp-running` label applied. When all policies are remediated to the cluster and are fully compliant, these directives cause the {cgu-operator} to remove the `ztp-running` label and apply the `ztp-done` label. ++ +For deployments which make use of the `informDuValidator` policy, the `ztp-done` label is applied when the cluster is fully ready for deployment of applications. This includes all reconciliation and resulting effects of the ZTP applied configuration CRs. + +Linked CRs:: +The automatically created `ClusterGroupUpgrade` CR has the owner reference set as the `ManagedCluster` from which it was derived. This reference ensures that deleting the `ManagedCluster` CR causes the instance of the `ClusterGroupUpgrade` to be deleted along with any supporting resources. diff --git a/modules/ztp-tearing-down-the-pipeline.adoc b/modules/ztp-tearing-down-the-pipeline.adoc new file mode 100644 index 0000000000..1a8f90eee2 --- /dev/null +++ b/modules/ztp-tearing-down-the-pipeline.adoc @@ -0,0 +1,20 @@ +// Module included in the following assemblies: +// +// *scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: PROCEDURE +[id="ztp-tearing-down-the-pipeline_{context}"] += Tearing down the pipeline + +If you need to remove the ArgoCD pipeline and all generated artifacts follow this procedure: + +.Procedure + +. Detach all clusters from {rh-rhacm}. + +. Delete the `kustomization.yaml` file in the `deployment` directory using the following command: ++ +[source,terminal] +---- +$ oc delete -k out/argocd/deployment +---- diff --git a/modules/ztp-the-gitops-approach.adoc b/modules/ztp-the-gitops-approach.adoc index 9c5bb4dd91..6327e19492 100644 --- a/modules/ztp-the-gitops-approach.adoc +++ b/modules/ztp-the-gitops-approach.adoc @@ -6,7 +6,7 @@ [id="ztp-the-gitops-approach_{context}"] = The GitOps approach -ZTP uses the GitOps deployment set of practices for infrastructure deployment that allows developers to perform tasks that would otherwise fall under the purview of IT operations. GitOps achieves these tasks using declarative specifications stored in Git repositories, such as YAML files and other defined patterns, that provide a framework for deploying the infrastructure. The declarative output is leveraged by the Open Cluster Manager for multisite deployment. +ZTP uses the GitOps deployment set of practices for infrastructure deployment that allows developers to perform tasks that would otherwise fall under the purview of IT operations. GitOps achieves these tasks using declarative specifications stored in Git repositories, such as YAML files and other defined patterns, that provide a framework for deploying the infrastructure. The declarative output is leveraged by the Open Cluster Manager (OCM) for multisite deployment. One of the motivators for a GitOps approach is the requirement for reliability at scale. This is a significant challenge that GitOps helps solve. 
diff --git a/modules/ztp-the-policygentemplate.adoc b/modules/ztp-the-policygentemplate.adoc index 60bb4cdff4..54b8378b13 100644 --- a/modules/ztp-the-policygentemplate.adoc +++ b/modules/ztp-the-policygentemplate.adoc @@ -2,21 +2,22 @@ // // scalability_and_performance/ztp-deploying-disconnected.adoc -:_content-type: PROCEDURE +:_content-type: REFERENCE [id="ztp-the-policygentemplate_{context}"] -= The PolicyGenTemplate += About the PolicyGenTemplate -The `PolicyGenTemplate.yaml` file is a Custom Resource Definition (CRD) that tells PolicyGen where to categorize the generated policies and which items need to be overlaid. +The `PolicyGenTemplate.yaml` file is a custom resource definition (CRD) that tells the `PolicyGen` policy generator what CRs to include in the configuration, how to categorize the CRs into the generated policies, and what items in those CRs need to be updated with overlay content. -The following example shows the `PolicyGenTemplate.yaml` file: +The following example shows a `PolicyGenTemplate.yaml` file: [source,yaml] ---- +--- apiVersion: ran.openshift.io/v1 kind: PolicyGenTemplate metadata: name: "group-du-sno" - namespace: "group-du-sno" + namespace: "group-du-sno-policies" spec: bindingRules: group-du-sno: "" @@ -24,19 +25,68 @@ spec: sourceFiles: - fileName: ConsoleOperatorDisable.yaml policyName: "console-policy" + - fileName: ClusterLogForwarder.yaml + policyName: "log-forwarder-policy" + spec: + outputs: + - type: "kafka" + name: kafka-open + # below url is an example + url: tcp://10.46.55.190:9092/test + pipelines: + - name: audit-logs + inputRefs: + - audit + outputRefs: + - kafka-open + - name: infrastructure-logs + inputRefs: + - infrastructure + outputRefs: + - kafka-open - fileName: ClusterLogging.yaml - policyName: "cluster-log-policy" + policyName: "log-policy" spec: curation: curator: schedule: "30 3 * * *" - collection: - logs: - type: "fluentd" - fluentd: {} + collection: + logs: + type: "fluentd" + fluentd: {} + - fileName: MachineConfigSctp.yaml + policyName: "mc-sctp-policy" + metadata: + labels: + machineconfiguration.openshift.io/role: master + - fileName: PtpConfigSlave.yaml + policyName: "ptp-config-policy" + metadata: + name: "du-ptp-slave" + spec: + profile: + - name: "slave" + interface: "ens5f0" + ptp4lOpts: "-2 -s --summary_interval -4" + phc2sysOpts: "-a -r -n 24" + - fileName: SriovOperatorConfig.yaml + policyName: "sriov-operconfig-policy" + spec: + disableDrain: true + - fileName: MachineConfigAcceleratedStartup.yaml + policyName: "mc-accelerated-policy" + metadata: + name: 04-accelerated-container-startup-master + labels: + machineconfiguration.openshift.io/role: master + - fileName: DisableSnoNetworkDiag.yaml + policyName: "disable-network-diag" + metadata: + labels: + machineconfiguration.openshift.io/role: master ---- -The `group-du-ranGen.yaml` file defines a group of policies under a group named `group-du`. This file defines a `MachineConfigPool` `worker-du` that is used as the node selector for any other policy defined in `sourceFiles`. An ACM policy is generated for every source file that exists in `sourceFiles`. And, a single placement binding and placement rule is generated to apply the cluster selection rule for `group-du` policies. +The `group-du-ranGen.yaml` file defines a group of policies under a group named `group-du`. A {rh-rhacm-first} policy is generated for every source file that exists in `sourceFiles`. 
And, a single placement binding and placement rule is generated to apply the cluster selection rule for `group-du` policies. Using the source file `PtpConfigSlave.yaml` as an example, the `PtpConfigSlave` has a definition of a `PtpConfig` custom resource (CR). The generated policy for the `PtpConfigSlave` example is named `group-du-ptp-config-policy`. The `PtpConfig` CR defined in the generated `group-du-ptp-config-policy` is named `du-ptp-slave`. The `spec` defined in `PtpConfigSlave.yaml` is placed under `du-ptp-slave` along with the other `spec` items defined under the source file. @@ -71,7 +121,7 @@ spec: include: - '*' object-templates: - - complianceType: musthave <1> + - complianceType: musthave objectDefinition: apiVersion: ptp.openshift.io/v1 kind: PtpConfig @@ -100,4 +150,3 @@ spec: domainNumber 24 ..... ---- -<1> Displays the value of the `complianceType` field. The default value is `musthave` which indicates that an object must exist with the same `name` as specified in `object-templates`. To find the exact matches to roles and objects, set the value to `mustonlyhave`. For more information about the accepted values, see link:https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html-single/governance/index#configuration-policy-yaml-table[Configuration policy YAML table]. \ No newline at end of file diff --git a/modules/ztp-things-to-consider-when-creating-custom-resource-policies.adoc b/modules/ztp-things-to-consider-when-creating-custom-resource-policies.adoc deleted file mode 100644 index 82f234d833..0000000000 --- a/modules/ztp-things-to-consider-when-creating-custom-resource-policies.adoc +++ /dev/null @@ -1,15 +0,0 @@ -// Module included in the following assemblies: -// -// scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: CONCEPT -[id="ztp-things-to-consider-when-creating-custom-resource-policies_{context}"] -= Considerations when creating custom resource policies - -* The custom resources used to create the ACM policies should be defined with consideration of possible overlay to its metadata and spec/data. For example, if the custom resource `metadata.name` does not change between clusters then you should set the `metadata.name` value in the custom resource file. If the custom resource will have multiple instances in the same cluster, then the custom resource `metadata.name` must be defined in the policy template file. - -* In order to apply the node selector for a specific machine config pool, you have to set the node selector value as `$mcp` in order to let the policy generator overlay the `$mcp` value with the defined mcp in the policy template. - -* Subscription source files do not change. - -* To ensure that policy updates are applied, set the `complianceType` field to `mustonlyhave`. \ No newline at end of file diff --git a/modules/ztp-topology-aware-lifecycle-manager.adoc b/modules/ztp-topology-aware-lifecycle-manager.adoc new file mode 100644 index 0000000000..07c63b6177 --- /dev/null +++ b/modules/ztp-topology-aware-lifecycle-manager.adoc @@ -0,0 +1,9 @@ +// Module included in the following assemblies: +// +// *scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: CONCEPT +[id="ztp-topology-aware-lifecycle-manager_{context}"] += {cgu-operator-full} + +Install the {cgu-operator-first} on the hub cluster. 
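The {cgu-operator} is typically installed through Operator Lifecycle Manager on the hub cluster. The following `Subscription` is a minimal sketch only: the package name `topology-aware-lifecycle-manager`, the `stable` channel, the `redhat-operators` catalog source, and the `openshift-operators` namespace are assumptions, so take the exact values from the {cgu-operator} installation modules referenced elsewhere in this document.

[source,yaml]
----
# Hypothetical OLM Subscription for installing TALM on the hub cluster.
# The package name, channel, source, and namespace are assumptions; use the
# values from the TALM installation procedure for your release.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-topology-aware-lifecycle-manager-subscription
  namespace: openshift-operators
spec:
  channel: "stable"
  name: topology-aware-lifecycle-manager
  source: redhat-operators
  sourceNamespace: openshift-marketplace
----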
diff --git a/modules/ztp-troubleshooting-gitops-ztp.adoc b/modules/ztp-troubleshooting-gitops-ztp.adoc index 2547de8126..d38f077f52 100644 --- a/modules/ztp-troubleshooting-gitops-ztp.adoc +++ b/modules/ztp-troubleshooting-gitops-ztp.adoc @@ -6,4 +6,4 @@ [id="ztp-troubleshooting-gitops-ztp_{context}"] = Troubleshooting GitOps ZTP -As noted, the ArgoCD pipeline synchronizes the `SiteConfig` and `PolicyGenTemplate` custom resources (CR) from the Git repository to the hub cluster. During this process, post-sync hooks create the installation and policy CRs that are also applied to the hub cluster. Use the following procedures to troubleshoot issues that might occur in this process. +The ArgoCD pipeline uses the `SiteConfig` and `PolicyGenTemplate` custom resources (CRs) from Git to generate the cluster configuration CRs and {rh-rhacm-first} policies. Use the following steps to troubleshoot issues that might occur during this process. diff --git a/modules/ztp-upgrading-gitops-ztp.adoc b/modules/ztp-upgrading-gitops-ztp.adoc new file mode 100644 index 0000000000..121dfc790b --- /dev/null +++ b/modules/ztp-upgrading-gitops-ztp.adoc @@ -0,0 +1,23 @@ +// Module included in the following assemblies: +// +// *scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: PROCEDURE +[id="ztp-upgrading-gitops-ztp_{context}"] += Upgrading GitOps ZTP + +You can upgrade the GitOps zero touch provisioning (ZTP) infrastructure independently from the underlying cluster, {rh-rhacm-first}, and {product-title} version running on the spoke clusters. This procedure guides you through the upgrade process to avoid impact on the spoke clusters. However, any changes to the content or settings of policies, including adding recommended content, result in changes that must be rolled out and reconciled to the spoke clusters. + +.Prerequisites + +* This procedure assumes that you have a fully operational hub cluster running the earlier version of the GitOps ZTP infrastructure. + +.Procedure + +At a high level, the strategy for upgrading the GitOps ZTP infrastructure is: + +. Label all existing clusters with the `ztp-done` label. +. Stop the ArgoCD applications. +. Install the new tooling. +. Update required content and optional changes in the Git repository. +. Update and restart the application configuration. diff --git a/modules/ztp-using-pgt-to-update-source-crs.adoc b/modules/ztp-using-pgt-to-update-source-crs.adoc new file mode 100644 index 0000000000..062077b33a --- /dev/null +++ b/modules/ztp-using-pgt-to-update-source-crs.adoc @@ -0,0 +1,150 @@ +// Module included in the following assemblies: +// +// scalability_and_performance/ztp-deploying-disconnected.adoc + +:_module-type: PROCEDURE +[id="ztp-using-pgt-to-update-source-crs_{context}"] += Using PolicyGenTemplate CRs to override source CR content + +`PolicyGenTemplate` CRs allow you to overlay additional configuration details on top of the base source CRs provided in the `ztp-site-generate` container. You can think of `PolicyGenTemplate` CRs as a logical merge or patch to the base CR. Use `PolicyGenTemplate` CRs to update a single field of the base CR, or overlay the entire contents of the base CR. You can update values and insert fields that are not in the base CR. + +The following example procedure describes how to update fields in the generated `PerformanceProfile` CR for the reference configuration based on the `PolicyGenTemplate` CR in the `group-du-sno-ranGen.yaml` file.
Use the procedure as a basis for modifying other parts of the `PolicyGenTemplate` based on your requirements. + +.Prerequisites + +* Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for Argo CD. + +.Procedure + +. Review the baseline source CR for existing content. You can review the source CRs listed in the reference `PolicyGenTemplate` CRs by extracting them from the zero touch provisioning (ZTP) container. + +.. Create an `/out` folder: ++ +[source,terminal] +---- +$ mkdir -p ./out +---- + +.. Extract the source CRs: ++ +[source,terminal] +---- +$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10 extract /home/ztp --tar | tar x -C ./out +---- + +. Review the baseline `PerformanceProfile` CR in `./out/source-crs/PerformanceProfile.yaml`: ++ +[source,yaml] +---- +apiVersion: performance.openshift.io/v2 +kind: PerformanceProfile +metadata: + name: $name + annotations: + ran.openshift.io/ztp-deploy-wave: "10" +spec: + additionalKernelArgs: + - "idle=poll" + - "rcupdate.rcu_normal_after_boot=0" + cpu: + isolated: $isolated + reserved: $reserved + hugepages: + defaultHugepagesSize: $defaultHugepagesSize + pages: + - size: $size + count: $count + node: $node + machineConfigPoolSelector: + pools.operator.machineconfiguration.openshift.io/$mcp: "" + net: + userLevelNetworking: true + nodeSelector: + node-role.kubernetes.io/$mcp: '' + numa: + topologyPolicy: "restricted" + realTimeKernel: + enabled: true +---- ++ +[NOTE] +==== +Any fields in the source CR which contain `$...` are removed from the generated CR if they are not provided in the `PolicyGenTemplate` CR. +==== + +. Update the `PolicyGenTemplate` entry for `PerformanceProfile` in the `group-du-sno-ranGen.yaml` reference file. The following example `PolicyGenTemplate` CR stanza supplies appropriate CPU specifications, sets the `hugepages` configuration, and adds a new field that sets `globallyDisableIrqLoadBalancing` to false. ++ +[source,yaml] +---- +- fileName: PerformanceProfile.yaml + policyName: "config-policy" + metadata: + name: openshift-node-performance-profile + spec: + cpu: + # These must be tailored for the specific hardware platform + isolated: "2-19,22-39" + reserved: "0-1,20-21" + hugepages: + defaultHugepagesSize: 1G + pages: + - size: 1G + count: 10 + globallyDisableIrqLoadBalancing: false +---- + +. Commit the `PolicyGenTemplate` change in Git, and then push to the Git repository being monitored by the GitOps ZTP argo CD application. + + +.Example output + +The ZTP application generates an ACM policy that contains the generated `PerformanceProfile` CR. The contents of that CR are derived by merging the `metadata` and `spec` contents from the `PerformanceProfile` entry in the `PolicyGenTemplate` onto the source CR. 
The resulting CR has the following content: + +[source,yaml] +---- +--- +apiVersion: performance.openshift.io/v2 +kind: PerformanceProfile +metadata: + name: openshift-node-performance-profile +spec: + additionalKernelArgs: + - idle=poll + - rcupdate.rcu_normal_after_boot=0 + cpu: + isolated: 2-19,22-39 + reserved: 0-1,20-21 + globallyDisableIrqLoadBalancing: false + hugepages: + defaultHugepagesSize: 1G + pages: + - count: 10 + size: 1G + machineConfigPoolSelector: + pools.operator.machineconfiguration.openshift.io/master: "" + net: + userLevelNetworking: true + nodeSelector: + node-role.kubernetes.io/master: "" + numa: + topologyPolicy: restricted + realTimeKernel: + enabled: true +---- + +[NOTE] +==== +In the `/source-crs` folder that you extract from the `ztp-site-generate` container, the `$` syntax is not used for template substitution as the syntax might imply. Rather, if the `policyGen` tool sees the `$` prefix for a string and you do not specify a value for that field in the related `PolicyGenTemplate` CR, the field is omitted from the output CR entirely. + +An exception to this is the `$mcp` variable in `/source-crs` YAML files that is substituted with the specified value for `mcp` from the `PolicyGenTemplate` CR. For example, in `example/policygentemplates/group-du-standard-ranGen.yaml`, the value for `mcp` is `worker`: + +[source,yaml] +---- +spec: + bindingRules: + group-du-standard: "" + mcp: "worker" +---- + +The `policyGen` tool replaces instances of `$mcp` with `worker` in the output CRs. +==== diff --git a/modules/ztp-validating-the-generation-of-configuration-policy-crs.adoc b/modules/ztp-validating-the-generation-of-configuration-policy-crs.adoc new file mode 100644 index 0000000000..1e9b73bda7 --- /dev/null +++ b/modules/ztp-validating-the-generation-of-configuration-policy-crs.adoc @@ -0,0 +1,115 @@ +// Module included in the following assemblies: +// +// *scalability_and_performance/ztp-deploying-disconnected.adoc + +:_content-type: PROCEDURE +[id="ztp-validating-the-generation-of-configuration-policy-crs_{context}"] += Validating the generation of configuration policy CRs + +Policy custom resources (CRs) are generated in the same namespace as the `PolicyGenTemplate` from which they are created. The same troubleshooting flow applies to all policy CRs generated from a `PolicyGenTemplate` regardless of whether they are `ztp-common`, `ztp-group`, or `ztp-site` based, as shown using the following commands: + +[source,terminal] +---- +$ export NS= +---- + +[source,terminal] +---- +$ oc get policy -n $NS +---- + +The expected set of policy-wrapped CRs should be displayed. + +If the policies failed synchronization, use the following troubleshooting steps. + +.Procedure + +. To display detailed information about the policies, run the following command: ++ +[source,terminal] +---- +$ oc describe -n openshift-gitops application policies +---- + +. Check for `Status: Conditions:` to view the error logs. For example, setting an invalid `sourceFile→fileName:` generates the error shown below: ++ +[source,text] +---- +Status: + Conditions: + Last Transition Time: 2021-11-26T17:21:39Z + Message: rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/policies/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not find test.yaml under source-crs/: no such file or directory +Error: failure in plugin configured via /tmp/kust-plugin-config-52463179; exit status 1: exit status 1 + Type: ComparisonError +---- + +.
Check for `Status: Sync:`. If there are log errors at `Status: Conditions:`, the `Status: Sync:` shows `Unknown` or `Error`: ++ +[source,text] +---- +Status: + Sync: + Compared To: + Destination: + Namespace: policies-sub + Server: https://kubernetes.default.svc + Source: + Path: policies + Repo URL: https://git.com/ran-sites/policies/.git + Target Revision: master + Status: Error +---- + +. When {rh-rhacm-first} recognizes that policies apply to a `ManagedCluster` object, the policy CR objects are applied to the cluster namespace. Check to see if the policies were copied to the cluster namespace: ++ +[source,terminal] +---- +$ oc get policy -n $CLUSTER +---- ++ +.Example output ++ +[source,terminal] +---- +NAME REMEDIATION ACTION COMPLIANCE STATE AGE +ztp-common.common-config-policy inform Compliant 13d +ztp-common.common-subscriptions-policy inform Compliant 13d +ztp-group.group-du-sno-config-policy inform Compliant 13d +ztp-group.group-du-sno-validator-du-policy inform Compliant 13d +ztp-site.example-sno-config-policy inform Compliant 13d +---- ++ +{rh-rhacm} copies all applicable policies into the cluster namespace. The copied policy names have the format: `.-`. + +. Check the placement rule for any policies not copied to the cluster namespace. The `matchSelector` in the `PlacementRule` for those policies should match labels on the `ManagedCluster` object: ++ +[source,terminal] +---- +$ oc get placementrule -n $NS +---- + +. Note the `PlacementRule` name appropriate for the missing policy (common, group, or site) using the following command: ++ +[source,terminal] +---- +$ oc get placementrule -n $NS -o yaml +---- ++ +* The `status.decisions` field should include your cluster name. +* The key-value pair of the `matchSelector` in the spec must match the labels on your managed cluster. + +. Check the labels on the `ManagedCluster` object using the following command: ++ +[source,terminal] +---- +$ oc get ManagedCluster $CLUSTER -o jsonpath='{.metadata.labels}' | jq +---- + +. Check to see which policies are compliant using the following command: ++ +[source,terminal] +---- +$ oc get policy -n $CLUSTER +---- ++ +If the `Namespace`, `OperatorGroup`, and `Subscription` policies are compliant but the Operator configuration policies are not, it is likely that the Operators did not install on the spoke cluster. This causes the Operator configuration policies to fail to apply because the CRD is not yet applied to the spoke. diff --git a/modules/ztp-validating-the-generation-of-installation-crs.adoc b/modules/ztp-validating-the-generation-of-installation-crs.adoc index d35a7f42a2..5b1fde84ed 100644 --- a/modules/ztp-validating-the-generation-of-installation-crs.adoc +++ b/modules/ztp-validating-the-generation-of-installation-crs.adoc @@ -1,4 +1,4 @@ -// Module included in the following assemblies: +// Module included in the following assemblies: // // *scalability_and_performance/ztp-deploying-disconnected.adoc @@ -6,66 +6,58 @@ [id="ztp-validating-the-generation-of-installation-crs_{context}"] = Validating the generation of installation CRs -`SiteConfig` applies Installation custom resources (CR) to the hub cluster in a namespace with the name matching the site name. To check the status, enter the following command: +The GitOps zero touch provisioning (ZTP) infrastructure generates a set of installation CRs on the hub cluster in response to a `SiteConfig` CR pushed to your Git repository.
You can check that the installation CRs were created by using the following command: [source,terminal] ---- $ oc get AgentClusterInstall -n ---- -If no object is returned, use the following procedure to troubleshoot the ArgoCD pipeline flow from `SiteConfig` to the installation CRs. +If no object is returned, use the following procedure to troubleshoot the ArgoCD pipeline flow from `SiteConfig` files to the installation CRs. .Procedure -. Check the synchronization of the `SiteConfig` to the hub cluster using either of the following commands: +. Verify that the `SiteConfig->ManagedCluster` was generated to the hub cluster: + [source,terminal] ---- -$ oc get siteconfig -A +$ oc get managedcluster ---- -+ -or -+ -[source,terminal] ----- -$ oc get siteconfig -n clusters-sub ----- -+ -If the `SiteConfig` is missing, one of the following situations has occurred: -* The *clusters* application failed to synchronize the CR from the Git repository to the hub. Use the following command to verify this: +. If the `SiteConfig` `ManagedCluster` is missing, see if the `clusters` application failed to synchronize the files from the Git repository to the hub: + [source,terminal] ---- $ oc describe -n openshift-gitops application clusters ---- -+ -Check for `Status: Synced` and that the `Revision:` is the SHA of the commit you pushed to the subscribed repository. -+ -* The pre-sync hook failed, possibly due to a failure to pull the container image. Check the ArgoCD dashboard for the status of the pre-sync job in the *clusters* application. -. Verify the post hook job ran: +. Check for `Status: Conditions:` to view the error logs. For example, setting an invalid value for `extraManifestPath:` in the `siteConfig` file raises an error as shown below: + -[source,terminal] +[source,text] ---- -$ oc describe job -n clusters-sub siteconfig-post +Status: + Conditions: + Last Transition Time: 2021-11-26T17:21:39Z + Message: rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/siteconfigs/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not create extra-manifest ranSite1.extra-manifest3 stat extra-manifest3: no such file or directory +2021/11/26 17:21:40 Error: could not build the entire SiteConfig defined by /tmp/kust-plugin-config-913473579: stat extra-manifest3: no such file or directory +Error: failure in plugin configured via /tmp/kust-plugin-config-913473579; exit status 1: exit status 1 + Type: ComparisonError ---- -+ -* If successful, the returned output indicates `succeeded: 1`. -* If the job fails, ArgoCD retries it. In some cases, the first pass will fail and the second pass will indicate that the job passed. -. Check for errors in the post hook job: +. Check for `Status: Sync:`. If there are log errors, `Status: Sync:` could indicate an +`Unknown` error: + -[source,terminal] +[source,text] ---- -$ oc get pod -n clusters-sub +Status: + Sync: + Compared To: + Destination: + Namespace: clusters-sub + Server: https://kubernetes.default.svc + Source: + Path: sites-config + Repo URL: https://git.com/ran-sites/siteconfigs/.git + Target Revision: master + Status: Unknown ---- -+ -Note the name of the `siteconfig-post-xxxxx` pod: -+ -[source,terminal] ----- -$ oc logs -n clusters-sub siteconfig-post-xxxxx ----- -+ -If the logs indicate errors, correct the conditions and push the corrected `SiteConfig` or `PolicyGenTemplate` to the Git repository. 
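When checking the synchronization described above, you can also query the Argo CD `Application` status fields directly instead of scanning the full `describe` output. This is a convenience sketch that assumes the default `clusters` application name and the `openshift-gitops` namespace used in this procedure; a result other than `Synced` indicates that the application did not apply the generated CRs.

[source,terminal]
----
$ oc get application clusters -n openshift-gitops -o jsonpath='{.status.sync.status}{"\n"}'
----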
diff --git a/modules/ztp-validating-the-generation-of-policy-crs.adoc b/modules/ztp-validating-the-generation-of-policy-crs.adoc deleted file mode 100644 index 5ea3675211..0000000000 --- a/modules/ztp-validating-the-generation-of-policy-crs.adoc +++ /dev/null @@ -1,112 +0,0 @@ -// Module included in the following assemblies: -// -// *scalability_and_performance/ztp-deploying-disconnected.adoc - -:_content-type: PROCEDURE -[id="ztp-validating-the-generation-of-policy-crs_{context}"] -= Validating the generation of policy CRs - -ArgoCD generates the policy custom resources (CRs) in the same namespace as the `PolicyGenTemplate` from which they were created. The same troubleshooting flow applies to all policy CRs generated from `PolicyGenTemplates` regardless of whether they are common, group, or site based. - -To check the status of the policy CRs, enter the following commands: - -[source,terminal] ----- -$ export NS= ----- - -[source,terminal] ----- -$ oc get policy -n $NS ----- - -The returned output displays the expected set of policy wrapped CRs. If no object is returned, use the following procedure to troubleshoot the ArgoCD pipeline flow from `SiteConfig` to the policy CRs. - -.Procedure - -. Check the synchronization of the `PolicyGenTemplate` to the hub cluster: -+ -[source,terminal] ----- -$ oc get policygentemplate -A ----- -or -+ -[source,terminal] ----- -$ oc get policygentemplate -n $NS ----- -+ -If the `PolicyGenTemplate` is not synchronized, one of the following situations has occurred: -+ -* The clusters application failed to synchronize the CR from the Git repository to the hub. Use the following command to verify this: -+ -[source,terminal] ----- -$ oc describe -n openshift-gitops application clusters ----- -+ -Check for `Status: Synced` and that the `Revision:` is the SHA of the commit you pushed to the subscribed repository. -+ -* The pre-sync hook failed, possibly due to a failure to pull the container image. Check the ArgoCD dashboard for the status of the pre-sync job in the *clusters* application. - -. Ensure the policies were copied to the cluster namespace. When ACM recognizes that policies apply to a `ManagedCluster`, ACM applies the policy CR objects to the cluster namespace: -+ -[source,terminal] ----- -$ oc get policy -n ----- -ACM copies all applicable common, group, and site policies here. The policy names are `` and ``. - -. Check the placement rule for any policies not copied to the cluster namespace. The `matchSelector` in the `PlacementRule` for those policies should match the labels on the `ManagedCluster`: -+ -[source,terminal] ----- -$ oc get placementrule -n $NS ----- - -. Make a note of the `PlacementRule` name for the missing common, group, or site policy: -+ -[source,terminal] ----- - oc get placementrule -n $NS -o yaml ----- -+ -* The `status decisions` value should include your cluster name. -* The `key value` of the `matchSelector` in the spec should match the labels on your managed cluster. Check the labels on `ManagedCluster`: -+ -[source,terminal] ----- - oc get ManagedCluster $CLUSTER -o jsonpath='{.metadata.labels}' | jq ----- -+ -.Example -[source,yaml] ----- -apiVersion: apps.open-cluster-management.io/v1 -kind: PlacementRule -metadata: - name: group-test1-policies-placementrules - namespace: group-test1-policies -spec: - clusterSelector: - matchExpressions: - - key: group-test1 - operator: In - values: - - "" -status: - decisions: - - clusterName: - clusterNamespace: ----- - -. 
Ensure all policies are compliant: -+ -[source,terminal] ----- - oc get policy -n $CLUSTER ----- -+ - -If the Namespace, OperatorGroup, and Subscription policies are compliant but the Operator configuration policies are not it is likely that the Operators did not install. diff --git a/modules/ztp-ztp-building-blocks.adoc b/modules/ztp-ztp-building-blocks.adoc index a54865841b..334ba99fec 100644 --- a/modules/ztp-ztp-building-blocks.adoc +++ b/modules/ztp-ztp-building-blocks.adoc @@ -7,13 +7,13 @@ = Zero touch provisioning building blocks -ACM deploys {sno}, which is {product-title} installed on single nodes, leveraging zero touch provisioning (ZTP). -The initial site plan is broken down into smaller components and initial configuration data is stored in a Git repository. Zero touch provisioning uses a declarative GitOps approach to deploy these nodes. -The deployment of the nodes includes: +{rh-rhacm-first} leverages zero touch provisioning (ZTP) to deploy single-node {product-title} clusters, three-node clusters, and standard clusters. The initial site plan is divided into smaller components and initial configuration data is stored in a Git repository. ZTP uses a declarative GitOps approach to deploy these clusters. + +The deployment of the clusters includes: * Installing the host operating system (RHCOS) on a blank server. -* Deploying {product-title} on single nodes. +* Deploying {product-title}. * Creating cluster policies and site subscriptions. diff --git a/modules/ztp-ztp-custom-resources.adoc b/modules/ztp-ztp-custom-resources.adoc index 72d26e456a..0d74e361c3 100644 --- a/modules/ztp-ztp-custom-resources.adoc +++ b/modules/ztp-ztp-custom-resources.adoc @@ -72,3 +72,22 @@ a| * `BMC Secret` authenticates into the target bare-metal host using its userna |Contains {product-title} image information such as the repository and image name. |Passed into resources to provide {product-title} images. |=== + +ZTP support for single node clusters, three-node clusters, and standard clusters requires updates to these CRs, including multiple instantiations of some. + +ZTP provides support for deploying single node clusters, three-node clusters, and standard OpenShift clusters. This includes the installation of OpenShift and deployment of the distributed units (DUs) at scale. + +The overall flow is identical to the ZTP support for single node clusters, with some differences in configuration depending on the type of cluster: + +`SiteConfig` file: + +* For single node clusters, the `SiteConfig` file must have exactly one entry in the `nodes` section. +* For three-node clusters, the `SiteConfig` file must have exactly three entries defined +in the `nodes` section. +* For standard clusters, the `SiteConfig` file must have exactly three entries in the `nodes` section with `role: master` and one or more additional entries with `role: worker`. + +`PolicyGenTemplate` file: + +* The example common `PolicyGenTemplate` file is common across all types of clusters. +* There are example group `PolicyGenTemplate` files for single node, three-node, and standard clusters. +* Site-specific `PolicyGenTemplate` files are still specific to each site. 
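To make the `nodes` section requirements above concrete, the following trimmed `SiteConfig` fragment sketches a three-node cluster with exactly three `role: master` entries. This is a hypothetical, heavily abbreviated example: the cluster and host names are invented and most required fields are omitted, so use the example `SiteConfig` files in the extracted `argocd/example` directory as the authoritative reference.

[source,yaml]
----
# Trimmed, hypothetical SiteConfig fragment for a three-node cluster.
# Only the shape of the nodes section is shown; all other required
# fields are omitted for brevity.
apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
  name: "example-3node"
  namespace: "example-3node"
spec:
  clusters:
  - clusterName: "example-3node"
    nodes:
    - hostName: "node0.example.com"
      role: "master"
    - hostName: "node1.example.com"
      role: "master"
    - hostName: "node2.example.com"
      role: "master"
----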
diff --git a/monitoring/using-rfhe.adoc b/monitoring/using-rfhe.adoc new file mode 100644 index 0000000000..b6c95ec40a --- /dev/null +++ b/monitoring/using-rfhe.adoc @@ -0,0 +1,51 @@ +:_content-type: ASSEMBLY +[id="using-rfhe"] += Monitoring bare-metal events with the {redfish-operator} +include::_attributes/common-attributes.adoc[] +:context: using-rfhe + +toc::[] + +:FeatureName: Bare Metal Event Relay +include::snippets/technology-preview.adoc[] + +[id="about-using-redfish-hardware-events"] +== About bare-metal events + +Use the {redfish-operator} to subscribe applications that run in your {product-title} cluster to events that are generated on the underlying bare-metal host. The Redfish service publishes events on a node and transmits them on an advanced message queue to subscribed applications. + +Bare-metal events are based on the open Redfish standard that is developed under the guidance of the Distributed Management Task Force (DMTF). Redfish provides a secure industry-standard protocol with a REST API. The protocol is used for the management of distributed, converged, or software-defined resources and infrastructure. + +Hardware-related events published through Redfish include: + +* Breaches of temperature limits +* Server status +* Fan status + +Begin using bare-metal events by deploying the {redfish-operator} Operator and subscribing your application to the service. The {redfish-operator} Operator installs and manages the lifecycle of the Redfish bare-metal event service. + +[NOTE] +==== +The {redfish-operator} works only with Redfish-capable devices on single-node clusters provisioned on bare-metal infrastructure. +==== + +include::modules/nw-rfhe-introduction.adoc[leveloffset=+1] + +include::modules/nw-rfhe-installing-operator-cli.adoc[leveloffset=+2] + +include::modules/nw-rfhe-installing-operator-web-console.adoc[leveloffset=+2] + +include::modules/hw-installing-amq-interconnect-messaging-bus.adoc[leveloffset=+1] + +[id="subscribing-hw-events"] +== Subscribing to Redfish BMC bare-metal events for a cluster node + +As a cluster administrator, you can subscribe to Redfish BMC events generated on a node in your cluster by creating a `BMCEventSubscription` custom resource (CR) for the node, a `HardwareEvent` CR for the event, and a `Secret` CR for the BMC. + +include::modules/nw-rfhe-creating-bmc-event-sub.adoc[leveloffset=+2] + +include::modules/nw-rfhe-quering-redfish-hardware-event-subs.adoc[leveloffset=+2] + +include::modules/nw-rfhe-creating-hardware-event.adoc[leveloffset=+2] + +include::modules/cnf-rfhe-notifications-api-refererence.adoc[leveloffset=+1] diff --git a/operators/operator-reference.adoc b/operators/operator-reference.adoc index 83bfe1d22f..e0d56d7f24 100644 --- a/operators/operator-reference.adoc +++ b/operators/operator-reference.adoc @@ -15,6 +15,7 @@ Cluster administrators can view platform Operators in the {product-title} web co Platform operators are not managed by Operator Lifecycle Manager (OLM) and OperatorHub. OLM and OperatorHub are part of the link:https://operatorframework.io/[Operator Framework] used in {product-title} for installing and running optional xref:../architecture/control-plane.adoc#olm-operators_control-plane[add-on Operators].
==== +include::modules/baremetal-event-relay.adoc[leveloffset=+1] include::modules/cloud-credential-operator.adoc[leveloffset=+1] [discrete] diff --git a/scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc b/scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc new file mode 100644 index 0000000000..978a318213 --- /dev/null +++ b/scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc @@ -0,0 +1,46 @@ +:_content-type: ASSEMBLY +[id="cnf-talm-for-cluster-updates"] += {cgu-operator-full} for cluster updates +include::_attributes/common-attributes.adoc[] +:context: cnf-topology-aware-lifecycle-manager + +toc::[] + +You can use the {cgu-operator-first} to manage the software lifecycle of multiple single-node OpenShift clusters. {cgu-operator} uses {rh-rhacm-first} policies to perform changes on the target clusters. + +:FeatureName: {cgu-operator-full} +include::snippets/technology-preview.adoc[] + +include::modules/cnf-about-topology-aware-lifecycle-manager-config.adoc[leveloffset=+1] + +include::modules/cnf-about-topology-aware-lifecycle-manager-policies.adoc[leveloffset=+1] + +include::modules/cnf-topology-aware-lifecycle-manager-installation-web-console.adoc[leveloffset=+1] + +include::modules/cnf-topology-aware-lifecycle-manager-installation-cli.adoc[leveloffset=+1] + +include::modules/cnf-topology-aware-lifecycle-manager-about-cgu-crs.adoc[leveloffset=+1] + +include::modules/cnf-about-topology-aware-lifecycle-manager-blocking-crs.adoc[leveloffset=+2] + +include::modules/cnf-topology-aware-lifecycle-manager-policies-concept.adoc[leveloffset=+1] + +[role="_additional-resources"] +.Additional resources + +For more information about `PolicyGenTemplate` CRD, see xref:../scalability_and_performance/ztp-deploying-disconnected.adoc#ztp-the-policygentemplate_ztp-deploying-disconnected[About the PolicyGenTemplate]. + +include::modules/cnf-topology-aware-lifecycle-manager-apply-policies.adoc[leveloffset=+2] + +include::modules/cnf-topology-aware-lifecycle-manager-precache-concept.adoc[leveloffset=+1] + +include::modules/cnf-topology-aware-lifecycle-manager-precache-feature.adoc[leveloffset=+2] + +include::modules/cnf-topology-aware-lifecycle-manager-troubleshooting.adoc[leveloffset=+1] + +[role="_additional-resources"] +.Additional resources + +* For information about troubleshooting, see xref:../support/troubleshooting/troubleshooting-operator-issues.adoc[OpenShift Container Platform Troubleshooting Operator Issues]. + +* For more information about using {cgu-operator-full} in the ZTP workflow, see xref:../scalability_and_performance/ztp-deploying-disconnected.adoc#cnf-topology-aware-lifecycle-manager[Updating managed policies with {cgu-operator-full}]. diff --git a/scalability_and_performance/ztp-deploying-disconnected.adoc b/scalability_and_performance/ztp-deploying-disconnected.adoc index 95a9660cf3..cb4238916c 100644 --- a/scalability_and_performance/ztp-deploying-disconnected.adoc +++ b/scalability_and_performance/ztp-deploying-disconnected.adoc @@ -9,57 +9,179 @@ toc::[] Use zero touch provisioning (ZTP) to provision distributed units at new edge sites in a disconnected environment. The workflow starts when the site is connected to the network and ends with the CNF workload deployed and running on the site nodes. 
-:FeatureName: ZTP for RAN deployments -include::snippets/technology-preview.adoc[leveloffset=+1] - include::modules/ztp-provisioning-edge-sites-at-scale.adoc[leveloffset=+1] +include::modules/about-ztp-and-distributed-units-on-openshift-clusters.adoc[leveloffset=+1] + include::modules/ztp-the-gitops-approach.adoc[leveloffset=+1] -include::modules/ztp-about-ztp-and-distributed-units-on-single-node-openshift-clusters.adoc[leveloffset=+1] - include::modules/ztp-ztp-building-blocks.adoc[leveloffset=+1] -include::modules/ztp-single-node-clusters.adoc[leveloffset=+1] +include::modules/ztp-how-to-plan-your-ran-policies.adoc[leveloffset=+1] +// Change title to How to plan your RAN policies -include::modules/ztp-site-planning-for-du-deployments.adoc[leveloffset=+1] + +// include::modules/ztp-single-node-clusters.adoc[leveloffset=+1] +// Remove this topic for now include::modules/ztp-low-latency-for-distributed-units-dus.adoc[leveloffset=+1] -include::modules/ztp-du-host-bios-requirements.adoc[leveloffset=+1] - include::modules/ztp-acm-preparing-to-install-disconnected-acm.adoc[leveloffset=+1] -include::modules/ztp-disconnected-environment-prereqs.adoc[leveloffset=+2] - -include::modules/installation-about-mirror-registry.adoc[leveloffset=+2] - [role="_additional-resources"] .Additional resources -* For information on viewing the CRI-O logs to view the image source, see xref:../installing/validating-an-installation.html#viewing-the-image-pull-source_validating-an-installation[Viewing the image pull source]. +* For more information about creating the disconnected mirror registry, see xref:../installing/disconnected_install/installing-mirroring-creating-registry.adoc#installing-mirroring-creating-registry[Creating a mirror registry]. -include::modules/ztp-installing-preparing-mirror.adoc[leveloffset=+2] +* For more information about mirroring OpenShift Platform image to the disconnected registry, see xref:../installing/disconnected_install/installing-mirroring-installation-images.html#installing-mirroring-installation-images[Mirroring images for a disconnected installation]. -include::modules/cli-installing-cli.adoc[leveloffset=+3] - -include::modules/installation-adding-registry-pull-secret.adoc[leveloffset=+3] - -include::modules/installation-mirror-repository.adoc[leveloffset=+3] - -include::modules/ztp-acm-adding-images-to-mirror-registry.adoc[leveloffset=+3] +include::modules/ztp-acm-adding-images-to-mirror-registry.adoc[leveloffset=+2] include::modules/ztp-acm-installing-disconnected-rhacm.adoc[leveloffset=+1] -// AI - include::modules/ztp-ai-install-ocp-clusters-on-bare-metal.adoc[leveloffset=+1] -// Custom resources +// Custom resources 340 include::modules/ztp-ztp-custom-resources.adoc[leveloffset=+1] -include::modules/ztp-creating-siteconfig-custom-resources.adoc[leveloffset=+1] +include::modules/ztp-policygentemplates-for-ran.adoc[leveloffset=+1] + +[role="_additional-resources"] +.Additional resources + +* For more information about extracting the `/argocd` directory from the `ztp-site-generator` container image, see xref:../scalability_and_performance/ztp-deploying-disconnected.adoc#ztp-policygentemplates-for-ran_ztp-deploying-disconnected[Preparing the ZTP Git repository]. 
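The extraction referenced above typically has the following form. This is a sketch based on the command used later in this assembly; the image tag `v4.10` is an example and the `./out` target directory is arbitrary:

[source,terminal]
----
$ mkdir -p ./out
$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10 extract /home/ztp --tar | tar x -C ./out
----

The extracted `out/argocd/example` directory then contains the reference `SiteConfig` and `PolicyGenTemplate` files that are referred to throughout this assembly.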
+ +include::modules/ztp-the-policygentemplate.adoc[leveloffset=+1] + +// Custom resources 340 1/2 + +include::modules/ztp-pgt-config-best-practices.adoc[leveloffset=+1] + +[role="_additional-resources"] +.Additional resources + +* For details about best practices for scaling clusters with {rh-rhacm-first}, see link:https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html/install/installing#performance-and-scalability[ACM performance and scalability considerations]. + +[NOTE] +==== +Scaling the hub cluster to manage large numbers of spoke clusters is affected by the number of policies created on the hub cluster. Grouping multiple configuration CRs into a single or limited number of policies is one way to reduce the overall number of policies on the hub cluster. When using the common/group/site hierarchy of policies for managing site configuration, it is especially important to combine site-specific configuration into a single policy. +==== + +// End of 340 content + +include::modules/ztp-creating-the-policygentemplate-cr.adoc[leveloffset=+1] + +include::modules/ztp-creating-ztp-custom-resources-for-multiple-managed-clusters.adoc[leveloffset=+1] + +include::modules/ztp-using-pgt-to-update-source-crs.adoc[leveloffset=+2] + +include::modules/ztp-configuring-ptp-fast-events.adoc[leveloffset=+2] + +include::modules/ztp-configuring-uefi-secure-boot.adoc[leveloffset=+2] + +include::modules/ztp-installing-the-gitops-ztp-pipeline.adoc[leveloffset=+1] + +include::modules/ztp-preparing-the-ztp-git-repository.adoc[leveloffset=+2] + +include::modules/ztp-preparing-the-hub-cluster-for-ztp.adoc[leveloffset=+2] + +// 340 2/2 + +include::modules/ztp-deploying-additional-changes-to-clusters.adoc[leveloffset=+2] + +[role="_additional-resources"] +.Additional resources + +* See xref:../scalability_and_performance/ztp-deploying-disconnected.adoc#ztp-adding-new-content-to-gitops-ztp_ztp-deploying-disconnected[Adding new content to the GitOps ZTP pipeline] for more information about adding or modifying existing source CRs in the `ztp-site-generate` container. + +* See xref:../scalability_and_performance/ztp-deploying-disconnected.adoc#ztp-customizing-the-install-extra-manifests_ztp-deploying-disconnected[Customizing the ZTP GitOps pipeline with extra manifests] for more information about adding extra manifests. + +include::modules/ztp-adding-new-content-to-gitops-ztp.adoc[leveloffset=+1] + +[role="_additional-resources"] +.Additional resources + +* Alternatively, you can patch the Argo CD instance as described in xref:../scalability_and_performance/ztp-deploying-disconnected.adoc#ztp-preparing-the-hub-cluster-for-ztp_ztp-deploying-disconnected[Preparing the hub cluster for ZTP] by modifying `argocd-openshift-gitops-patch.json` with an updated `initContainer` image before applying the patch file.
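For orientation only: applying a prepared patch file such as `argocd-openshift-gitops-patch.json` to the Argo CD instance typically uses `oc patch`. The instance name and namespace below assume the default OpenShift GitOps Operator installation and are not taken from this document.

[source,terminal]
----
# Merge-patch the Argo CD instance with the prepared patch file.
# The "openshift-gitops" name and namespace are assumptions based on the default
# OpenShift GitOps Operator installation.
$ oc patch argocd openshift-gitops -n openshift-gitops \
  --type=merge --patch "$(cat argocd-openshift-gitops-patch.json)"
----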
+ +include::modules/ztp-customizing-the-install-extra-manifests.adoc[leveloffset=+1] + +include::modules/ztp-deploying-a-site.adoc[leveloffset=+1] + +include::modules/ztp-talo-integration.adoc[leveloffset=+1] + +// End of 340 + +// include::modules/ztp-creating-the-site-secrets.adoc[leveloffset=+2] +// Remove this topic - keep the note and move to Step 3 in "Deploying a site" + +include::modules/ztp-monitoring-deployment-progress.adoc[leveloffset=+1] + +// Definition of done +include::modules/ztp-definition-of-done-for-ztp-installations.adoc[leveloffset=+1] +include::modules/ztp-creating-a-validator-inform-policy.adoc[leveloffset=+2] +include::modules/ztp-querying-the-policy-compliance-status-for-each-cluster.adoc[leveloffset=+2] + +include::modules/ztp-node-tuning-operator.adoc[leveloffset=+2] + +//Troubleshooting + +include::modules/ztp-troubleshooting-gitops-ztp.adoc[leveloffset=+1] + +include::modules/ztp-validating-the-generation-of-installation-crs.adoc[leveloffset=+2] + +include::modules/ztp-validating-the-generation-of-configuration-policy-crs.adoc[leveloffset=+2] + +include::modules/ztp-restarting-policies-reconciliation.adoc[leveloffset=+2] + +[role="_additional-resources"] +.Additional resources + +* For information about using {cgu-operator} to construct your own `ClusterGroupUpgrade` CR, see xref:../scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc#talo-about-cgu-crs_cnf-topology-aware-lifecycle-manager[About the ClusterGroupUpgrade CR]. + +include::modules/ztp-site-cleanup.adoc[leveloffset=+1] + +[role="_additional-resources"] +.Additional resources + +* For information about removing a cluster, see link:https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html/clusters/managing-your-clusters#remove-managed-cluster[Removing a cluster from management]. + +include::modules/ztp-removing-obsolete-content.adoc[leveloffset=+2] + +include::modules/ztp-tearing-down-the-pipeline.adoc[leveloffset=+2] + +// Move this and "site cleanup" to after troubleshooting - create new modules from the +// https://github.com/openshift-kni/cnf-features-deploy/blob/master/ztp/gitops-subscriptions/argocd/Upgrade.md[Upgrading GitOps ZTP]. +// repo +include::modules/ztp-upgrading-gitops-ztp.adoc[leveloffset=+1] + +include::modules/ztp-preparing-for-the-gitops-ztp-upgrade.adoc[leveloffset=+2] + +include::modules/ztp-labeling-the-existing-clusters.adoc[leveloffset=+2] + +include::modules/ztp-stopping-the-existing-gitops-ztp-applications.adoc[leveloffset=+2] + +include::modules/ztp-topology-aware-lifecycle-manager.adoc[leveloffset=+2] + +[role="_additional-resources"] +.Additional resources +* For information about the {cgu-operator-first}, see xref:../scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc#cnf-about-topology-aware-lifecycle-manager-config_cnf-topology-aware-lifecycle-manager[About the {cgu-operator-full} configuration]. + +include::modules/ztp-required-changes-to-the-git-repository.adoc[leveloffset=+2] + +include::modules/ztp-installing-the-new-gitops-ztp-applications.adoc[leveloffset=+2] + +include::modules/ztp-roll-out-the-configuration-changes.adoc[leveloffset=+2] + +[role="_additional-resources"] +.Additional resources +* For information about creating `ClusterGroupUpgrade` CRs, see xref:../scalability_and_performance/ztp-deploying-disconnected.adoc#talo-precache-autocreated-cgu-for-ztp_ztp-deploying-disconnected[About the auto-created ClusterGroupUpgrade CR for ZTP]. 
+ +// Manual installation moved here + +include::modules/ztp-manually-install-a-single-managed-cluster.adoc[leveloffset=+1] + +include::modules/ztp-du-host-bios-requirements.adoc[leveloffset=+2] include::modules/ztp-configuring-a-static-ip.adoc[leveloffset=+2] @@ -71,63 +193,48 @@ include::modules/ztp-configuring-the-cluster-for-a-disconnected-environment.adoc include::modules/ztp-configuring-ipv6.adoc[leveloffset=+2] +include::modules/ztp-generating-ran-policies.adoc[leveloffset=+1] + include::modules/ztp-troubleshooting-the-managed-cluster.adoc[leveloffset=+2] +// TALO -// RAN policies - -include::modules/ztp-applying-the-ran-policies-for-monitoring-cluster-activity.adoc[leveloffset=+1] - -include::modules/ztp-applying-source-custom-resource-policies.adoc[leveloffset=+2] - -include::modules/ztp-the-policygentemplate.adoc[leveloffset=+2] - -include::modules/ztp-things-to-consider-when-creating-custom-resource-policies.adoc[leveloffset=+2] - -include::modules/ztp-generating-ran-policies.adoc[leveloffset=+2] - - -// Cluster provisioning - -include::modules/ztp-cluster-provisioning.adoc[leveloffset=+1] - -include::modules/ztp-machine-config-operator.adoc[leveloffset=+2] - -include::modules/ztp-node-tuning-operator.adoc[leveloffset=+2] - -include::modules/ztp-sriov-operator.adoc[leveloffset=+2] - -include::modules/ztp-precision-time-protocol-operator.adoc[leveloffset=+2] +include::modules/cnf-topology-aware-lifecycle-manager.adoc[leveloffset=+1] [role="_additional-resources"] .Additional resources -* For more information about using PTP hardware in your cluster nodes, see xref:../networking/using-ptp.adoc#using-ptp[Using PTP hardware]. +* For more information about the {cgu-operator-full}, see xref:../scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc#cnf-about-topology-aware-lifecycle-manager-config_cnf-topology-aware-lifecycle-manager[About the {cgu-operator-full}]. -include::modules/ztp-creating-ztp-custom-resources-for-multiple-managed-clusters.adoc[leveloffset=+1] +include::modules/cnf-topology-aware-lifecycle-manager-autocreate-cgu-cr-ztp.adoc[leveloffset=+2] -include::modules/ztp-prerequisites-for-deploying-the-ztp-pipeline.adoc[leveloffset=+2] +include::modules/cnf-topology-aware-lifecycle-manager-preparing-for-updates.adoc[leveloffset=+1] -include::modules/ztp-installing-the-gitops-ztp-pipeline.adoc[leveloffset=+2] +[role="_additional-resources"] +.Additional resources -include::modules/ztp-preparing-the-ztp-git-repository.adoc[leveloffset=+3] +* For more information about how to update ZTP, see xref:../scalability_and_performance/ztp-deploying-disconnected.adoc#ztp-upgrading-gitops-ztp_ztp-deploying-disconnected[Upgrading GitOps ZTP]. -include::modules/ztp-preparing-the-hub-cluster-for-ztp.adoc[leveloffset=+3] +* For more information about how to mirror an {product-title} image repository, see xref:../installing/disconnected_install/installing-mirroring-installation-images.adoc#installation-mirror-repository_installing-mirroring-installation-images[Mirroring the {product-title} image repository]. -include::modules/ztp-creating-the-site-secrets.adoc[leveloffset=+2] +* For more information about how to mirror Operator catalogs for disconnected clusters, see xref:../installing/disconnected_install/installing-mirroring-installation-images.adoc#olm-mirror-catalog_installing-mirroring-installation-images[Mirroring Operator catalogs for use with disconnected clusters]. 
-include::modules/ztp-creating-the-siteconfig-custom-resources.adoc[leveloffset=+2] +* For more information about how to prepare the disconnected environment and mirror the desired image repository, see xref:../scalability_and_performance/ztp-deploying-disconnected.adoc#ztp-acm-preparing-to-install-disconnected-acm_ztp-deploying-disconnected[Preparing the disconnected environment]. -include::modules/ztp-creating-the-policygentemplates.adoc[leveloffset=+2] +* For more information about update channels and releases, see xref:../updating/understanding-upgrade-channels-release.adoc[Understanding upgrade channels and releases]. -include::modules/ztp-checking-the-installation-status.adoc[leveloffset=+2] +include::modules/cnf-topology-aware-lifecycle-manager-platform-update.adoc[leveloffset=+2] -include::modules/ztp-site-cleanup.adoc[leveloffset=+2] +[role="_additional-resources"] +.Additional resources -include::modules/ztp-removing-the-argocd-pipeline.adoc[leveloffset=+3] +* For more information about mirroring the images in a disconnected environment, see xref:../scalability_and_performance/ztp-deploying-disconnected.adoc#ztp-acm-preparing-to-install-disconnected-acm_ztp-deploying-disconnected[Preparing the disconnected environment]. -include::modules/ztp-troubleshooting-gitops-ztp.adoc[leveloffset=+1] +include::modules/cnf-topology-aware-lifecycle-manager-operator-update.adoc[leveloffset=+2] -include::modules/ztp-validating-the-generation-of-installation-crs.adoc[leveloffset=+2] +[role="_additional-resources"] +.Additional resources -include::modules/ztp-validating-the-generation-of-policy-crs.adoc[leveloffset=+2] +* For more information about updating GitOps ZTP, see xref:../scalability_and_performance/ztp-deploying-disconnected.adoc#ztp-upgrading-gitops-ztp_ztp-deploying-disconnected[Upgrading GitOps ZTP]. + +include::modules/cnf-topology-aware-lifecycle-manager-operator-and-platform-update.adoc[leveloffset=+2] diff --git a/snippets/developer-preview.adoc b/snippets/developer-preview.adoc new file mode 100644 index 0000000000..dc2d7c30a9 --- /dev/null +++ b/snippets/developer-preview.adoc @@ -0,0 +1,10 @@ +// When including this file, ensure that {FeatureName} is set immediately before the include. Otherwise it will result in an incorrect replacement. +// Note: Developer Preview features are not typically included in core OpenShift documentation. + +[IMPORTANT] +==== +[subs="attributes+"] +{FeatureName} is a Developer Preview feature only. Developer Preview features are not supported with Red Hat production service level agreements (SLAs) and are not functionally complete or production-ready. Do not use Developer Preview features for production or business-critical workloads. Developer Preview features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. These features might not have any documentation, and testing is limited. Red Hat might provide ways to submit feedback on Developer Preview features without an associated SLA. +==== +// Undefine {FeatureName} attribute, so that any mistakes are easily spotted +:!FeatureName:
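The comment at the top of the new `snippets/developer-preview.adoc` file implies the same usage pattern that this change already applies for `snippets/technology-preview.adoc`: set `:FeatureName:` immediately before the include. A hypothetical example, with an illustrative feature name:

----
:FeatureName: My example feature
include::snippets/developer-preview.adoc[]
----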