
TELCODOCS-364, 370, 340, 261, 285: 4.10 consolidated RAN docs

This commit is contained in:
Alexandra Molnar
2022-05-06 12:26:53 +01:00
parent 36ec506416
commit f513e56cd4
107 changed files with 5356 additions and 1147 deletions

View File

@@ -132,5 +132,10 @@ endif::[]
:ibmzProductName: IBM Z
// Red Hat Quay Container Security Operator
:rhq-cso: Red Hat Quay Container Security Operator
:sno: single-node OpenShift
:sno-caps: Single-node OpenShift
//TALO and Redfish events Operators
:cgu-operator-first: Topology Aware Lifecycle Manager (TALM)
:cgu-operator-full: Topology Aware Lifecycle Manager
:cgu-operator: TALM
:redfish-operator: Bare Metal Event Relay

View File

@@ -2223,6 +2223,8 @@ Topics:
File: managing-alerts
- Name: Reviewing monitoring dashboards
File: reviewing-monitoring-dashboards
- Name: Monitoring bare-metal events
File: using-rfhe
- Name: Accessing third-party monitoring APIs
File: accessing-third-party-monitoring-apis
- Name: Troubleshooting monitoring issues
@@ -2278,6 +2280,8 @@ Topics:
Distros: openshift-origin,openshift-enterprise
- Name: Improving cluster stability in high latency environments using worker latency profiles
File: scaling-worker-latency-profiles
- Name: Topology Aware Lifecycle Manager for cluster updates
File: cnf-talm-for-cluster-upgrades
Distros: openshift-origin,openshift-enterprise
- Name: Creating a performance profile
File: cnf-create-performance-profiles

Binary file not shown.

After

Width:  |  Height:  |  Size: 76 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 74 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 100 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 76 KiB

View File

@@ -0,0 +1,25 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="about-ztp-and-distributed-units-on-openshift-clusters_{context}"]
= About ZTP and distributed units on OpenShift clusters
You can install a distributed unit (DU) on {product-title} clusters at scale with {rh-rhacm-first} using the assisted installer (AI) and the policy generator with core-reduction technology enabled. The DU installation is done using zero touch provisioning (ZTP) in a disconnected environment.
{rh-rhacm} manages clusters in a hub-and-spoke architecture, where a single hub cluster manages many spoke clusters. {rh-rhacm} applies radio access network (RAN) policies from predefined custom resources (CRs). Hub clusters running ACM provision and deploy the spoke clusters using ZTP and AI. DU installation follows the AI installation of {product-title} on each cluster.
The AI service handles provisioning of {product-title} on single-node clusters, three-node clusters, or standard clusters running on bare metal. ACM ships with and deploys the AI when the `MultiClusterHub` custom resource is installed.
With ZTP and AI, you can provision {product-title} clusters to run your DUs at scale. A high-level overview of ZTP for distributed units in a disconnected environment is as follows:
* A hub cluster running {rh-rhacm-first} manages a disconnected internal registry that mirrors the {product-title} release images. The internal registry is used to provision the spoke clusters.
* You manage the bare metal host machines for your DUs in an inventory file that uses YAML for formatting. You store the inventory file in a Git repository.
* You install the DU bare metal host machines on site, and make the hosts ready for provisioning. To be ready for provisioning, the following is required for each bare metal host:
** Network connectivity, including DNS for your network. Hosts must be reachable from the hub cluster and from the managed spoke clusters. Ensure that there is layer 3 connectivity between the hub cluster and the host where you want to install the spoke cluster.
** Baseboard Management Controller (BMC) details for each host. ZTP uses the BMC URL and credentials to access the BMC of each host. ZTP manages the spoke cluster definition CRs, which define the relevant elements for the managed clusters, with the exception of the `BMCSecret` CR, which you create manually, for example as shown below.
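The following is a minimal sketch of a manually created BMC credentials `Secret`. All names and values are placeholders; the secret name must match the name that your spoke cluster definition CRs reference.
[source,yaml]
----
apiVersion: v1
kind: Secret
metadata:
  name: example-node1-bmc-secret # Placeholder name referenced by the spoke cluster definition CRs
  namespace: example-sno
type: Opaque
data:
  username: YWRtaW4= # base64-encoded BMC user name (placeholder)
  password: cGFzc3dvcmQ= # base64-encoded BMC password (placeholder)
----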

View File

@@ -0,0 +1,52 @@
// Module included in the following assemblies:
//
// * operators/operator-reference.adoc
[id="baremetal-event-relay_{context}"]
= {redfish-operator}
[discrete]
== Purpose
The OpenShift {redfish-operator} manages the lifecycle of the Bare Metal Event Relay. The Bare Metal Event Relay enables you to configure the types of cluster events that are monitored using Redfish hardware events.
[discrete]
== Configuration objects
You can edit the configuration after installation, for example, to change the webhook port. You can edit the configuration objects with the following command:
[source,terminal]
----
$ oc -n [namespace] edit cm hw-event-proxy-operator-manager-config
----
[source,yaml]
----
apiVersion: controller-runtime.sigs.k8s.io/v1alpha1
kind: ControllerManagerConfig
health:
healthProbeBindAddress: :8081
metrics:
bindAddress: 127.0.0.1:8080
webhook:
port: 9443
leaderElection:
leaderElect: true
resourceName: 6e7a703c.redhat-cne.org
----
[discrete]
== Project
link:https://github.com/redhat-cne/hw-event-proxy-operator[hw-event-proxy-operator]
[discrete]
== CRD
The proxy enables applications running on bare-metal clusters to respond quickly to Redfish hardware changes and failures such as breaches of temperature thresholds, fan failure, disk loss, power outages, and memory failure, reported using the HardwareEvent CR.
`hardwareevents.event.redhat-cne.org`:
* Scope: Namespaced
* CR: HardwareEvent
* Validation: Yes
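The following is an illustrative sketch of a `HardwareEvent` CR. The `spec` fields shown here are based on the upstream hw-event-proxy project and might differ from the installed CRD; check the CRD on your cluster for the authoritative schema.
[source,yaml]
----
apiVersion: event.redhat-cne.org/v1alpha1
kind: HardwareEvent
metadata:
  name: hardware-event
  namespace: openshift-bare-metal-events # Assumed namespace; the CR is namespaced
spec:
  nodeSelector: # Illustrative selector for the nodes whose BMCs are monitored
    node-role.kubernetes.io/worker: ""
  transportHost: "amqp://amq-router.amq-router.svc.cluster.local" # Illustrative AMQP transport endpoint
  logLevel: "debug"
----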
[discrete]
== Additional resources
For more information, see xref:../monitoring/using-rfhe.adoc[Monitoring Redfish hardware events].

View File

@@ -0,0 +1,381 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// * scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="cnf-about-topology-aware-lifecycle-manager-blocking-crs_{context}"]
= Blocking ClusterGroupUpgrade CRs
You can create multiple `ClusterGroupUpgrade` CRs and control their order of application.
For example, if you create `ClusterGroupUpgrade` CR C that blocks the start of `ClusterGroupUpgrade` CR A, then `ClusterGroupUpgrade` CR A cannot start until the status of `ClusterGroupUpgrade` CR C becomes `UpgradeComplete`.
One `ClusterGroupUpgrade` CR can have multiple blocking CRs. In this case, all the blocking CRs must complete before the upgrade for the current CR can start.
.Prerequisites
* Install the {cgu-operator-first}.
* Provision one or more managed clusters.
* Log in as a user with `cluster-admin` privileges.
* Create {rh-rhacm} policies in the hub cluster.
.Procedure
. Save the content of the `ClusterGroupUpgrade` CRs in the `cgu-a.yaml`, `cgu-b.yaml`, and `cgu-c.yaml` files.
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-a
namespace: default
spec:
blockingCRs: <1>
- name: cgu-c
namespace: default
clusters:
- spoke1
- spoke2
- spoke3
enable: false
managedPolicies:
- policy1-common-cluster-version-policy
- policy2-common-pao-sub-policy
- policy3-common-ptp-sub-policy
remediationStrategy:
canaries:
- spoke1
maxConcurrency: 2
timeout: 240
status:
conditions:
- message: The ClusterGroupUpgrade CR is not enabled
reason: UpgradeNotStarted
status: "False"
type: Ready
copiedPolicies:
- cgu-a-policy1-common-cluster-version-policy
- cgu-a-policy2-common-pao-sub-policy
- cgu-a-policy3-common-ptp-sub-policy
managedPoliciesForUpgrade:
- name: policy1-common-cluster-version-policy
namespace: default
- name: policy2-common-pao-sub-policy
namespace: default
- name: policy3-common-ptp-sub-policy
namespace: default
placementBindings:
- cgu-a-policy1-common-cluster-version-policy
- cgu-a-policy2-common-pao-sub-policy
- cgu-a-policy3-common-ptp-sub-policy
placementRules:
- cgu-a-policy1-common-cluster-version-policy
- cgu-a-policy2-common-pao-sub-policy
- cgu-a-policy3-common-ptp-sub-policy
remediationPlan:
- - spoke1
- - spoke2
----
<1> Defines the blocking CRs. The `cgu-a` update cannot start until `cgu-c` is complete.
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-b
namespace: default
spec:
blockingCRs: <1>
- name: cgu-a
namespace: default
clusters:
- spoke4
- spoke5
enable: false
managedPolicies:
- policy1-common-cluster-version-policy
- policy2-common-pao-sub-policy
- policy3-common-ptp-sub-policy
- policy4-common-sriov-sub-policy
remediationStrategy:
maxConcurrency: 1
timeout: 240
status:
conditions:
- message: The ClusterGroupUpgrade CR is not enabled
reason: UpgradeNotStarted
status: "False"
type: Ready
copiedPolicies:
- cgu-b-policy1-common-cluster-version-policy
- cgu-b-policy2-common-pao-sub-policy
- cgu-b-policy3-common-ptp-sub-policy
- cgu-b-policy4-common-sriov-sub-policy
managedPoliciesForUpgrade:
- name: policy1-common-cluster-version-policy
namespace: default
- name: policy2-common-pao-sub-policy
namespace: default
- name: policy3-common-ptp-sub-policy
namespace: default
- name: policy4-common-sriov-sub-policy
namespace: default
placementBindings:
- cgu-b-policy1-common-cluster-version-policy
- cgu-b-policy2-common-pao-sub-policy
- cgu-b-policy3-common-ptp-sub-policy
- cgu-b-policy4-common-sriov-sub-policy
placementRules:
- cgu-b-policy1-common-cluster-version-policy
- cgu-b-policy2-common-pao-sub-policy
- cgu-b-policy3-common-ptp-sub-policy
- cgu-b-policy4-common-sriov-sub-policy
remediationPlan:
- - spoke4
- - spoke5
status: {}
----
<1> The `cgu-b` update cannot start until `cgu-a` is complete.
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-c
namespace: default
spec: <1>
clusters:
- spoke6
enable: false
managedPolicies:
- policy1-common-cluster-version-policy
- policy2-common-pao-sub-policy
- policy3-common-ptp-sub-policy
- policy4-common-sriov-sub-policy
remediationStrategy:
maxConcurrency: 1
timeout: 240
status:
conditions:
- message: The ClusterGroupUpgrade CR is not enabled
reason: UpgradeNotStarted
status: "False"
type: Ready
copiedPolicies:
- cgu-c-policy1-common-cluster-version-policy
- cgu-c-policy4-common-sriov-sub-policy
managedPoliciesCompliantBeforeUpgrade:
- policy2-common-pao-sub-policy
- policy3-common-ptp-sub-policy
managedPoliciesForUpgrade:
- name: policy1-common-cluster-version-policy
namespace: default
- name: policy4-common-sriov-sub-policy
namespace: default
placementBindings:
- cgu-c-policy1-common-cluster-version-policy
- cgu-c-policy4-common-sriov-sub-policy
placementRules:
- cgu-c-policy1-common-cluster-version-policy
- cgu-c-policy4-common-sriov-sub-policy
remediationPlan:
- - spoke6
status: {}
----
<1> The `cgu-c` update does not have any blocking CRs. {cgu-operator} starts the `cgu-c` update when the `enable` field is set to `true`.
. Create the `ClusterGroupUpgrade` CRs by running the following command for each relevant CR:
+
[source,terminal]
----
$ oc apply -f <name>.yaml
----
. Start the update process by running the following command for each relevant CR:
+
[source,terminal]
----
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/<name> \
--type merge -p '{"spec":{"enable":true}}'
----
+
The following examples show `ClusterGroupUpgrade` CRs where the `enable` field is set to `true`:
+
.Example for `cgu-a` with blocking CRs
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-a
namespace: default
spec:
blockingCRs:
- name: cgu-c
namespace: default
clusters:
- spoke1
- spoke2
- spoke3
enable: true
managedPolicies:
- policy1-common-cluster-version-policy
- policy2-common-pao-sub-policy
- policy3-common-ptp-sub-policy
remediationStrategy:
canaries:
- spoke1
maxConcurrency: 2
timeout: 240
status:
conditions:
- message: 'The ClusterGroupUpgrade CR is blocked by other CRs that have not yet
completed: [cgu-c]' <1>
reason: UpgradeCannotStart
status: "False"
type: Ready
copiedPolicies:
- cgu-a-policy1-common-cluster-version-policy
- cgu-a-policy2-common-pao-sub-policy
- cgu-a-policy3-common-ptp-sub-policy
managedPoliciesForUpgrade:
- name: policy1-common-cluster-version-policy
namespace: default
- name: policy2-common-pao-sub-policy
namespace: default
- name: policy3-common-ptp-sub-policy
namespace: default
placementBindings:
- cgu-a-policy1-common-cluster-version-policy
- cgu-a-policy2-common-pao-sub-policy
- cgu-a-policy3-common-ptp-sub-policy
placementRules:
- cgu-a-policy1-common-cluster-version-policy
- cgu-a-policy2-common-pao-sub-policy
- cgu-a-policy3-common-ptp-sub-policy
remediationPlan:
- - spoke1
- - spoke2
status: {}
----
<1> Shows the list of blocking CRs.
+
.Example for `cgu-b` with blocking CRs
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-b
namespace: default
spec:
blockingCRs:
- name: cgu-a
namespace: default
clusters:
- spoke4
- spoke5
enable: true
managedPolicies:
- policy1-common-cluster-version-policy
- policy2-common-pao-sub-policy
- policy3-common-ptp-sub-policy
- policy4-common-sriov-sub-policy
remediationStrategy:
maxConcurrency: 1
timeout: 240
status:
conditions:
- message: 'The ClusterGroupUpgrade CR is blocked by other CRs that have not yet
completed: [cgu-a]' <1>
reason: UpgradeCannotStart
status: "False"
type: Ready
copiedPolicies:
- cgu-b-policy1-common-cluster-version-policy
- cgu-b-policy2-common-pao-sub-policy
- cgu-b-policy3-common-ptp-sub-policy
- cgu-b-policy4-common-sriov-sub-policy
managedPoliciesForUpgrade:
- name: policy1-common-cluster-version-policy
namespace: default
- name: policy2-common-pao-sub-policy
namespace: default
- name: policy3-common-ptp-sub-policy
namespace: default
- name: policy4-common-sriov-sub-policy
namespace: default
placementBindings:
- cgu-b-policy1-common-cluster-version-policy
- cgu-b-policy2-common-pao-sub-policy
- cgu-b-policy3-common-ptp-sub-policy
- cgu-b-policy4-common-sriov-sub-policy
placementRules:
- cgu-b-policy1-common-cluster-version-policy
- cgu-b-policy2-common-pao-sub-policy
- cgu-b-policy3-common-ptp-sub-policy
- cgu-b-policy4-common-sriov-sub-policy
remediationPlan:
- - spoke4
- - spoke5
status: {}
----
<1> Shows the list of blocking CRs.
+
.Example for `cgu-c` with blocking CRs
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-c
namespace: default
spec:
clusters:
- spoke6
enable: true
managedPolicies:
- policy1-common-cluster-version-policy
- policy2-common-pao-sub-policy
- policy3-common-ptp-sub-policy
- policy4-common-sriov-sub-policy
remediationStrategy:
maxConcurrency: 1
timeout: 240
status:
conditions:
- message: The ClusterGroupUpgrade CR has upgrade policies that are still non compliant <1>
reason: UpgradeNotCompleted
status: "False"
type: Ready
copiedPolicies:
- cgu-c-policy1-common-cluster-version-policy
- cgu-c-policy4-common-sriov-sub-policy
managedPoliciesCompliantBeforeUpgrade:
- policy2-common-pao-sub-policy
- policy3-common-ptp-sub-policy
managedPoliciesForUpgrade:
- name: policy1-common-cluster-version-policy
namespace: default
- name: policy4-common-sriov-sub-policy
namespace: default
placementBindings:
- cgu-c-policy1-common-cluster-version-policy
- cgu-c-policy4-common-sriov-sub-policy
placementRules:
- cgu-c-policy1-common-cluster-version-policy
- cgu-c-policy4-common-sriov-sub-policy
remediationPlan:
- - spoke6
status:
currentBatch: 1
remediationPlanForBatch:
spoke6: 0
----
<1> The `cgu-c` update does not have any blocking CRs.

View File

@@ -0,0 +1,18 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc
:_content-type: CONCEPT
[id="cnf-about-topology-aware-lifecycle-manager-config_{context}"]
= About the {cgu-operator-full} configuration
The {cgu-operator-first} manages the deployment of {rh-rhacm-first} policies for one or more {product-title} clusters. Using {cgu-operator} in a large network of clusters allows the phased rollout of policies to the clusters in limited batches. This helps to minimize possible service disruptions when updating. With {cgu-operator}, you can control the following actions:
* The timing of the update
* The number of {rh-rhacm}-managed clusters
* The subset of managed clusters to apply the policies to
* The update order of the clusters
* The set of policies remediated to the cluster
* The order of policies remediated to the cluster
{cgu-operator} supports the orchestration of the {product-title} y-stream and z-stream updates, and day-two operations on y-streams and z-streams.
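The following sketch shows how these controls map to fields in a `ClusterGroupUpgrade` CR. The cluster and policy names are placeholders; the individual fields are described in more detail in the following sections and examples.
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-example
  namespace: default
spec:
  clusters: # Subset of managed clusters to apply the policies to
  - spoke1
  - spoke2
  managedPolicies: # Set and order of policies remediated to the clusters
  - policy1-common-cluster-version-policy
  - policy2-common-pao-sub-policy
  remediationStrategy:
    canaries: # Clusters that are updated first
    - spoke1
    maxConcurrency: 2 # Number of clusters updated at the same time
    timeout: 240 # Overall update timeout in minutes
  enable: false # Set to true when you want the update to start
----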

View File

@@ -0,0 +1,21 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc
:_content-type: CONCEPT
[id="cnf-about-topology-aware-lifecycle-manager-about-policies_{context}"]
= About managed policies used with {cgu-operator-full}
The {cgu-operator-first} uses {rh-rhacm} policies for cluster updates.
{cgu-operator} can be used to manage the rollout of any policy CR where the `remediationAction` field is set to `inform`.
Supported use cases include the following:
* Manual user creation of policy CRs
* Automatically generated policies from the `PolicyGenTemplate` custom resource definition (CRD)
For policies that update an Operator subscription with manual approval, {cgu-operator} provides additional functionality that approves the installation of the updated Operator.
For more information about managed policies, see link:https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html-single/governance/index#policy-overview[Policy Overview] in the {rh-rhacm} documentation.
For more information about the `PolicyGenTemplate` CRD, see the "About the PolicyGenTemplate" section in "Deploying distributed units at scale in a disconnected environment".
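The following is a minimal sketch of the structure of a manually created {rh-rhacm} policy that {cgu-operator} can roll out. The policy and object names are placeholders, and the embedded `ConfigurationPolicy` content is illustrative only; the important point is that the `remediationAction` field is set to `inform`.
[source,yaml]
----
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: policy1-common-cluster-version-policy
  namespace: default
spec:
  remediationAction: inform # TALM manages the rollout of policies in inform mode
  disabled: false
  policy-templates:
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1
      kind: ConfigurationPolicy
      metadata:
        name: common-cluster-version-policy-config
      spec:
        remediationAction: inform
        severity: low
        object-templates:
        - complianceType: musthave
          objectDefinition: # Placeholder for the configuration that the policy checks
            apiVersion: config.openshift.io/v1
            kind: ClusterVersion
            metadata:
              name: version
----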

View File

@@ -1,112 +0,0 @@
// CNF-950 4.7 Provisioning and deploying a Distributed Unit (DU) manually
// Module included in the following assemblies:
//
// *scalability_and_performance/cnf-provisioning-and-deploying-a-distributed-unit.adoc
[id="cnf-provisioning-deploying-a-distributed-unit-(du)-manually_{context}"]
= Provisioning and deploying a distributed unit (DU) manually
Radio access network (RAN) is composed of central units (CU), distributed units (DU), and radio units (RU).
RAN from the telecommunications standard perspective is shown below:
image::135_OpenShift_Distributed_Unit_0121.svg[High level RAN overview]
From the three components composing RAN, the CU and DU can be virtualized and implemented as cloud-native functions.
The CU and DU split architecture is driven by real-time computing and networking requirements. A DU can be seen as a real-time part of a
telecommunication baseband unit.
One distributed unit may aggregate several cells. A CU can be seen as a non-realtime part of a baseband unit, aggregating
traffic from one or more distributed units.
A cell in the context of a DU can be seen as a real-time application performing intensive digital signal processing, data transfer,
and algorithmic tasks.
Cells often use hardware acceleration (FPGA, GPU, eASIC) for DSP processing offload, but there are also software-only implementations
(FlexRAN), based on AVX-512 instructions.
Running a cell application on COTS hardware requires the following features to be enabled:
* Real-time kernel
* CPU isolation
* NUMA awareness
* Huge pages memory management
* Precision timing synchronization using PTP
* AVX-512 instruction set (for FlexRAN and/or FPGA implementation)
* Additional features depending on the RAN Operator requirements
Accessing hardware acceleration devices and high throughput network interface controllers by virtualized software applications
requires use of SR-IOV and Passthrough PCI device virtualization.
In addition to the compute and acceleration requirements, DUs operate on multiple internal and external networks.
[id="cnf-manifest-structure_{context}"]
== The manifest structure
The profile is built from one cluster specific folder and one or more site-specific folders.
This is done to address a deployment that includes remote worker nodes, with several sites belonging to the same cluster.
The [`cluster-config`](ran-profile/cluster-config) directory contains performance and PTP customizations based upon
Operator deployments in [`deploy`](../feature-configs/deploy) folder.
The [`site.1.fqdn`](site.1.fqdn) folder contains site-specific network customizations.
[id="cnf-du-prerequisites_{context}"]
== Prerequisites
Before installing the Operators and deploying the DU, perform the following steps.
. Create a machine config pool for the RAN worker nodes. For example:
+
[source,yaml]
----
cat <<EOF | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
name: worker-cnf
labels:
machineconfiguration.openshift.io/role: worker-cnf
spec:
machineConfigSelector:
matchExpressions:
- {
key: machineconfiguration.openshift.io/role,
operator: In,
values: [worker-cnf, worker],
}
paused: false
nodeSelector:
matchLabels:
node-role.kubernetes.io/worker-cnf: ""
EOF
----
. Include the worker node in the above machine config pool by labeling it with the `node-role.kubernetes.io/worker-cnf` label:
+
[source,terminal]
----
$ oc label --overwrite node/<your node name> node-role.kubernetes.io/worker-cnf=""
----
. Label the node as PTP slave (DU only):
+
[source,terminal]
----
$ oc label --overwrite node/<your node name> ptp/slave=""
----
[id="cnf-du-configuration-notes_{context}"]
== SR-IOV configuration notes
The `SriovNetworkNodePolicy` object must be configured differently for different NIC models and placements.
|====================
|*Manufacturer* |*deviceType* |*isRdma*
|Intel |vfio-pci or netdevice |false
|Mellanox |netdevice |true
|====================
In addition, when configuring the `nicSelector`, the `pfNames` value must match the intended interface name on the specific host.
If there is a mixed cluster where some of the nodes are deployed with Intel NICs and some with Mellanox, several SR-IOV configurations can be
created with the same `resourceName`. The device plug-in will discover only the available ones and will put the capacity on the node accordingly.

View File

@@ -0,0 +1,161 @@
// Module included in the following assemblies:
//
// * monitoring/using-rfhe.adoc
:_content-type: REFERENCE
[id="cnf-rfhe-notifications-api-refererence_{context}"]
= Subscribing applications to bare-metal events REST API reference
Use the bare-metal events REST API to subscribe an application to the bare-metal events that are generated on the parent node.
Subscribe applications to Redfish events by using the resource address `/cluster/node/<node_name>/redfish/event`, where `<node_name>` is the cluster node running the application.
Deploy your `cloud-event-consumer` application container and `cloud-event-proxy` sidecar container in a separate application pod. The `cloud-event-consumer` application subscribes to the `cloud-event-proxy` container in the application pod.
Use the following API endpoints to subscribe the `cloud-event-consumer` application to Redfish events posted by the `cloud-event-proxy` container at [x-]`http://localhost:8089/api/cloudNotifications/v1/` in the application pod:
* `/api/cloudNotifications/v1/subscriptions`
- `POST`: Creates a new subscription
- `GET`: Retrieves a list of subscriptions
* `/api/cloudNotifications/v1/subscriptions/<subscription_id>`
- `GET`: Returns details for the specified subscription ID
* `/api/cloudNotifications/v1/subscriptions/status/<subscription_id>`
- `PUT`: Creates a new status ping request for the specified subscription ID
* `/api/cloudNotifications/v1/health`
- `GET`: Returns the health status of `cloudNotifications` API
[NOTE]
====
`9089` is the default port for the `cloud-event-consumer` container deployed in the application pod. You can configure a different port for your application as required.
====
[discrete]
== api/cloudNotifications/v1/subscriptions
[discrete]
=== HTTP method
`GET api/cloudNotifications/v1/subscriptions`
[discrete]
==== Description
Returns a list of subscriptions. If subscriptions exist, a `200 OK` status code is returned along with the list of subscriptions.
.Example API response
[source,json]
----
[
{
"id": "ca11ab76-86f9-428c-8d3a-666c24e34d32",
"endpointUri": "http://localhost:9089/api/cloudNotifications/v1/dummy",
"uriLocation": "http://localhost:8089/api/cloudNotifications/v1/subscriptions/ca11ab76-86f9-428c-8d3a-666c24e34d32",
"resource": "/cluster/node/openshift-worker-0.openshift.example.com/redfish/event"
}
]
----
[discrete]
=== HTTP method
`POST api/cloudNotifications/v1/subscriptions`
[discrete]
==== Description
Creates a new subscription. If a subscription is successfully created, or if it already exists, a `201 Created` status code is returned.
.Query parameters
|===
| Parameter | Type
| subscription
| data
|===
.Example payload
[source,json]
----
{
"uriLocation": "http://localhost:8089/api/cloudNotifications/v1/subscriptions",
"resource": "/cluster/node/openshift-worker-0.openshift.example.com/redfish/event"
}
----
[discrete]
== api/cloudNotifications/v1/subscriptions/<subscription_id>
[discrete]
=== HTTP method
`GET api/cloudNotifications/v1/subscriptions/<subscription_id>`
[discrete]
==== Description
Returns details for the subscription with ID `<subscription_id>`
.Query parameters
|===
| Parameter | Type
| `<subscription_id>`
| string
|===
.Example API response
[source,json]
----
{
"id":"ca11ab76-86f9-428c-8d3a-666c24e34d32",
"endpointUri":"http://localhost:9089/api/cloudNotifications/v1/dummy",
"uriLocation":"http://localhost:8089/api/cloudNotifications/v1/subscriptions/ca11ab76-86f9-428c-8d3a-666c24e34d32",
"resource":"/cluster/node/openshift-worker-0.openshift.example.com/redfish/event"
}
----
[discrete]
== api/cloudNotifications/v1/subscriptions/status/<subscription_id>
[discrete]
=== HTTP method
`PUT api/cloudNotifications/v1/subscriptions/status/<subscription_id>`
[discrete]
==== Description
Creates a new status ping request for subscription with ID `<subscription_id>`. If a subscription is present, the status request is successful and a `202 Accepted` status code is returned.
.Query parameters
|===
| Parameter | Type
| `<subscription_id>`
| string
|===
.Example API response
[source,json]
----
{"status":"ping sent"}
----
[discrete]
== api/cloudNotifications/v1/health/
[discrete]
=== HTTP method
`GET api/cloudNotifications/v1/health/`
[discrete]
==== Description
Returns the health status for the `cloudNotifications` REST API.
.Example API response
[source,terminal]
----
OK
----

View File

@@ -0,0 +1,250 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc
:_content-type: CONCEPT
[id="talo-about-cgu-crs_{context}"]
= About the ClusterGroupUpgrade CR
The {cgu-operator-first} builds the remediation plan from the `ClusterGroupUpgrade` CR for a group of clusters. You can define the following specifications in a `ClusterGroupUpgrade` CR:
* Clusters in the group
* Blocking `ClusterGroupUpgrade` CRs
* Applicable list of managed policies
* Number of concurrent updates
* Applicable canary updates
* Actions to perform before and after the update
* Update timing
As {cgu-operator} works through remediation of the policies to the specified clusters, the `ClusterGroupUpgrade` CR can have the following states:
* `UpgradeNotStarted`
* `UpgradeCannotStart`
* `UpgradeNotCompleted`
* `UpgradeTimedOut`
* `UpgradeCompleted`
* `PrecachingRequired`
[NOTE]
====
After {cgu-operator} completes a cluster update, the cluster does not update again under the control of the same `ClusterGroupUpgrade` CR. You must create a new `ClusterGroupUpgrade` CR in the following cases:
* When you need to update the cluster again
* When the cluster changes to non-compliant with the `inform` policy after being updated
====
[id="upgrade_not_started"]
== The UpgradeNotStarted state
The initial state of the `ClusterGroupUpgrade` CR is `UpgradeNotStarted`.
{cgu-operator} builds a remediation plan based on the following fields:
* The `clusterSelector` field specifies the labels of the clusters that you want to update.
* The `clusters` field specifies a list of clusters to update.
* The `canaries` field specifies the clusters for canary updates.
* The `maxConcurrency` field specifies the number of clusters to update in a batch.
You can use the `clusters` and the `clusterSelector` fields together to create a combined list of clusters.
The remediation plan starts with the clusters listed in the `canaries` field. Each canary cluster forms a single-cluster batch.
[NOTE]
====
Any failure during the update of a canary cluster stops the update process.
====
The `ClusterGroupUpgrade` CR transitions to the `UpgradeNotCompleted` state after the remediation plan is successfully created and after the `enable` field is set to `true`. At this point, {cgu-operator} starts to update the non-compliant clusters with the specified managed policies.
[NOTE]
====
You can only make changes to the `spec` fields if the `ClusterGroupUpgrade` CR is either in the `UpgradeNotStarted` or the `UpgradeCannotStart` state.
====
.Sample `ClusterGroupUpgrade` CR in the `UpgradeNotStarted` state
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-upgrade-complete
namespace: default
spec:
clusters: <1>
- spoke1
enable: false
managedPolicies: <2>
- policy1-common-cluster-version-policy
- policy2-common-pao-sub-policy
remediationStrategy: <3>
canaries: <4>
- spoke1
maxConcurrency: 1 <5>
timeout: 240
status: <6>
conditions:
- message: The ClusterGroupUpgrade CR is not enabled
reason: UpgradeNotStarted
status: "False"
type: Ready
copiedPolicies:
- cgu-upgrade-complete-policy1-common-cluster-version-policy
- cgu-upgrade-complete-policy2-common-pao-sub-policy
managedPoliciesForUpgrade:
- name: policy1-common-cluster-version-policy
namespace: default
- name: policy2-common-pao-sub-policy
namespace: default
placementBindings:
- cgu-upgrade-complete-policy1-common-cluster-version-policy
- cgu-upgrade-complete-policy2-common-pao-sub-policy
placementRules:
- cgu-upgrade-complete-policy1-common-cluster-version-policy
- cgu-upgrade-complete-policy2-common-pao-sub-policy
remediationPlan:
- - spoke1
----
<1> Defines the list of clusters to update.
<2> Lists the user-defined set of policies to remediate.
<3> Defines the specifics of the cluster updates.
<4> Defines the clusters for canary updates.
<5> Defines the maximum number of concurrent updates in a batch. The number of remediation batches is the number of canary clusters, plus the number of remaining clusters divided by the `maxConcurrency` value. Clusters that are already compliant with all the managed policies are excluded from the remediation plan.
<6> Displays information about the status of the updates.
[id="upgrade_cannot_start"]
== The UpgradeCannotStart state
In the `UpgradeCannotStart` state, the update cannot start for the following reasons:
* Blocking CRs are missing from the system
* Blocking CRs have not yet finished
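In this state, the `status.conditions` field of the `ClusterGroupUpgrade` CR reports which blocking CRs have not finished, as in the following abbreviated example taken from the blocking CR scenario described earlier:
[source,yaml]
----
status:
  conditions:
  - message: 'The ClusterGroupUpgrade CR is blocked by other CRs that have not yet completed: [cgu-c]'
    reason: UpgradeCannotStart
    status: "False"
    type: Ready
----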
[id="upgrade_not_completed"]
== The UpgradeNotCompleted state
In the `UpgradeNotCompleted` state, {cgu-operator} enforces the policies following the remediation plan defined in the `UpgradeNotStarted` state.
Enforcing the policies for subsequent batches starts immediately after all the clusters of the current batch are compliant with all the managed policies. If the batch times out, {cgu-operator} moves on to the next batch. The timeout value of a batch is the `spec.timeout` field divided by the number of batches in the remediation plan. For example, if `spec.timeout` is 240 minutes and the remediation plan contains four batches, each batch times out after 60 minutes.
[NOTE]
====
The managed policies apply in the order that they are listed in the `managedPolicies` field in the `ClusterGroupUpgrade` CR. One managed policy is applied to the specified clusters at a time. After the specified clusters comply with the current policy, the next managed policy is applied to the next non-compliant cluster.
====
.Sample `ClusterGroupUpgrade` CR in the `UpgradeNotCompleted` state
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-upgrade-complete
namespace: default
spec:
clusters:
- spoke1
enable: true <1>
managedPolicies:
- policy1-common-cluster-version-policy
- policy2-common-pao-sub-policy
remediationStrategy:
maxConcurrency: 1
timeout: 240
status: <2>
conditions:
- message: The ClusterGroupUpgrade CR has upgrade policies that are still non compliant
reason: UpgradeNotCompleted
status: "False"
type: Ready
copiedPolicies:
- cgu-upgrade-complete-policy1-common-cluster-version-policy
- cgu-upgrade-complete-policy2-common-pao-sub-policy
managedPoliciesForUpgrade:
- name: policy1-common-cluster-version-policy
namespace: default
- name: policy2-common-pao-sub-policy
namespace: default
placementBindings:
- cgu-upgrade-complete-policy1-common-cluster-version-policy
- cgu-upgrade-complete-policy2-common-pao-sub-policy
placementRules:
- cgu-upgrade-complete-policy1-common-cluster-version-policy
- cgu-upgrade-complete-policy2-common-pao-sub-policy
remediationPlan:
- - spoke1
status:
currentBatch: 1
remediationPlanForBatch: <3>
spoke1: 0
----
<1> The update starts when the value of the `spec.enable` field is `true`.
<2> The `status` fields change accordingly when the update begins.
<3> Lists the clusters in the batch and the index of the policy that is being currently applied to each cluster. The index of the policies starts with `0` and the index follows the order of the `status.managedPoliciesForUpgrade` list.
[id="upgrade_timed_out"]
== The UpgradeTimedOut state
In the `UpgradeTimedOut` state, {cgu-operator} checks every hour if all the policies for the `ClusterGroupUpgrade` CR are compliant. The checks continue until the `ClusterGroupUpgrade` CR is deleted or the updates are completed.
The periodic checks allow the updates to complete if they get prolonged due to network, CPU, or other issues.
{cgu-operator} transitions to the `UpgradeTimedOut` state in two cases:
* When the current batch contains canary updates and the cluster in the batch does not comply with all the managed policies within the batch timeout.
* When the clusters do not comply with the managed policies within the `timeout` value specified in the `remediationStrategy` field.
If the policies are compliant, {cgu-operator} transitions to the `UpgradeCompleted` state.
[id="upgrade_completed"]
== The UpgradeCompleted state
In the `UpgradeCompleted` state, the cluster updates are complete.
.Sample `ClusterGroupUpgrade` CR in the `UpgradeCompleted` state
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-upgrade-complete
namespace: default
spec:
actions:
afterCompletion:
deleteObjects: true <1>
clusters:
- spoke1
enable: true
managedPolicies:
- policy1-common-cluster-version-policy
- policy2-common-pao-sub-policy
remediationStrategy:
maxConcurrency: 1
timeout: 240
status: <2>
conditions:
- message: The ClusterGroupUpgrade CR has all clusters compliant with all the managed policies
reason: UpgradeCompleted
status: "True"
type: Ready
managedPoliciesForUpgrade:
- name: policy1-common-cluster-version-policy
namespace: default
- name: policy2-common-pao-sub-policy
namespace: default
remediationPlan:
- - spoke1
status:
remediationPlanForBatch:
spoke1: -2 <3>
----
<1> The value of the `spec.actions.afterCompletion.deleteObjects` field is `true` by default. After the update is completed, {cgu-operator} deletes the underlying {rh-rhacm} objects that were created during the update. This option is to prevent the {rh-rhacm} hub from continuously checking for compliance after a successful update.
<2> The `status` fields show that the updates completed successfully.
<3> Displays that all the policies are applied to the cluster.
[id="precaching-required"]
[discrete]
== The PrecachingRequired state
In the `PrecachingRequired` state, the clusters need to have images pre-cached before the update can start. For more information about pre-caching, see the "Using the container image pre-cache feature" section.
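The following abbreviated sketch shows how pre-caching is requested in the `ClusterGroupUpgrade` spec. The cluster and policy names are placeholders and the remaining fields match the other examples in this document.
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-example
  namespace: default
spec:
  preCaching: true # The clusters pre-cache the required images before the update starts
  enable: false # Enable the update after pre-caching completes
  clusters:
  - spoke1
  managedPolicies:
  - policy1-common-cluster-version-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
----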

View File

@@ -0,0 +1,363 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc
:_content-type: PROCEDURE
[id="talo-apply-policies_{context}"]
= Applying update policies to managed clusters
You can update your managed clusters by applying your policies.
.Prerequisites
* Install the {cgu-operator-first}.
* Provision one or more managed clusters.
* Log in as a user with `cluster-admin` privileges.
* Create {rh-rhacm} policies in the hub cluster.
.Procedure
. Save the contents of the `ClusterGroupUpgrade` CR in the `cgu-1.yaml` file.
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-1
namespace: default
spec:
managedPolicies: <1>
- policy1-common-cluster-version-policy
- policy2-common-pao-sub-policy
- policy3-common-ptp-sub-policy
- policy4-common-sriov-sub-policy
enable: false
clusters: <2>
- spoke1
- spoke2
- spoke5
- spoke6
remediationStrategy:
maxConcurrency: 2 <3>
timeout: 240 <4>
----
<1> The name of the policies to apply.
<2> The list of clusters to update.
<3> The `maxConcurrency` field signifies the number of clusters updated at the same time.
<4> The update timeout in minutes.
. Create the `ClusterGroupUpgrade` CR by running the following command:
+
[source,terminal]
----
$ oc create -f cgu-1.yaml
----
.. Check if the `ClusterGroupUpgrade` CR was created in the hub cluster by running the following command:
+
[source,terminal]
----
$ oc get cgu --all-namespaces
----
+
.Example output
+
[source,terminal]
----
NAMESPACE   NAME    AGE
default     cgu-1   8m55s
----
.. Check the status of the update by running the following command:
+
[source,terminal]
----
$ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
----
+
.Example output
+
[source,json]
----
{
"computedMaxConcurrency": 2,
"conditions": [
{
"lastTransitionTime": "2022-02-25T15:34:07Z",
"message": "The ClusterGroupUpgrade CR is not enabled", <1>
"reason": "UpgradeNotStarted",
"status": "False",
"type": "Ready"
}
],
"copiedPolicies": [
"cgu-policy1-common-cluster-version-policy",
"cgu-policy2-common-pao-sub-policy",
"cgu-policy3-common-ptp-sub-policy",
"cgu-policy4-common-sriov-sub-policy"
],
"managedPoliciesContent": {
"policy1-common-cluster-version-policy": "null",
"policy2-common-pao-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"performance-addon-operator\",\"namespace\":\"openshift-performance-addon-operator\"}]",
"policy3-common-ptp-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"ptp-operator-subscription\",\"namespace\":\"openshift-ptp\"}]",
"policy4-common-sriov-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"sriov-network-operator-subscription\",\"namespace\":\"openshift-sriov-network-operator\"}]"
},
"managedPoliciesForUpgrade": [
{
"name": "policy1-common-cluster-version-policy",
"namespace": "default"
},
{
"name": "policy2-common-pao-sub-policy",
"namespace": "default"
},
{
"name": "policy3-common-ptp-sub-policy",
"namespace": "default"
},
{
"name": "policy4-common-sriov-sub-policy",
"namespace": "default"
}
],
"managedPoliciesNs": {
"policy1-common-cluster-version-policy": "default",
"policy2-common-pao-sub-policy": "default",
"policy3-common-ptp-sub-policy": "default",
"policy4-common-sriov-sub-policy": "default"
},
"placementBindings": [
"cgu-policy1-common-cluster-version-policy",
"cgu-policy2-common-pao-sub-policy",
"cgu-policy3-common-ptp-sub-policy",
"cgu-policy4-common-sriov-sub-policy"
],
"placementRules": [
"cgu-policy1-common-cluster-version-policy",
"cgu-policy2-common-pao-sub-policy",
"cgu-policy3-common-ptp-sub-policy",
"cgu-policy4-common-sriov-sub-policy"
],
"precaching": {
"spec": {}
},
"remediationPlan": [
[
"spoke1",
"spoke2"
],
[
"spoke5",
"spoke6"
]
],
"status": {}
}
----
<1> The `spec.enable` field in the `ClusterGroupUpgrade` CR is set to `false`.
.. Check the status of the policies by running the following command:
+
[source,terminal]
----
$ oc get policies -A
----
+
.Example output
[source,terminal]
----
NAMESPACE   NAME                                         REMEDIATION ACTION   COMPLIANCE STATE   AGE
default     cgu-policy1-common-cluster-version-policy    enforce                                 17m <1>
default     cgu-policy2-common-pao-sub-policy            enforce                                 17m
default     cgu-policy3-common-ptp-sub-policy            enforce                                 17m
default     cgu-policy4-common-sriov-sub-policy          enforce                                 17m
default     policy1-common-cluster-version-policy        inform               NonCompliant      15h
default     policy2-common-pao-sub-policy                inform               NonCompliant      15h
default     policy3-common-ptp-sub-policy                inform               NonCompliant      18m
default     policy4-common-sriov-sub-policy              inform               NonCompliant      18m
----
<1> The `spec.remediationAction` field of policies currently applied on the clusters is set to `enforce`. The managed policies in `inform` mode from the `ClusterGroupUpgrade` CR remain in `inform` mode during the update.
. Change the value of the `spec.enable` field to `true` by running the following command:
+
[source,terminal]
----
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-1 \
--patch '{"spec":{"enable":true}}' --type=merge
----
.Verification
. Check the status of the update again by running the following command:
+
[source,terminal]
----
$ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
----
+
.Example output
+
[source,json]
----
{
"computedMaxConcurrency": 2,
"conditions": [ <1>
{
"lastTransitionTime": "2022-02-25T15:34:07Z",
"message": "The ClusterGroupUpgrade CR has upgrade policies that are still non compliant",
"reason": "UpgradeNotCompleted",
"status": "False",
"type": "Ready"
}
],
"copiedPolicies": [
"cgu-policy1-common-cluster-version-policy",
"cgu-policy2-common-pao-sub-policy",
"cgu-policy3-common-ptp-sub-policy",
"cgu-policy4-common-sriov-sub-policy"
],
"managedPoliciesContent": {
"policy1-common-cluster-version-policy": "null",
"policy2-common-pao-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"performance-addon-operator\",\"namespace\":\"openshift-performance-addon-operator\"}]",
"policy3-common-ptp-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"ptp-operator-subscription\",\"namespace\":\"openshift-ptp\"}]",
"policy4-common-sriov-sub-policy": "[{\"kind\":\"Subscription\",\"name\":\"sriov-network-operator-subscription\",\"namespace\":\"openshift-sriov-network-operator\"}]"
},
"managedPoliciesForUpgrade": [
{
"name": "policy1-common-cluster-version-policy",
"namespace": "default"
},
{
"name": "policy2-common-pao-sub-policy",
"namespace": "default"
},
{
"name": "policy3-common-ptp-sub-policy",
"namespace": "default"
},
{
"name": "policy4-common-sriov-sub-policy",
"namespace": "default"
}
],
"managedPoliciesNs": {
"policy1-common-cluster-version-policy": "default",
"policy2-common-pao-sub-policy": "default",
"policy3-common-ptp-sub-policy": "default",
"policy4-common-sriov-sub-policy": "default"
},
"placementBindings": [
"cgu-policy1-common-cluster-version-policy",
"cgu-policy2-common-pao-sub-policy",
"cgu-policy3-common-ptp-sub-policy",
"cgu-policy4-common-sriov-sub-policy"
],
"placementRules": [
"cgu-policy1-common-cluster-version-policy",
"cgu-policy2-common-pao-sub-policy",
"cgu-policy3-common-ptp-sub-policy",
"cgu-policy4-common-sriov-sub-policy"
],
"precaching": {
"spec": {}
},
"remediationPlan": [
[
"spoke1",
"spoke2"
],
[
"spoke5",
"spoke6"
]
],
"status": {
"currentBatch": 1,
"currentBatchStartedAt": "2022-02-25T15:54:16Z",
"remediationPlanForBatch": {
"spoke1": 0,
"spoke2": 1
},
"startedAt": "2022-02-25T15:54:16Z"
}
}
----
<1> Reflects the update progress of the current batch. Run this command again to receive updated information about the progress.
. If the policies include Operator subscriptions, you can check the installation progress directly on the single-node cluster.
.. Export the `KUBECONFIG` file of the single-node cluster you want to check the installation progress for by running the following command:
+
[source,terminal]
----
$ export KUBECONFIG=<cluster_kubeconfig_absolute_path>
----
.. Check all the subscriptions present on the single-node cluster and look for the one in the policy you are trying to install through the `ClusterGroupUpgrade` CR by running the following command:
+
[source,terminal]
----
$ oc get subs -A | grep -i <subscription_name>
----
+
.Example output for `cluster-logging` policy
+
[source,terminal]
----
NAMESPACE           NAME              PACKAGE           SOURCE             CHANNEL
openshift-logging   cluster-logging   cluster-logging   redhat-operators   stable
----
. If one of the managed policies includes a `ClusterVersion` CR, check the status of platform updates in the current batch by running the following command against the spoke cluster:
+
[source,terminal]
----
$ oc get clusterversion
----
+
.Example output
+
[source,terminal]
----
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.5     True        True          43s     Working towards 4.9.7: 71 of 735 done (9% complete)
----
. Check the Operator subscription by running the following command:
+
[source,terminal]
----
$ oc get subs -n <operator-namespace> <operator-subscription> -ojsonpath="{.status}"
----
. Check the install plans present on the single-node cluster that is associated with the desired subscription by running the following command:
+
[source,terminal]
----
$ oc get installplan -n <subscription_namespace>
----
+
.Example output for `cluster-logging` Operator
+
[source,terminal]
----
NAMESPACE           NAME            CSV                       APPROVAL   APPROVED
openshift-logging   install-6khtw   cluster-logging.5.3.3-4   Manual     true <1>
----
<1> The install plans have their `Approval` field set to `Manual` and their `Approved` field changes from `false` to `true` after {cgu-operator} approves the install plan.
. Check if the cluster service version for the Operator of the policy that the `ClusterGroupUpgrade` is installing reached the `Succeeded` phase by running the following command:
+
[source,terminal]
----
$ oc get csv -n <operator_namespace>
----
+
.Example output for OpenShift Logging Operator
+
[source,terminal]
----
NAME                    DISPLAY                     VERSION   REPLACES   PHASE
cluster-logging.5.4.2   Red Hat OpenShift Logging   5.4.2                Succeeded
----

View File

@@ -0,0 +1,63 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// * scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="talo-precache-autocreated-cgu-for-ztp_{context}"]
= About the auto-created ClusterGroupUpgrade CR for ZTP
{cgu-operator} has a controller called `ManagedClusterForCGU` that monitors the `Ready` state of the `ManagedCluster` CRs on the hub cluster and creates the `ClusterGroupUpgrade` CRs for ZTP (zero touch provisioning).
For any managed cluster in the `Ready` state without a `ztp-done` label applied, the `ManagedClusterForCGU` controller automatically creates a `ClusterGroupUpgrade` CR in the `ztp-install` namespace with its associated {rh-rhacm} policies that are created during the ZTP process. {cgu-operator} then remediates the set of configuration policies that are listed in the auto-created `ClusterGroupUpgrade` CR to push the configuration CRs to the managed cluster.
[NOTE]
====
If the managed cluster has no bound policies when the cluster becomes `Ready`, no `ClusterGroupUpgrade` CR is created.
====
.Example of an auto-created `ClusterGroupUpgrade` CR for ZTP
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
generation: 1
name: spoke1
namespace: ztp-install
ownerReferences:
- apiVersion: cluster.open-cluster-management.io/v1
blockOwnerDeletion: true
controller: true
kind: ManagedCluster
name: spoke1
uid: 98fdb9b2-51ee-4ee7-8f57-a84f7f35b9d5
resourceVersion: "46666836"
uid: b8be9cd2-764f-4a62-87d6-6b767852c7da
spec:
actions:
afterCompletion:
addClusterLabels:
ztp-done: "" <1>
deleteClusterLabels:
ztp-running: ""
deleteObjects: true
beforeEnable:
addClusterLabels:
ztp-running: "" <2>
clusters:
- spoke1
enable: true
managedPolicies:
- common-spoke1-config-policy
- common-spoke1-subscriptions-policy
- group-spoke1-config-policy
- spoke1-config-policy
- group-spoke1-validator-du-policy
preCaching: false
remediationStrategy:
maxConcurrency: 1
timeout: 240
----
<1> Applied to the managed cluster when {cgu-operator} completes the cluster configuration.
<2> Applied to the managed cluster when {cgu-operator} starts deploying the configuration policies.

View File

@@ -0,0 +1,72 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc
:_content-type: PROCEDURE
[id="installing-topology-aware-lifecycle-manager-using-cli_{context}"]
= Installing the {cgu-operator-full} by using the CLI
You can use the OpenShift CLI (`oc`) to install the {cgu-operator-first}.
.Prerequisites
* Install the OpenShift CLI (`oc`).
* Install the latest version of the {rh-rhacm} Operator.
* Set up a hub cluster with a disconnected registry.
* Log in as a user with `cluster-admin` privileges.
.Procedure
. Create a `Subscription` CR:
.. Define the `Subscription` CR and save the YAML file, for example, `talm-subscription.yaml`:
+
[source,yaml]
----
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: openshift-topology-aware-lifecycle-manager-subscription
namespace: openshift-operators
spec:
channel: "stable"
name: topology-aware-lifecycle-manager
source: redhat-operators
sourceNamespace: openshift-marketplace
----
.. Create the `Subscription` CR by running the following command:
+
[source,terminal]
----
$ oc create -f talm-subscription.yaml
----
.Verification
. Verify that the installation succeeded by inspecting the CSV resource:
+
[source,terminal]
----
$ oc get csv -n openshift-operators
----
+
.Example output
[source,terminal]
----
NAME                                                   DISPLAY                            VERSION               REPLACES   PHASE
topology-aware-lifecycle-manager.4.10.0-202206301927   Topology Aware Lifecycle Manager   4.10.0-202206301927              Succeeded
----
. Verify that the {cgu-operator} is up and running:
+
[source,terminal]
----
$ oc get deploy -n openshift-operators
----
+
.Example output
[source,terminal]
----
NAMESPACE             NAME                                         READY   UP-TO-DATE   AVAILABLE   AGE
openshift-operators   cluster-group-upgrades-controller-manager    1/1     1            1           14s
----

View File

@@ -0,0 +1,36 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc
:_content-type: PROCEDURE
[id="installing-topology-aware-lifecycle-manager-using-web-console_{context}"]
= Installing the {cgu-operator-full} by using the web console
You can use the {product-title} web console to install the {cgu-operator-full}.
.Prerequisites
// Based on polarion test cases
* Install the latest version of the {rh-rhacm} Operator.
* Set up a hub cluster with a disconnected registry.
* Log in as a user with `cluster-admin` privileges.
.Procedure
. In the {product-title} web console, navigate to *Operators* -> *OperatorHub*.
. Search for the *{cgu-operator-full}* from the list of available Operators, and then click *Install*.
. Keep the default selections of *Installation mode* ("All namespaces on the cluster (default)") and *Installed Namespace* ("openshift-operators") to ensure that the Operator is installed properly.
. Click *Install*.
.Verification
To confirm that the installation is successful:
. Navigate to the *Operators* -> *Installed Operators* page.
. Check that the Operator is installed in the `All Namespaces` namespace and its status is `Succeeded`.
If the Operator is not installed successfully:
. Navigate to the *Operators* -> *Installed Operators* page and inspect the `Status` column for any errors or failures.
. Navigate to the *Workloads* -> *Pods* page and check the logs in any containers in the `cluster-group-upgrades-controller-manager` pod that are reporting issues.

View File

@@ -0,0 +1,136 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// * scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="talo-operator-and-platform-update_{context}"]
= Performing a platform and an Operator update together
You can perform a platform and an Operator update at the same time.
.Prerequisites
* Install the {cgu-operator-first}.
* Update ZTP to the latest version.
* Provision one or more managed clusters with ZTP.
* Log in as a user with `cluster-admin` privileges.
* Create {rh-rhacm} policies in the hub cluster.
.Procedure
. Create the `PolicyGenTemplate` CR for the updates by following the steps described in the "Performing a platform update" and "Performing an Operator update" sections.
. Apply the prep work for the platform and the Operator update.
.. Save the content of the `ClusterGroupUpgrade` CR with the policies for platform update preparation work, catalog source updates, and target clusters to the `cgu-platform-operator-upgrade-prep.yml` file, for example:
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-platform-operator-upgrade-prep
namespace: default
spec:
managedPolicies:
- du-upgrade-platform-upgrade-prep
- du-upgrade-operator-catsrc-policy
clusterSelector:
- group-du-sno
remediationStrategy:
maxConcurrency: 10
enable: true
----
.. Apply the `cgu-platform-operator-upgrade-prep.yml` file to the hub cluster by running the following command:
+
[source,terminal]
----
$ oc apply -f cgu-platform-operator-upgrade-prep.yml
----
.. Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
+
[source,terminal]
----
$ oc get policies --all-namespaces
----
. Create the `ClusterGroupUpgrade` CR for the platform and the Operator update with the `spec.enable` field set to `false`.
.. Save the contents of the platform and Operator update `ClusterGroupUpgrade` CR with the policies and the target clusters to the `cgu-platform-operator-upgrade.yml` file, as shown in the following example:
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-du-upgrade
namespace: default
spec:
managedPolicies:
- du-upgrade-platform-upgrade <1>
- du-upgrade-operator-catsrc-policy <2>
- common-subscriptions-policy <3>
preCaching: true
clusterSelector:
- group-du-sno
remediationStrategy:
maxConcurrency: 1
enable: false
----
<1> This is the platform update policy.
<2> This is the policy containing the catalog source information for the Operators to be updated. It is needed for the pre-caching feature to determine which Operator images to download to the spoke cluster.
<3> This is the policy to update the Operators.
.. Apply the `cgu-platform-operator-upgrade.yml` file to the hub cluster by running the following command:
+
[source,terminal]
----
$ oc apply -f cgu-platform-operator-upgrade.yml
----
. Optional: Pre-cache the images for the platform and the Operator update.
.. Enable pre-caching in the `ClusterGroupUpgrade` CR by running the following command:
+
[source,terminal]
----
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \
--patch '{"spec":{"preCaching": true}}' --type=merge
----
.. Monitor the update process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the spoke cluster:
+
[source,terminal]
----
$ oc get jobs,pods -n openshift-talm-pre-cache
----
.. Check if the pre-caching is completed before starting the update by running the following command:
+
[source,terminal]
----
$ oc get cgu cgu-du-upgrade -ojsonpath='{.status.conditions}'
----
. Start the platform and Operator update.
.. Enable the `cgu-du-upgrade` `ClusterGroupUpgrade` CR to start the platform and the Operator update by running the following command:
+
[source,terminal]
----
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \
--patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
----
.. Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
+
[source,terminal]
----
$ oc get policies --all-namespaces
----
+
[NOTE]
====
You can create the CRs for the platform and Operator updates with `spec.enable` set to `true` from the beginning. In this case, the update starts immediately after pre-caching completes and you do not need to manually enable the CR.
Both pre-caching and the update create extra resources, such as policies, placement bindings, placement rules, managed cluster actions, and managed cluster views, to help complete the procedures. Setting the `afterCompletion.deleteObjects` field to `true` deletes all these resources after the updates complete, as shown in the sketch after this note.
====

View File

@@ -0,0 +1,263 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// * scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="talo-operator-update_{context}"]
= Performing an Operator update
You can perform an Operator update with the {cgu-operator}.
.Prerequisites
* Install the {cgu-operator-first}.
* Update ZTP to the latest version.
* Provision one or more managed clusters with ZTP.
* Mirror the desired index image, bundle images, and all Operator images referenced in the bundle images.
* Log in as a user with `cluster-admin` privileges.
* Create {rh-rhacm} policies in the hub cluster.
.Procedure
. Update the `PolicyGenTemplate` CR for the Operator update.
.. Update the `du-upgrade` `PolicyGenTemplate` CR with the following additional contents in the `du-upgrade.yaml` file:
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: "du-upgrade"
namespace: "ztp-group-du-sno"
spec:
bindingRules:
group-du-sno: ""
mcp: "master"
remediationAction: inform
sourceFiles:
- fileName: DefaultCatsrc.yaml
remediationAction: inform
policyName: "operator-catsrc-policy"
metadata:
name: redhat-operators
spec:
displayName: Red Hat Operators Catalog
image: registry.example.com:5000/olm/redhat-operators:v4.10 <1>
updateStrategy: <2>
registryPoll:
interval: 1h
----
<1> The index image URL contains the desired Operator images. If the index images are always pushed to the same image name and tag, this change is not needed.
<2> Set how frequently the Operator Lifecycle Manager (OLM) polls the index image for new Operator versions with the `registryPoll.interval` field. This change is not needed if a new index image tag is always pushed for y-stream and z-stream Operator updates. You can set the `registryPoll.interval` field to a shorter interval to expedite the update; however, shorter intervals increase computational load. To counteract this, you can restore `registryPoll.interval` to the default value after the update is complete.
.. This update generates one policy, `du-upgrade-operator-catsrc-policy`, to update the `redhat-operators` catalog source with the new index image that contains the desired Operator images.
+
[NOTE]
====
If you want to use the image pre-caching for Operators and there are Operators from a different catalog source other than `redhat-operators`, you must perform the following tasks:
* Prepare a separate catalog source policy with the new index image or registry poll interval update for the different catalog source.
* Prepare a separate subscription policy for the desired Operators that are from the different catalog source.
====
+
For example, the desired SRIOV-FEC Operator is available in the `certified-operators` catalog source. To update the catalog source and the Operator subscription, add the following contents to generate two policies, `du-upgrade-fec-catsrc-policy` and `du-upgrade-subscriptions-fec-policy`:
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: "du-upgrade"
namespace: "ztp-group-du-sno"
spec:
bindingRules:
group-du-sno: ""
mcp: "master"
remediationAction: inform
sourceFiles:
- fileName: DefaultCatsrc.yaml
remediationAction: inform
policyName: "fec-catsrc-policy"
metadata:
name: certified-operators
spec:
displayName: Intel SRIOV-FEC Operator
image: registry.example.com:5000/olm/far-edge-sriov-fec:v4.10
updateStrategy:
registryPoll:
interval: 10m
- fileName: AcceleratorsSubscription.yaml
policyName: "subscriptions-fec-policy"
spec:
channel: "stable"
source: certified-operators
----
.. Remove the specified subscriptions channels in the common `PolicyGenTemplate` CR, if they exist. The default subscriptions channels from the ZTP image are used for the update.
+
[NOTE]
====
The default channel for the Operators applied through ZTP 4.10 is `stable`, except for the Performance Addon Operator (PAO), which uses the `4.10` channel. You can also specify the default channels in the common `PolicyGenTemplate` CR.
====
.. Push the `PolicyGenTemplate` CRs updates to the ZTP Git repository.
+
ArgoCD pulls the changes from the Git repository and generates the policies on the hub cluster.
.. Check the created policies by running the following command:
+
[source,terminal]
----
$ oc get policies -A | grep -E "catsrc-policy|subscription"
----
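+
The output is similar to the following hypothetical example. The exact list depends on the catalog source and subscription policies defined in your `PolicyGenTemplate` CRs:
+
[source,terminal]
----
NAMESPACE          NAME                                  REMEDIATION ACTION   COMPLIANCE STATE   AGE
ztp-group-du-sno   du-upgrade-fec-catsrc-policy          inform               NonCompliant       2m
ztp-group-du-sno   du-upgrade-operator-catsrc-policy     inform               NonCompliant       2m
ztp-group-du-sno   du-upgrade-subscriptions-fec-policy   inform               NonCompliant       2m
----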
. Apply the required catalog source updates before starting the Operator update.
.. Save the content of the `ClusterGroupUpgrade` CR named `operator-upgrade-prep` with the catalog source policies and the target spoke clusters to the `cgu-operator-upgrade-prep.yml` file:
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-operator-upgrade-prep
namespace: default
spec:
clusters:
- spoke1
enable: true
managedPolicies:
- du-upgrade-operator-catsrc-policy
remediationStrategy:
maxConcurrency: 1
----
.. Apply the policy to the hub cluster by running the following command:
+
[source,terminal]
----
$ oc apply -f cgu-operator-upgrade-prep.yml
----
.. Monitor the update process. Upon completion, ensure that the policy is compliant by running the following command:
+
[source,terminal]
----
$ oc get policies -A | grep -E "catsrc-policy"
----
. Create the `ClusterGroupUpgrade` CR for the Operator update with the `spec.enable` field set to `false`.
.. Save the content of the Operator update `ClusterGroupUpgrade` CR with the `du-upgrade-operator-catsrc-policy` policy and the subscription policies created from the common `PolicyGenTemplate` and the target clusters to the `cgu-operator-upgrade.yml` file, as shown in the following example:
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-operator-upgrade
namespace: default
spec:
managedPolicies:
- du-upgrade-operator-catsrc-policy <1>
- common-subscriptions-policy <2>
preCaching: false
clusters:
- spoke1
remediationStrategy:
maxConcurrency: 1
enable: false
----
<1> The policy is needed by the image pre-caching feature to retrieve the Operator images from the catalog source.
<2> The policy contains Operator subscriptions. If you have upgraded ZTP from 4.9 to 4.10 by following "Upgrade ZTP from 4.9 to 4.10", all Operator subscriptions are grouped into the `common-subscriptions-policy` policy.
+
[NOTE]
====
One `ClusterGroupUpgrade` CR can only pre-cache the images of the desired Operators defined in the subscription policy from one catalog source included in the `ClusterGroupUpgrade` CR. If the desired Operators are from different catalog sources, such as in the example of the SRIOV-FEC Operator, you must create another `ClusterGroupUpgrade` CR with the `du-upgrade-fec-catsrc-policy` and `du-upgrade-subscriptions-fec-policy` policies to pre-cache and update the SRIOV-FEC Operator images.
====
.. Apply the `ClusterGroupUpgrade` CR to the hub cluster by running the following command:
+
[source,terminal]
----
$ oc apply -f cgu-operator-upgrade.yml
----
. Optional: Pre-cache the images for the Operator update.
.. Before starting image pre-caching, verify that the subscription policy is `NonCompliant` by running the following command:
+
[source,terminal]
----
$ oc get policy common-subscriptions-policy -n <policy_namespace>
----
+
.Example output
+
[source,terminal]
----
NAME REMEDIATION ACTION COMPLIANCE STATE AGE
common-subscriptions-policy inform NonCompliant 27d
----
.. Enable pre-caching in the `ClusterGroupUpgrade` CR by running the following command:
+
[source,terminal]
----
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-operator-upgrade \
--patch '{"spec":{"preCaching": true}}' --type=merge
----
.. Monitor the process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the spoke cluster:
+
[source,terminal]
----
$ oc get cgu cgu-operator-upgrade -o jsonpath='{.status.precaching.status}'
----
.. Check if the pre-caching is completed before starting the update by running the following command:
+
[source,terminal]
----
$ oc get cgu -n default cgu-operator-upgrade -ojsonpath='{.status.conditions}' | jq
----
+
.Example output
+
[source,json]
----
[
{
"lastTransitionTime": "2022-03-08T20:49:08.000Z",
"message": "The ClusterGroupUpgrade CR is not enabled",
"reason": "UpgradeNotStarted",
"status": "False",
"type": "Ready"
},
{
"lastTransitionTime": "2022-03-08T20:55:30.000Z",
"message": "Precaching is completed",
"reason": "PrecachingCompleted",
"status": "True",
"type": "PrecachingDone"
}
]
----
. Start the Operator update.
.. Enable the `cgu-operator-upgrade` `ClusterGroupUpgrade` CR and disable pre-caching to start the Operator update by running the following command:
+
[source,terminal]
----
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-operator-upgrade \
--patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
----
.. Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
+
[source,terminal]
----
$ oc get policies --all-namespaces
----

View File

@@ -0,0 +1,196 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// * scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="talo-platform-update_{context}"]
= Performing a platform update
You can perform a platform update with the {cgu-operator}.
.Prerequisites
* Install the {cgu-operator-first}.
* Update ZTP to the latest version.
* Provision one or more managed clusters with ZTP.
* Mirror the desired image repository.
* Log in as a user with `cluster-admin` privileges.
* Create {rh-rhacm} policies in the hub cluster.
.Procedure
. Create a `PolicyGenTemplate` CR for the platform update:
.. Save the following contents of the `PolicyGenTemplate` CR in the `du-upgrade.yaml` file.
+
.Example of `PolicyGenTemplate` for platform update
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: "du-upgrade"
namespace: "ztp-group-du-sno"
spec:
bindingRules:
group-du-sno: ""
mcp: "master"
remediationAction: inform
sourceFiles:
- fileName: ImageSignature.yaml <1>
policyName: "platform-upgrade-prep"
binaryData:
${DIGEST_ALGO}-${DIGEST_ENCODED}: ${SIGNATURE_BASE64} <2>
- fileName: DisconnectedICSP.yaml
policyName: "platform-upgrade-prep"
metadata:
name: disconnected-internal-icsp-for-ocp
spec:
repositoryDigestMirrors: <3>
- mirrors:
- quay-intern.example.com/ocp4/openshift-release-dev
source: quay.io/openshift-release-dev/ocp-release
- mirrors:
- quay-intern.example.com/ocp4/openshift-release-dev
source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
- fileName: ClusterVersion.yaml <4>
policyName: "platform-upgrade-prep"
metadata:
name: version
annotations:
ran.openshift.io/ztp-deploy-wave: "1"
spec:
channel: "stable-4.10"
upstream: http://upgrade.example.com/images/upgrade-graph_stable-4.10
- fileName: ClusterVersion.yaml <5>
policyName: "platform-upgrade"
metadata:
name: version
spec:
channel: "stable-4.10"
upstream: http://upgrade.example.com/images/upgrade-graph_stable-4.10
desiredUpdate:
version: 4.10.4
status:
history:
- version: 4.10.4
state: "Completed"
----
<1> The `ConfigMap` CR contains the signature of the desired release image to update to.
<2> Shows the image signature of the desired {product-title} release. Get the signature from the `checksum-<OCP_RELEASE_NUMBER>.yaml` file that you saved when following the procedures in the "Setting up the environment" section.
<3> Shows the mirror repository that contains the desired {product-title} image. Get the mirrors from the `imageContentSources.yaml` file that you saved when following the procedures in the "Setting up the environment" section.
<4> Shows the `ClusterVersion` CR to update upstream.
<5> Shows the `ClusterVersion` CR used to trigger the update. The `channel`, `upstream`, and `desiredUpdate` fields are all required for image pre-caching.
+
The `PolicyGenTemplate` CR generates two policies:
* The `du-upgrade-platform-upgrade-prep` policy does the preparation work for the platform update. It creates the `ConfigMap` CR for the desired release image signature, creates the image content source of the mirrored release image repository, and updates the cluster version with the desired update channel and the update graph reachable by the spoke cluster in the disconnected environment.
* The `du-upgrade-platform-upgrade` policy is used to perform platform upgrade.
.. Add the `du-upgrade.yaml` file contents to the `kustomization.yaml` file located in the ZTP Git repository for the `PolicyGenTemplate` CRs and push the changes to the Git repository.
+
ArgoCD pulls the changes from the Git repository and generates the policies on the hub cluster.
.. Check the created policies by running the following command:
+
[source,terminal]
----
$ oc get policies -A | grep platform-upgrade
----
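+
The output is similar to the following hypothetical example, with both generated policies initially reported as `NonCompliant`:
+
[source,terminal]
----
NAMESPACE          NAME                               REMEDIATION ACTION   COMPLIANCE STATE   AGE
ztp-group-du-sno   du-upgrade-platform-upgrade        inform               NonCompliant       31s
ztp-group-du-sno   du-upgrade-platform-upgrade-prep   inform               NonCompliant       31s
----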
. Apply the required update resources before starting the platform update with the {cgu-operator}.
.. Save the content of the `platform-upgrade-prep` `ClusterGroupUpgrade` CR with the `du-upgrade-platform-upgrade-prep` policy and the target spoke clusters to the `cgu-platform-upgrade-prep.yml` file, as shown in the following example:
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-platform-upgrade-prep
namespace: default
spec:
managedPolicies:
- du-upgrade-platform-upgrade-prep
clusters:
- spoke1
remediationStrategy:
maxConcurrency: 1
enable: true
----
.. Apply the policy to the hub cluster by running the following command:
+
[source,terminal]
----
$ oc apply -f cgu-platform-upgrade-prep.yml
----
.. Monitor the update process. Upon completion, ensure that the policy is compliant by running the following command:
+
[source,terminal]
----
$ oc get policies --all-namespaces
----
. Create the `ClusterGroupUpgrade` CR for the platform update with the `spec.enable` field set to `false`.
.. Save the content of the platform update `ClusterGroupUpgrade` CR with the `du-upgrade-platform-upgrade` policy and the target clusters to the `cgu-platform-upgrade.yml` file, as shown in the following example:
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-platform-upgrade
namespace: default
spec:
managedPolicies:
- du-upgrade-platform-upgrade
preCaching: false
clusters:
- spoke1
remediationStrategy:
maxConcurrency: 1
enable: false
----
.. Apply the `ClusterGroupUpgrade` CR to the hub cluster by running the following command:
+
[source,terminal]
----
$ oc apply -f cgu-platform-upgrade.yml
----
. Optional: Pre-cache the images for the platform update.
.. Enable pre-caching in the `ClusterGroupUpgrade` CR by running the following command:
+
[source,terminal]
----
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-platform-upgrade \
--patch '{"spec":{"preCaching": true}}' --type=merge
----
.. Monitor the update process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the hub cluster:
+
[source,terminal]
----
$ oc get cgu cgu-platform-upgrade -o jsonpath='{.status.precaching.status}'
----
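+
While pre-caching is in progress, the per-cluster status is reported as `Active`, as in the following hypothetical output:
+
[source,terminal]
----
{"spoke1":"Active"}
----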
. Start the platform update:
.. Enable the `cgu-platform-upgrade` `ClusterGroupUpgrade` CR and disable pre-caching by running the following command:
+
[source,terminal]
----
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-platform-upgrade \
--patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
----
.. Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
+
[source,terminal]
----
$ oc get policies --all-namespaces
----

View File

@@ -0,0 +1,19 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc
:_content-type: CONCEPT
[id="talo-policies-concept_{context}"]
= Update policies on managed clusters
The {cgu-operator-first} remediates a set of `inform` policies for the clusters specified in the `ClusterGroupUpgrade` CR. {cgu-operator} remediates `inform` policies by making `enforce` copies of the managed {rh-rhacm} policies. Each copied policy has its own corresponding {rh-rhacm} placement rule and {rh-rhacm} placement binding.
One by one, {cgu-operator} adds each cluster from the current batch to the placement rule that corresponds with the applicable managed policy. If a cluster is already compliant with a policy, {cgu-operator} skips applying that policy on the compliant cluster. {cgu-operator} then moves on to applying the next policy to the non-compliant cluster. After {cgu-operator} completes the updates in a batch, all clusters are removed from the placement rules associated with the copied policies. Then, the update of the next batch starts.
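For example, for a `ClusterGroupUpgrade` CR named `lab-upgrade` that manages a policy named `policy3-common-ptp-sub-policy`, the copied policy that {cgu-operator} creates is named `lab-upgrade-policy3-common-ptp-sub-policy` and has its remediation action set to `enforce`. The following hypothetical output shows the copied policy listed in the namespace of the `ClusterGroupUpgrade` CR:
[source,terminal]
----
$ oc get policies -n default | grep lab-upgrade
lab-upgrade-policy3-common-ptp-sub-policy   enforce   NonCompliant   45s
----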
If a spoke cluster does not report any compliant state to {rh-rhacm}, the managed policies on the hub cluster can be missing status information that {cgu-operator} needs. {cgu-operator} handles these cases in the following ways:
* If a policy's `status.compliant` field is missing, {cgu-operator} ignores the policy and adds a log entry. Then, {cgu-operator} continues looking at the policy's `status.status` field.
* If a policy's `status.status` is missing, {cgu-operator} produces an error.
* If a cluster's compliance status is missing in the policy's `status.status` field, {cgu-operator} considers that cluster to be non-compliant with that policy.
For more information about {rh-rhacm} policies, see link:https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html-single/governance/index#policy-overview[Policy overview].

View File

@@ -0,0 +1,28 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc
:_content-type: CONCEPT
[id="talo-precache-feature-concept_{context}"]
= Using the container image pre-cache feature
Clusters might have limited bandwidth to access the container image registry, which can cause a timeout before the updates are completed.
[NOTE]
====
The time of the update is not set by {cgu-operator}. You can apply the `ClusterGroupUpgrade` CR at the beginning of the update either manually or by using external automation.
====
The container image pre-caching starts when the `preCaching` field is set to `true` in the `ClusterGroupUpgrade` CR. After a successful pre-caching process, you can start remediating policies. The remediation actions start when the `enable` field is set to `true`.
The pre-caching process can be in the following statuses:
`PrecacheNotStarted`:: This is the initial state all clusters are automatically assigned to on the first reconciliation pass of the `ClusterGroupUpgrade` CR.
+
In this state, {cgu-operator} deletes any pre-caching namespace and hub view resources of spoke clusters that remain from previous incomplete updates. {cgu-operator} then creates a new `ManagedClusterView` resource for the spoke pre-caching namespace to verify its deletion in the `PrecachePreparing` state.
`PrecachePreparing`:: Cleaning up any remaining resources from previous incomplete updates is in progress.
`PrecacheStarting`:: Pre-caching job prerequisites and the job are created.
`PrecacheActive`:: The job is in "Active" state.
`PrecacheSucceeded`:: The pre-cache job has succeeded.
`PrecacheTimeout`:: The artifact pre-caching has been partially done.
`PrecacheUnrecoverableError`:: The job ends with a non-zero exit code.

View File

@@ -0,0 +1,161 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc
:_content-type: PROCEDURE
[id="talo-precache-start_and_update_{context}"]
= Creating a ClusterGroupUpgrade CR with pre-caching
The pre-cache feature allows the required container images to be present on the spoke cluster before the update starts.
.Prerequisites
* Install the {cgu-operator-first}.
* Provision one or more managed clusters.
* Log in as a user with `cluster-admin` privileges.
.Procedure
. Save the contents of the `ClusterGroupUpgrade` CR with the `preCaching` field set to `true` in the `clustergroupupgrades-group-du.yaml` file:
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: du-upgrade-4918
namespace: ztp-group-du-sno
spec:
preCaching: true <1>
clusters:
- cnfdb1
- cnfdb2
enable: false
managedPolicies:
- du-upgrade-platform-upgrade
remediationStrategy:
maxConcurrency: 2
timeout: 240
----
<1> The `preCaching` field is set to `true`, which enables {cgu-operator} to pull the container images before starting the update.
. When you want to start the update, apply the `ClusterGroupUpgrade` CR by running the following command:
+
[source,terminal]
----
$ oc apply -f clustergroupupgrades-group-du.yaml
----
.Verification
. Check if the `ClusterGroupUpgrade` CR exists in the hub cluster by running the following command:
+
[source,terminal]
----
$ oc get cgu -A
----
+
.Example output
+
[source,terminal]
----
NAMESPACE NAME AGE
ztp-group-du-sno du-upgrade-4918 10s <1>
----
<1> The CR is created.
. Check the status of the pre-caching task by running the following command:
+
[source,terminal]
----
$ oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}'
----
+
.Example output
+
[source,json]
----
{
"conditions": [
{
"lastTransitionTime": "2022-01-27T19:07:24Z",
"message": "Precaching is not completed (required)", <1>
"reason": "PrecachingRequired",
"status": "False",
"type": "Ready"
},
{
"lastTransitionTime": "2022-01-27T19:07:24Z",
"message": "Precaching is required and not done",
"reason": "PrecachingNotDone",
"status": "False",
"type": "PrecachingDone"
},
{
"lastTransitionTime": "2022-01-27T19:07:34Z",
"message": "Pre-caching spec is valid and consistent",
"reason": "PrecacheSpecIsWellFormed",
"status": "True",
"type": "PrecacheSpecValid"
}
],
"precaching": {
"clusters": [
"cnfdb1" <2>
],
"spec": {
"platformImage": "image.example.io"},
"status": {
"cnfdb1": "Active"}
}
}
----
<1> Displays that the update cannot start because pre-caching is required and has not yet completed.
<2> Displays the list of identified clusters.
. Check the status of the pre-caching job by running the following command on the spoke cluster:
+
[source,terminal]
----
$ oc get jobs,pods -n openshift-talm-pre-cache
----
+
.Example output
+
[source,terminal]
----
NAME COMPLETIONS DURATION AGE
job.batch/pre-cache 0/1 3m10s 3m10s
NAME READY STATUS RESTARTS AGE
pod/pre-cache--1-9bmlr 1/1 Running 0 3m10s
----
. Check the status of the `ClusterGroupUpgrade` CR by running the following command:
+
[source,terminal]
----
$ oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}'
----
+
.Example output
+
[source,json]
----
"conditions": [
{
"lastTransitionTime": "2022-01-27T19:30:41Z",
"message": "The ClusterGroupUpgrade CR has all clusters compliant with all the managed policies",
"reason": "UpgradeCompleted",
"status": "True",
"type": "Ready"
},
{
"lastTransitionTime": "2022-01-27T19:28:57Z",
"message": "Precaching is completed",
"reason": "PrecachingCompleted",
"status": "True",
"type": "PrecachingDone" <1>
}
----
<1> The pre-cache tasks are done.

View File

@@ -0,0 +1,107 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// * scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="talo-platform-prepare-end-to-end_{context}"]
= End-to-end procedures for updating clusters in a disconnected environment
If you have deployed spoke clusters with distributed unit (DU) profiles by using the GitOps ZTP and {cgu-operator-first} pipeline described in "Deploying distributed units at scale in a disconnected environment", use the following procedures to upgrade your spoke clusters and Operators.
[id="talo-platform-prepare-for-update_{context}"]
== Preparing for the updates
If both the hub and the spoke clusters are running {product-title} 4.9, you must update ZTP from version 4.9 to 4.10. If both clusters are already running {product-title} 4.10, you can proceed directly to setting up the environment.
[id="talo-platform-prepare-for-update-env-setup_{context}"]
== Setting up the environment
{cgu-operator} can perform both platform and Operator updates.
You must mirror both the platform image and Operator images that you want to update to in your mirror registry before you can use {cgu-operator} to update your disconnected clusters. Complete the following steps to mirror the images:
* For platform updates, you must perform the following steps:
+
. Mirror the desired {product-title} image repository. Ensure that the desired platform image is mirrored by following the "Mirroring the {product-title} image repository" procedure linked in the Additional Resources. Save the contents of the `imageContentSources` section in the `imageContentSources.yaml` file:
+
.Example output
[source,yaml]
----
imageContentSources:
- mirrors:
- mirror-ocp-registry.ibmcloud.io.cpak:5000/openshift-release-dev/openshift4
source: quay.io/openshift-release-dev/ocp-release
- mirrors:
- mirror-ocp-registry.ibmcloud.io.cpak:5000/openshift-release-dev/openshift4
source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
----
. Save the image signature of the desired platform image that was mirrored. You must add the image signature to the `PolicyGenTemplate` CR for platform updates. To get the image signature, perform the following steps:
.. Specify the desired {product-title} tag by running the following command:
+
[source,terminal]
----
$ OCP_RELEASE_NUMBER=<release_version>
----
.. Specify the architecture of the server by running the following command:
+
[source,terminal]
----
$ ARCHITECTURE=<server_architecture>
----
.. Get the release image digest from Quay by running the following command:
+
[source,terminal]
----
$ DIGEST="$(oc adm release info quay.io/openshift-release-dev/ocp-release:${OCP_RELEASE_NUMBER}-${ARCHITECTURE} | sed -n 's/Pull From: .*@//p')"
----
.. Set the digest algorithm by running the following command:
+
[source,terminal]
----
$ DIGEST_ALGO="${DIGEST%%:*}"
----
.. Set the digest signature by running the following command:
+
[source,terminal]
----
$ DIGEST_ENCODED="${DIGEST#*:}"
----
.. Get the image signature from the link:https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/[mirror.openshift.com] website by running the following command:
+
[source,terminal]
----
$ SIGNATURE_BASE64=$(curl -s "https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/${DIGEST_ALGO}=${DIGEST_ENCODED}/signature-1" | base64 -w0 && echo)
----
.. Save the image signature to the `checksum-<OCP_RELEASE_NUMBER>.yaml` file by running the following commands:
+
[source,terminal]
----
$ cat >checksum-${OCP_RELEASE_NUMBER}.yaml <<EOF
${DIGEST_ALGO}-${DIGEST_ENCODED}: ${SIGNATURE_BASE64}
EOF
----
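+
The resulting file contains a single key-value pair. Assuming a SHA-256 digest, its structure looks like the following sketch, where the digest and signature values are placeholders:
+
[source,yaml]
----
sha256-<digest_encoded>: <signature_base64>
----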
. Prepare the update graph. You have two options to prepare the update graph:
.. Use the OpenShift Update Service.
+
For more information about how to set up the graph on the hub cluster, see link:https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html/clusters/managing-your-clusters#deploy-the-operator-for-cincinnati[Deploy the operator for OpenShift Update Service] and link:https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html/clusters/managing-your-clusters#build-the-graph-data-init-container[Build the graph data init container].
.. Make a local copy of the upstream graph. Host the update graph on an `http` or `https` server in the disconnected environment that the spoke cluster can access. To download the update graph, use the following command:
+
[source,terminal]
----
$ curl -s https://api.openshift.com/api/upgrades_info/v1/graph?channel=stable-4.10 -o ~/upgrade-graph_stable-4.10
----
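+
How you serve the graph file is up to you. As a minimal sketch, assuming the downloaded `upgrade-graph_stable-4.10` file is in the current directory, you could host it with any static HTTP file server, for example:
+
[source,terminal]
----
$ python3 -m http.server 8080
----
+
The `upstream` field in the `ClusterVersion` entries of the platform update `PolicyGenTemplate` CR must then point to this server and file, as shown in the `du-upgrade.yaml` example.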
* For Operator updates, you must perform the following task:
** Mirror the Operator catalogs. Ensure that the desired Operator images are mirrored by following the procedure in the "Mirroring Operator catalogs for use with disconnected clusters" section.

View File

@@ -0,0 +1,437 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// * scalability_and_performance/cnf-topology-aware-lifecycle-manager.adoc
:_content-type: PROCEDURE
[id="talo-troubleshooting_{context}"]
= Troubleshooting the {cgu-operator-full}
The {cgu-operator-first} is an {product-title} Operator that remediates {rh-rhacm} policies. When issues occur, use the `oc adm must-gather` command to gather details and logs that can help you debug the issues.
For more information about related topics, see the following documentation:
* link:https://access.redhat.com/articles/6218901[Red Hat Advanced Cluster Management for Kubernetes 2.4 Support Matrix]
* link:https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.0/html/troubleshooting/troubleshooting[Red Hat Advanced Cluster Management Troubleshooting]
* The "Troubleshooting Operator issues" section
[id="talo-general-troubleshooting_{context}"]
== General troubleshooting
You can determine the cause of the problem by reviewing the following questions:
* Is the configuration that you are applying supported?
** Are the {rh-rhacm} and the {product-title} versions compatible?
** Are the {cgu-operator} and {rh-rhacm} versions compatible?
* Which of the following components is causing the problem?
** <<talo-troubleshooting-managed-policies_{context}>>
** <<talo-troubleshooting-clusters_{context}>>
** <<talo-troubleshooting-remediation-strategy_{context}>>
** <<talo-troubleshooting-remediation-talo_{context}>>
To ensure that the `ClusterGroupUpgrade` configuration is functional, you can do the following:
. Create the `ClusterGroupUpgrade` CR with the `spec.enable` field set to `false`.
. Wait for the status to be updated and go through the troubleshooting questions.
. If everything looks as expected, set the `spec.enable` field to `true` in the `ClusterGroupUpgrade` CR.
[WARNING]
====
After you set the `spec.enable` field to `true` in the `ClusterGroupUpgrade` CR, the update procedure starts and you cannot edit the CR's `spec` fields anymore.
====
[id="talo-troubleshooting-modify-cgu_{context}"]
== Cannot modify the ClusterGroupUpgrade CR
Issue:: You cannot edit the `ClusterGroupUpgrade` CR after enabling the update.
Resolution:: Restart the procedure by performing the following steps:
+
. Remove the old `ClusterGroupUpgrade` CR by running the following command:
+
[source,terminal]
----
$ oc delete cgu -n <ClusterGroupUpgradeCR_namespace> <ClusterGroupUpgradeCR_name>
----
+
. Check and fix the existing issues with the managed clusters and policies.
.. Ensure that all the clusters are managed clusters and available.
.. Ensure that all the policies exist and have the `spec.remediationAction` field set to `inform`.
+
. Create a new `ClusterGroupUpgrade` CR with the correct configurations.
+
[source,terminal]
----
$ oc apply -f <ClusterGroupUpgradeCR_YAML>
----
[id="talo-troubleshooting-managed-policies_{context}"]
== Managed policies
[discrete]
=== Checking managed policies on the system
Issue:: You want to check if you have the correct managed policies on the system.
Resolution:: Run the following command:
+
[source,terminal]
----
$ oc get cgu lab-upgrade -ojsonpath='{.spec.managedPolicies}'
----
+
.Example output
+
[source,json]
----
["group-du-sno-validator-du-validator-policy", "policy2-common-pao-sub-policy", "policy3-common-ptp-sub-policy"]
----
[discrete]
=== Checking remediationAction mode
Issue:: You want to check if the `remediationAction` field is set to `inform` in the `spec` of the managed policies.
Resolution:: Run the following command:
+
[source,terminal]
----
$ oc get policies --all-namespaces
----
+
.Example output
+
[source,terminal]
----
NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE
default policy1-common-cluster-version-policy inform NonCompliant 5d21h
default policy2-common-pao-sub-policy inform Compliant 5d21h
default policy3-common-ptp-sub-policy inform NonCompliant 5d21h
default policy4-common-sriov-sub-policy inform NonCompliant 5d21h
----
[discrete]
=== Checking policy compliance state
Issue:: You want to check the compliance state of policies.
Resolution:: Run the following command:
+
[source,terminal]
----
$ oc get policies --all-namespaces
----
+
.Example output
+
[source,terminal]
----
NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE
default policy1-common-cluster-version-policy inform NonCompliant 5d21h
default policy2-common-pao-sub-policy inform Compliant 5d21h
default policy3-common-ptp-sub-policy inform NonCompliant 5d21h
default policy4-common-sriov-sub-policy inform NonCompliant 5d21h
----
[id="talo-troubleshooting-clusters_{context}"]
== Clusters
[discrete]
=== Checking if managed clusters are present
Issue:: You want to check if the clusters in the `ClusterGroupUpgrade` CR are managed clusters.
Resolution:: Run the following command:
+
[source,terminal]
----
$ oc get managedclusters
----
+
.Example output
+
[source,terminal]
----
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE
local-cluster true https://api.hub.example.com:6443 True Unknown 13d
spoke1 true https://api.spoke1.example.com:6443 True True 13d
spoke3 true https://api.spoke3.example.com:6443 True True 27h
----
. Alternatively, check the {cgu-operator} manager logs:
.. Get the name of the {cgu-operator} manager by running the following command:
+
[source,terminal]
----
$ oc get pod -n openshift-operators
----
+
.Example output
+
[source,terminal]
----
NAME READY STATUS RESTARTS AGE
cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp 2/2 Running 0 45m
----
.. Check the {cgu-operator} manager logs by running the following command:
+
[source,terminal]
----
$ oc logs -n openshift-operators \
cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp -c manager
----
+
.Example output
+
[source,terminal]
----
ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "lab-upgrade", "namespace": "default", "error": "Cluster spoke5555 is not a ManagedCluster"} <1>
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
----
<1> The error message shows that the cluster is not a managed cluster.
[discrete]
=== Checking if managed clusters are available
Issue:: You want to check if the managed clusters specified in the `ClusterGroupUpgrade` CR are available.
Resolution:: Run the following command:
+
[source,terminal]
----
$ oc get managedclusters
----
+
.Example output
+
[source,terminal]
----
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE
local-cluster true https://api.hub.testlab.com:6443 True Unknown 13d
spoke1 true https://api.spoke1.testlab.com:6443 True True 13d <1>
spoke3 true https://api.spoke3.testlab.com:6443 True True 27h <1>
----
<1> The value of the `AVAILABLE` field is `True` for the managed clusters.
[discrete]
=== Checking clusterSelector
Issue:: You want to check if the `clusterSelector` field is specified in the `ClusterGroupUpgrade` CR in at least one of the managed clusters.
Resolution:: Run the following command:
+
[source,terminal]
----
$ oc get managedcluster --selector=upgrade=true <1>
----
<1> The label for the clusters you want to update is `upgrade:true`.
+
.Example output
+
[source,terminal]
----
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE
spoke1 true https://api.spoke1.testlab.com:6443 True True 13d
spoke3 true https://api.spoke3.testlab.com:6443 True True 27h
----
[discrete]
=== Checking if canary clusters are present
Issue:: You want to check if the canary clusters are present in the list of clusters.
+
.Example `ClusterGroupUpgrade` CR
[source,yaml]
----
spec:
clusters:
- spoke1
- spoke3
clusterSelector:
- upgrade2=true
remediationStrategy:
canaries:
- spoke3
maxConcurrency: 2
timeout: 240
----
Resolution:: Run the following commands:
+
[source,terminal]
----
$ oc get cgu lab-upgrade -ojsonpath='{.spec.clusters}'
----
+
.Example output
+
[source,json]
----
["spoke1", "spoke3"]
----
. Check if the canary clusters are present in the list of clusters that match `clusterSelector` labels by running the following command:
+
[source,terminal]
----
$ oc get managedcluster --selector=upgrade=true
----
+
.Example output
+
[source,terminal]
----
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE
spoke1 true https://api.spoke1.testlab.com:6443 True True 13d
spoke3 true https://api.spoke3.testlab.com:6443 True True 27h
----
[NOTE]
====
A cluster can be present in `spec.clusters` and also be matched by the `spec.clusterSelector` label.
====
[discrete]
=== Checking the pre-caching status on spoke clusters
. Check the status of pre-caching by running the following command on the spoke cluster:
+
[source,terminal]
----
$ oc get jobs,pods -n openshift-talo-pre-cache
----
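+
While the pre-caching job is running, the output is similar to the following hypothetical example; the pod name suffix varies:
+
[source,terminal]
----
NAME                  COMPLETIONS   DURATION   AGE
job.batch/pre-cache   0/1           3m10s      3m10s
NAME                     READY   STATUS    RESTARTS   AGE
pod/pre-cache--1-9bmlr   1/1     Running   0          3m10s
----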
[id="talo-troubleshooting-remediation-strategy_{context}"]
== Remediation strategy
[discrete]
=== Checking if remediationStrategy is present in the ClusterGroupUpgrade CR
Issue:: You want to check if the `remediationStrategy` is present in the `ClusterGroupUpgrade` CR.
Resolution:: Run the following command:
+
[source,terminal]
----
$ oc get cgu lab-upgrade -ojsonpath='{.spec.remediationStrategy}'
----
+
.Example output
+
[source,json]
----
{"maxConcurrency":2, "timeout":240}
----
[discrete]
=== Checking if maxConcurrency is specified in the ClusterGroupUpgrade CR
Issue:: You want to check if the `maxConcurrency` is specified in the `ClusterGroupUpgrade` CR.
Resolution:: Run the following command:
+
[source,terminal]
----
$ oc get cgu lab-upgrade -ojsonpath='{.spec.remediationStrategy.maxConcurrency}'
----
+
.Example output
+
[source,terminal]
----
2
----
[id="talo-troubleshooting-remediation-talo_{context}"]
== {cgu-operator-full}
[discrete]
=== Checking condition message and status in the ClusterGroupUpgrade CR
Issue:: You want to check the value of the `status.conditions` field in the `ClusterGroupUpgrade` CR.
Resolution:: Run the following command:
+
[source,terminal]
----
$ oc get cgu lab-upgrade -ojsonpath='{.status.conditions}'
----
+
.Example output
+
[source,json]
----
{"lastTransitionTime":"2022-02-17T22:25:28Z", "message":"The ClusterGroupUpgrade CR has managed policies that are missing:[policyThatDoesntExist]", "reason":"UpgradeCannotStart", "status":"False", "type":"Ready"}
----
[discrete]
=== Checking corresponding copied policies
Issue:: You want to check if every policy from `status.managedPoliciesForUpgrade` has a corresponding policy in `status.copiedPolicies`.
Resolution:: Run the following command:
+
[source,terminal]
----
$ oc get cgu lab-upgrade -oyaml
----
+
.Example output
+
[source,yaml]
----
status:
copiedPolicies:
- lab-upgrade-policy3-common-ptp-sub-policy
managedPoliciesForUpgrade:
- name: policy3-common-ptp-sub-policy
namespace: default
----
[discrete]
=== Checking if status.remediationPlan was computed
Issue:: You want to check if `status.remediationPlan` is computed.
Resolution:: Run the following command:
+
[source,terminal]
----
$ oc get cgu lab-upgrade -ojsonpath='{.status.remediationPlan}'
----
+
.Example output
+
[source,json]
----
[["spoke2", "spoke3"]]
----
[discrete]
=== Errors in the {cgu-operator} manager container
Issue:: You want to check the logs of the manager container of {cgu-operator}.
Resolution:: Run the following command:
+
[source,terminal]
----
$ oc logs -n openshift-operators \
cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp -c manager
----
+
.Example output
+
[source,terminal]
----
ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "lab-upgrade", "namespace": "default", "error": "Cluster spoke5555 is not a ManagedCluster"} <1>
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
----
<1> Displays the error.

View File

@@ -0,0 +1,15 @@
// Module included in the following assemblies:
// Epic CNF-2600 (CNF-2133) (4.10), Story TELCODOCS-285
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="cnf-topology-aware-lifecycle-manager"]
= Updating managed policies with the {cgu-operator-full}
include::../_attributes/common-attributes.adoc[]
//:context: cnf-topology-aware-lifecycle-manager
You can use the {cgu-operator-first} to manage the software lifecycle of multiple {product-title} clusters. {cgu-operator} uses {rh-rhacm-first} policies to perform changes on the target clusters.
:FeatureName: The Cluster Group Upgrades Operator
include::snippets/technology-preview.adoc[]

View File

@@ -0,0 +1,52 @@
// Module included in the following assemblies:
//
// * monitoring/using-rfhe.adoc
:_content-type: PROCEDURE
[id="hw-installing-amq-interconnect-messaging-bus_{context}"]
= Installing the AMQ messaging bus
To pass Redfish bare-metal event notifications between publisher and subscriber on a node, you must install and configure an AMQ messaging bus to run locally on the node. You do this by installing the AMQ Interconnect Operator for use in the cluster.
.Prerequisites
* Install the {product-title} CLI (`oc`).
* Log in as a user with `cluster-admin` privileges.
.Procedure
* Install the AMQ Interconnect Operator to its own `amq-interconnect` namespace. See link:https://access.redhat.com/documentation/en-us/red_hat_amq/2021.q1/html/deploying_amq_interconnect_on_openshift/adding-operator-router-ocp[Installing the AMQ Interconnect Operator].
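+
After the Operator is installed, the messaging bus itself is provided by an `Interconnect` custom resource in the `amq-interconnect` namespace. The following is a minimal sketch only; the router name and the `deploymentPlan` values are assumptions based on the AMQ Interconnect Operator defaults, so follow the linked AMQ Interconnect documentation for the supported configuration:
+
[source,yaml]
----
apiVersion: interconnectedcloud.github.io/v1alpha1
kind: Interconnect
metadata:
  name: amq-interconnect        # assumed router name
  namespace: amq-interconnect
spec:
  deploymentPlan:
    size: 1                     # a single router is sufficient for local delivery on the node
    role: interior
    placement: Any
----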
.Verification
. Verify that the AMQ Interconnect Operator is available and the required pods are running:
+
[source,terminal]
----
$ oc get pods -n amq-interconnect
----
+
.Example output
[source,terminal]
----
NAME READY STATUS RESTARTS AGE
amq-interconnect-645db76c76-k8ghs 1/1 Running 0 23h
interconnect-operator-5cb5fc7cc-4v7qm 1/1 Running 0 23h
----
. Verify that the required `bare-metal-event-relay` bare-metal event producer pod is running in the `openshift-bare-metal-events` namespace:
+
[source,terminal]
----
$ oc get pods -n openshift-bare-metal-events
----
+
.Example output
[source,terminal]
----
NAME READY STATUS RESTARTS AGE
hw-event-proxy-operator-controller-manager-74d5649b7c-dzgtl 2/2 Running 0 25s
----

View File

@@ -36,6 +36,7 @@ The server must have a Baseboard Management Controller (BMC) when booting with v
|Kubernetes API|`api.<cluster_name>.<base_domain>`| Add a DNS A/AAAA or CNAME record. This record must be resolvable by clients external to the cluster.
|Internal API|`api-int.<cluster_name>.<base_domain>`| Add a DNS A/AAAA or CNAME record when creating the ISO manually. This record must be resolvable by nodes within the cluster.
|Ingress route|`*.apps.<cluster_name>.<base_domain>`| Add a wildcard DNS A/AAAA or CNAME record that targets the node. This record must be resolvable by clients external to the cluster.
|Cluster node|`<hostname>.<cluster_name>.<base_domain>`| Add a DNS A/AAAA or CNAME record and DNS PTR record to identify the node.
|====
+
Without persistent IP addresses, communications between the `apiserver` and `etcd` might fail.

View File

@@ -0,0 +1,164 @@
// Module included in the following assemblies:
//
// * monitoring/using-rfhe.adoc
:_content-type: PROCEDURE
[id="nw-rfhe-creating-bmc-event-sub_{context}"]
= Subscribing to bare-metal events
You can configure the baseboard management controller (BMC) to send bare-metal events to subscribed applications running in an {product-title} cluster. Example Redfish bare-metal events include an increase in device temperature, or removal of a device. You subscribe applications to bare-metal events using a REST API.
[IMPORTANT]
====
You can only create a `BMCEventSubscription` custom resource (CR) for physical hardware that supports Redfish and has a vendor interface set to `redfish` or `idrac-redfish`.
====
[NOTE]
====
Use the `BMCEventSubscription` CR to subscribe to predefined Redfish events. The Redfish standard does not provide an option to create specific alerts and thresholds. For example, to receive an alert event when an enclosure's temperature exceeds 40° Celsius, you must manually configure the event according to the vendor's recommendations.
====
Perform the following procedure to subscribe to bare-metal events for the node using a `BMCEventSubscription` CR.
.Prerequisites
* Install the OpenShift CLI (`oc`).
* Log in as a user with `cluster-admin` privileges.
* Get the user name and password for the BMC.
* Deploy a bare-metal node with a Redfish-enabled Baseboard Management Controller (BMC) in your cluster, and enable Redfish events on the BMC.
+
[NOTE]
====
Enabling Redfish events on specific hardware is outside the scope of this information. For more information about enabling Redfish events for your specific hardware, consult the BMC manufacturer documentation.
====
.Procedure
. Confirm that the node hardware has the Redfish `EventService` enabled by running the following `curl` command:
+
[source,terminal]
----
$ curl https://<bmc_ip_address>/redfish/v1/EventService --insecure -H 'Content-Type: application/json' -u "<bmc_username>:<password>"
----
+
where:
+
--
bmc_ip_address:: is the IP address of the BMC where the Redfish events are generated.
--
+
.Example output
[source,terminal]
----
{
"@odata.context": "/redfish/v1/$metadata#EventService.EventService",
"@odata.id": "/redfish/v1/EventService",
"@odata.type": "#EventService.v1_0_2.EventService",
"Actions": {
"#EventService.SubmitTestEvent": {
"EventType@Redfish.AllowableValues": ["StatusChange", "ResourceUpdated", "ResourceAdded", "ResourceRemoved", "Alert"],
"target": "/redfish/v1/EventService/Actions/EventService.SubmitTestEvent"
}
},
"DeliveryRetryAttempts": 3,
"DeliveryRetryIntervalSeconds": 30,
"Description": "Event Service represents the properties for the service",
"EventTypesForSubscription": ["StatusChange", "ResourceUpdated", "ResourceAdded", "ResourceRemoved", "Alert"],
"EventTypesForSubscription@odata.count": 5,
"Id": "EventService",
"Name": "Event Service",
"ServiceEnabled": true,
"Status": {
"Health": "OK",
"HealthRollup": "OK",
"State": "Enabled"
},
"Subscriptions": {
"@odata.id": "/redfish/v1/EventService/Subscriptions"
}
}
----
. Get the {redfish-operator} service route for the cluster by running the following command:
+
[source,terminal]
----
$ oc get route -n openshift-bare-metal-events
----
+
.Example output
[source,terminal]
----
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
hw-event-proxy hw-event-proxy-openshift-bare-metal-events.apps.compute-1.example.com hw-event-proxy-service 9087 edge None
----
. Create a `BMCEventSubscription` resource to subscribe to the Redfish events:
.. Save the following YAML in the `bmc_sub.yaml` file:
+
[source,yaml]
----
apiVersion: metal3.io/v1alpha1
kind: BMCEventSubscription
metadata:
name: sub-01
namespace: openshift-machine-api
spec:
hostName: <hostname> <1>
destination: <proxy_service_url> <2>
context: ''
----
<1> Specifies the name or UUID of the worker node where the Redfish events are generated.
<2> Specifies the bare-metal event proxy service, for example, `https://hw-event-proxy-openshift-bare-metal-events.apps.compute-1.example.com/webhook`.
.. Create the `BMCEventSubscription` CR:
+
[source,terminal]
----
$ oc create -f bmc_sub.yaml
----
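+
Optionally, verify that the subscription was created by listing the `BMCEventSubscription` resources. The following output is a hypothetical example:
+
[source,terminal]
----
$ oc get bmceventsubscriptions -n openshift-machine-api
----
+
.Example output
[source,terminal]
----
NAME     AGE
sub-01   5s
----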
. Optional: To delete the BMC event subscription, run the following command:
+
[source,terminal]
----
$ oc delete -f bmc_sub.yaml
----
. Optional: To manually create a Redfish event subscription without creating a `BMCEventSubscription` CR, run the following `curl` command, specifying the BMC username and password.
+
[source,terminal]
----
$ curl -i -k -X POST -H "Content-Type: application/json" -d '{"Destination": "https://<proxy_service_url>", "Protocol" : "Redfish", "EventTypes": ["Alert"], "Context": "root"}' -u <bmc_username>:<password> 'https://<bmc_ip_address>/redfish/v1/EventService/Subscriptions'
----
+
where:
+
--
proxy_service_url:: is the bare-metal event proxy service, for example, `https://hw-event-proxy-openshift-bare-metal-events.apps.compute-1.example.com/webhook`.
bmc_ip_address:: is the IP address of the BMC where the Redfish events are generated.
--
+
.Example output
[source,terminal]
----
HTTP/1.1 201 Created
Server: AMI MegaRAC Redfish Service
Location: /redfish/v1/EventService/Subscriptions/1
Allow: GET, POST
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: X-Auth-Token
Access-Control-Allow-Headers: X-Auth-Token
Access-Control-Allow-Credentials: true
Cache-Control: no-cache, must-revalidate
Link: <http://redfish.dmtf.org/schemas/v1/EventDestination.v1_6_0.json>; rel=describedby
Link: <http://redfish.dmtf.org/schemas/v1/EventDestination.v1_6_0.json>
Link: </redfish/v1/EventService/Subscriptions>; path=
ETag: "1651135676"
Content-Type: application/json; charset=UTF-8
OData-Version: 4.0
Content-Length: 614
Date: Thu, 28 Apr 2022 08:47:57 GMT
----

View File

@@ -0,0 +1,78 @@
// Module included in the following assemblies:
//
// * monitoring/using-rfhe.adoc
:_content-type: PROCEDURE
[id="nw-rfhe-creating-hardware-event_{context}"]
= Creating the bare-metal event and Secret CRs
To start using bare-metal events, create the `HardwareEvent` custom resource (CR) for the host where the Redfish hardware is present. Hardware events and faults are reported in the `hw-event-proxy` logs.
.Prerequisites
* Install the OpenShift CLI (`oc`).
* Log in as a user with `cluster-admin` privileges.
* Install the {redfish-operator}.
* Create a `BMCEventSubscription` CR for the BMC Redfish hardware.
[NOTE]
====
Multiple `HardwareEvent` resources are not permitted.
====
.Procedure
. Create the `HardwareEvent` custom resource (CR):
.. Save the following YAML in the `hw-event.yaml` file:
+
[source,yaml]
----
apiVersion: "event.redhat-cne.org/v1alpha1"
kind: "HardwareEvent"
metadata:
name: "hardware-event"
spec:
nodeSelector:
node-role.kubernetes.io/hw-event: "" <1>
transportHost: "amqp://amq-router-service-name.amq-namespace.svc.cluster.local" <2>
logLevel: "debug" <3>
msgParserTimeout: "10" <4>
----
<1> Required. Use the `nodeSelector` field to target nodes with the specified label, for example, `node-role.kubernetes.io/hw-event: ""`.
<2> Required. AMQP host that delivers the events at the transport layer using the AMQP protocol.
<3> Optional. The default value is `debug`. Sets the log level in `hw-event-proxy` logs. The following log levels are available: `fatal`, `error`, `warning`, `info`, `debug`, `trace`.
<4> Optional. Sets the timeout value in milliseconds for the Message Parser. If a message parsing request is not responded to within the timeout duration, the original hardware event message is passed to the cloud native event framework. The default value is 10.
.. Create the `HardwareEvent` CR:
+
[source,terminal]
----
$ oc create -f hw-event.yaml
----
. Create a BMC username and password `Secret` CR that enables the hardware events proxy to access the Redfish message registry for the bare-metal host.
+
.. Save the following YAML in the `hw-event-bmc-secret.yaml` file:
+
[source,yaml]
----
apiVersion: v1
kind: Secret
metadata:
name: redfish-basic-auth
type: Opaque
stringData: <1>
username: <bmc_username>
password: <bmc_password>
# BMC host DNS or IP address
hostaddr: <bmc_host_ip_address>
----
<1> Enter plain text values for the various items under `stringData`.
+
.. Create the `Secret` CR:
+
[source,terminal]
----
$ oc create -f hw-event-bmc-secret.yaml
----

View File

@@ -0,0 +1,103 @@
// Module included in the following assemblies:
//
// * monitoring/using-rfhe.adoc
:_content-type: PROCEDURE
[id="nw-rfhe-installing-operator-cli_{context}"]
= Installing the {redfish-operator} using the CLI
As a cluster administrator, you can install the {redfish-operator} Operator by using the CLI.
.Prerequisites
* A cluster that is installed on bare-metal hardware with nodes that have a Redfish-enabled Baseboard Management Controller (BMC).
* Install the OpenShift CLI (`oc`).
* Log in as a user with `cluster-admin` privileges.
.Procedure
. Create a namespace for the {redfish-operator}.
.. Save the following YAML in the `bare-metal-events-namespace.yaml` file:
+
[source,yaml]
----
apiVersion: v1
kind: Namespace
metadata:
name: openshift-bare-metal-events
labels:
name: openshift-bare-metal-events
openshift.io/cluster-monitoring: "true"
----
.. Create the `Namespace` CR:
+
[source,terminal]
----
$ oc create -f bare-metal-events-namespace.yaml
----
. Create an Operator group for the {redfish-operator} Operator.
.. Save the following YAML in the `bare-metal-events-operatorgroup.yaml` file:
+
[source,yaml]
----
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
name: bare-metal-event-relay-group
namespace: openshift-bare-metal-events
spec:
targetNamespaces:
- openshift-bare-metal-events
----
.. Create the `OperatorGroup` CR:
+
[source,terminal]
----
$ oc create -f bare-metal-events-operatorgroup.yaml
----
. Subscribe to the {redfish-operator}.
.. Save the following YAML in the `bare-metal-events-sub.yaml` file:
+
[source,yaml]
----
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: bare-metal-event-relay-subscription
namespace: openshift-bare-metal-events
spec:
channel: "stable"
name: bare-metal-event-relay
source: redhat-operators
sourceNamespace: openshift-marketplace
----
.. Create the `Subscription` CR:
+
[source,terminal]
----
$ oc create -f bare-metal-events-sub.yaml
----
.Verification
To verify that the {redfish-operator} Operator is installed, run the following command:
[source,terminal]
----
$ oc get csv -n openshift-bare-metal-events -o custom-columns=Name:.metadata.name,Phase:.status.phase
----
.Example output
[source,terminal]
----
Name Phase
bare-metal-event-relay.4.10.0-202206301927 Succeeded
----

View File

@@ -0,0 +1,42 @@
// Module included in the following assemblies:
//
// * monitoring/using-rfhe.adoc
:_content-type: PROCEDURE
[id="nw-rfhe-installing-operator-web-console_{context}"]
= Installing the {redfish-operator} using the web console
As a cluster administrator, you can install the {redfish-operator} Operator using the web console.
.Prerequisites
* A cluster that is installed on bare-metal hardware with nodes that have a Redfish-enabled Baseboard Management Controller (BMC).
* Log in as a user with `cluster-admin` privileges.
.Procedure
. Install the {redfish-operator} using the {product-title} web console:
.. In the {product-title} web console, click *Operators* -> *OperatorHub*.
.. Choose *{redfish-operator}* from the list of available Operators, and then click *Install*.
.. On the *Install Operator* page, select or create the *openshift-bare-metal-events* namespace, and then click *Install*.
.Verification
Optional: You can verify that the Operator installed successfully by performing the following check:
. Switch to the *Operators* -> *Installed Operators* page.
. Ensure that *{redfish-operator}* is listed in the project with a *Status* of *InstallSucceeded*.
+
[NOTE]
====
During installation, an Operator might display a *Failed* status. If the installation later succeeds with an *InstallSucceeded* message, you can ignore the *Failed* message.
====
If the Operator does not appear as installed, troubleshoot further:
* Go to the *Operators* -> *Installed Operators* page and inspect the *Operator Subscriptions* and *Install Plans* tabs for any failure or errors under *Status*.
* Go to the *Workloads* -> *Pods* page and check the logs for pods in the project namespace.

View File

@@ -0,0 +1,50 @@
// Module included in the following assemblies:
//
// * monitoring/using-rfhe.adoc
:_content-type: CONCEPT
[id="nw-rfhe-introduction_{context}"]
= How bare-metal events work
The {redfish-operator} enables applications running on bare-metal clusters to respond quickly to Redfish hardware changes and failures such as breaches of temperature thresholds, fan failure, disk loss, power outages, and memory failure. These hardware events are delivered over a reliable low-latency transport channel based on Advanced Message Queuing Protocol (AMQP). The latency of the messaging service is between 10 and 20 milliseconds.
The {redfish-operator} provides a publish-subscribe service for the hardware events, where multiple applications can use REST APIs to subscribe and consume the events. The {redfish-operator} supports hardware that complies with Redfish OpenAPI v1.8 or higher.
[id="rfhe-elements_{context}"]
== {redfish-operator} data flow
The following figure illustrates an example bare-metal events data flow. vDU is used as an example of an application interacting with bare-metal events:
.{redfish-operator} data flow
image::211_OpenShift_Redfish_dataflow_0222.png[Bare-metal events data flow]
=== Operator-managed pod
The Operator uses the `HardwareEvent` custom resource (CR) to manage the pod that contains the {redfish-operator} and its components.
=== {redfish-operator}
At startup, the {redfish-operator} queries the Redfish API and downloads all the message registries, including custom registries. The {redfish-operator} then begins to receive subscribed events from the Redfish hardware.
The received hardware events are reported by using the `HardwareEvent` CR.
=== Cloud native event
Cloud native events (CNE) is a REST API specification for defining the format of event data.
=== CNCF CloudEvents
link:https://cloudevents.io/[CloudEvents] is a vendor-neutral specification developed by the Cloud Native Computing Foundation (CNCF) for defining the format of event data.
=== AMQP dispatch router
The dispatch router is responsible for the message delivery service between the publisher and the subscriber. It is based on AMQP 1.0, an open standard implemented by Apache Qpid that supports reliable, high-performance, fully-symmetrical messaging over the internet.
=== Cloud event proxy sidecar
The cloud event proxy sidecar container image is based on the O-RAN API specification and provides a publish-subscribe event framework for hardware events.
[id="rfhe-data-flow_{context}"]
== Redfish message parsing service
In addition to handling Redfish events, the {redfish-operator} provides message parsing for events without a `Message` property. The proxy downloads all the Redfish message registries including vendor specific registries from the hardware when it starts. If an event does not contain a `Message` property, the proxy uses the Redfish message registries to construct the `Message` and `Resolution` properties and add them to the event before passing the event to the cloud events framework. This service allows Redfish events to have smaller message size and lower transmission latency.
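For reference, you can list the message registries that the proxy downloads by querying the Redfish API on the BMC directly, for example with `curl`. Replace the placeholders with the details for your BMC:
[source,terminal]
----
$ curl --globoff -k -X GET --user <bmc_username>:<password> https://<bmc_ip_address>/redfish/v1/Registries
----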

View File

@@ -0,0 +1,68 @@
// Module included in the following assemblies:
//
// * monitoring/using-rfhe.adoc
:_module-type: PROCEDURE
[id="nw-rfhe-querying-redfish-hardware-event-subs_{context}"]
= Querying Redfish bare-metal event subscriptions with curl
Some hardware vendors limit the amount of Redfish hardware event subscriptions. You can query the number of Redfish event subscriptions by using `curl`.
.Prerequisites
* Get the user name and password for the BMC.
* Deploy a bare-metal node with a Redfish-enabled Baseboard Management Controller (BMC) in your cluster, and enable Redfish hardware events on the BMC.
.Procedure
. Check the current subscriptions for the BMC by running the following `curl` command:
+
[source,terminal]
----
$ curl --globoff -H "Content-Type: application/json" -k -X GET --user <bmc_username>:<password> https://<bmc_ip_address>/redfish/v1/EventService/Subscriptions
----
+
where:
+
--
bmc_ip_address:: is the IP address of the BMC where the Redfish events are generated.
--
+
.Example output
[source,terminal]
----
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 435 100 435 0 0 399 0 0:00:01 0:00:01 --:--:-- 399
{
"@odata.context": "/redfish/v1/$metadata#EventDestinationCollection.EventDestinationCollection",
"@odata.etag": "\"1651137375\"",
"@odata.id": "/redfish/v1/EventService/Subscriptions",
"@odata.type": "#EventDestinationCollection.EventDestinationCollection",
"Description": "Collection for Event Subscriptions",
"Members": [
{
"@odata.id": "/redfish/v1/EventService/Subscriptions/1"
}],
"Members@odata.count": 1,
"Name": "Event Subscriptions Collection"
}
----
+
In this example, a single subscription is configured: `/redfish/v1/EventService/Subscriptions/1`.
. Optional: To remove the `/redfish/v1/EventService/Subscriptions/1` subscription with `curl`, run the following command, specifying the BMC username and password:
+
[source,terminal]
----
$ curl --globoff -L -w "%{http_code} %{url_effective}\n" -k -u <bmc_username>:<password> -H "Content-Type: application/json" -d '{}' -X DELETE https://<bmc_ip_address>/redfish/v1/EventService/Subscriptions/1
----
+
where:
+
--
bmc_ip_address:: is the IP address of the BMC where the Redfish events are generated.
--
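If you need to create a test subscription manually, for example to verify BMC connectivity, the following sketch uses the same Redfish `EventService` API. The `<webhook_url>` value is a hypothetical event receiver endpoint:
[source,terminal]
----
$ curl --globoff -k -X POST --user <bmc_username>:<password> -H "Content-Type: application/json" -d '{"Destination": "https://<webhook_url>", "Protocol": "Redfish"}' https://<bmc_ip_address>/redfish/v1/EventService/Subscriptions
----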

View File

@@ -1,65 +0,0 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="about-ztp-and-distributed-units-on-single-node-clusters_{context}"]
= About ZTP and distributed units on single nodes
You can install a distributed unit (DU) on a single node at scale with {rh-rhacm-first} (ACM) using the assisted installer (AI) and the policy generator with core-reduction technology enabled. The DU installation is done using zero touch provisioning (ZTP) in a disconnected environment.
ACM manages clusters in a hub and spoke architecture, where a single hub cluster manages many spoke clusters. ACM applies radio access network (RAN) policies from predefined custom resources (CRs). Hub clusters running ACM provision and deploy the spoke clusters using ZTP and AI. DU installation follows the AI installation of {product-title} on a single node.
The AI service handles provisioning of {product-title} on single nodes running on bare metal. ACM ships with and deploys the assisted installer when the `MultiClusterHub` custom resource is installed.
With ZTP and AI, you can provision {product-title} single nodes to run your DUs at scale. A high level overview of ZTP for distributed units in a disconnected environment is as follows:
* A hub cluster running ACM manages a disconnected internal registry that mirrors the {product-title} release images. The internal registry is used to provision the spoke single nodes.
* You manage the bare-metal host machines for your DUs in an inventory file that uses YAML for formatting. You store the inventory file in a Git repository.
* You install the DU bare-metal host machines on site, and make the hosts ready for provisioning. To be ready for provisioning, the following is required for each bare-metal host:
** Network connectivity - including DNS for your network. Hosts should be reachable through the hub and managed spoke clusters. Ensure there is layer 3 connectivity between the hub and the host where you want to install your spoke cluster.
** Baseboard Management Controller (BMC) details for each host - ZTP uses the BMC URL and credentials to connect to and manage the BMC for each host.
* Create spoke cluster definition CRs. These define the relevant elements for the managed clusters. Required CRs are as follows:
+
[cols="1,1"]
|===
| Custom Resource | Description
|Namespace
|Namespace for the managed single-node cluster.
|BMCSecret CR
|Credentials for the host BMC.
|Image Pull Secret CR
|Pull secret for the disconnected registry.
|AgentClusterInstall
|Specifies the single-node cluster's configuration such as networking, number of supervisor (control plane) nodes, and so on.
|ClusterDeployment
|Defines the cluster name, domain, and other details.
|KlusterletAddonConfig
|Manages installation and termination of add-ons on the ManagedCluster for ACM.
|ManagedCluster
|Describes the managed cluster for ACM.
|InfraEnv
|Describes the installation ISO to be mounted on the destination node that the assisted installer service creates.
This is the final step of the manifest creation phase.
|BareMetalHost
|Describes the details of the bare-metal host, including BMC and credentials details.
|===
* When a change is detected in the host inventory repository, a host management event is triggered to provision the new or updated host.
* The host is provisioned. When the host is provisioned and successfully rebooted, the host agent reports `Ready` status to the hub cluster.

View File

@@ -4,7 +4,7 @@
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-acm-adding-images-to-mirror-registry_{context}"]
= Adding {op-system} ISO and RootFS images to a disconnected mirror host
= Adding {op-system} ISO and RootFS images to the disconnected mirror host
Before you install a cluster on infrastructure that you provision, you must create {op-system-first} machines for it to use. Use a disconnected mirror to host the {op-system} images you require to provision your distributed unit (DU) bare-metal hosts.

View File

@@ -23,4 +23,4 @@ See link:https://docs.openshift.com/container-platform/4.9/operators/admin/olm-r
.Procedure
* Install {rh-rhacm} on the hub cluster in the disconnected environment. See link:https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html/install/installing#install-on-disconnected-networks[Installing {rh-rhacm} in disconnected networks].
* Install {rh-rhacm} on the hub cluster in the disconnected environment. See link:https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html/install/installing#install-on-disconnected-networks[Installing {rh-rhacm} in a disconnected environment].

View File

@@ -11,15 +11,3 @@ Before you can provision distributed units (DU) at scale, you must install {rh-r
{rh-rhacm} is deployed as an Operator on the {product-title} hub cluster. It controls clusters and applications from a single console with built-in security policies. {rh-rhacm} provisions and manages your DU hosts. To install {rh-rhacm} in a disconnected environment, you create a mirror registry that mirrors the Operator Lifecycle Manager (OLM) catalog that contains the required Operator images. OLM manages, installs, and upgrades Operators and their dependencies in the cluster.
You also use a disconnected mirror host to serve the {op-system} ISO and RootFS disk images that provision the DU bare-metal host operating system.
Before you install a cluster on infrastructure that you provision in a restricted network, you must mirror the required container images into that environment. You can also use this procedure in unrestricted networks to ensure your clusters only use container images that have satisfied your organizational controls on external content.
[IMPORTANT]
====
You must have access to the internet to obtain the necessary container images.
In this procedure, you place the mirror registry on a mirror host
that has access to both your network and the internet. If you do not have access
to a mirror host, use the disconnected procedure to copy images to a device that you
can move across network boundaries.
====

View File

@@ -0,0 +1,74 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-adding-new-content-to-gitops-ztp_{context}"]
= Adding new content to the GitOps ZTP pipeline
The source CRs in the GitOps ZTP site generator container provide a set of critical features and node tuning settings for RAN Distributed Unit (DU) applications. These are applied to the clusters that you deploy with ZTP. To add or modify existing source CRs in the `ztp-site-generate` container, rebuild the `ztp-site-generate` container and make it available to the hub cluster, typically from the disconnected registry associated with the hub cluster. Any valid {product-title} CR can be added.
Perform the following procedure to add new content to the ZTP pipeline.
.Procedure
. Create a directory containing a Containerfile and the source CR YAML files that you want to include in the updated `ztp-site-generate` container, for example:
+
[source,text]
----
ztp-update/
├── example-cr1.yaml
├── example-cr2.yaml
└── ztp-update.in
----
. Add the following content to the `ztp-update.in` Containerfile:
+
[source,text]
----
FROM registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10
ADD example-cr2.yaml /kustomize/plugin/ran.openshift.io/v1/policygentemplate/source-crs/
ADD example-cr1.yaml /kustomize/plugin/ran.openshift.io/v1/policygentemplate/source-crs/
----
. Open a terminal at the `ztp-update/` folder and rebuild the container:
+
[source,terminal]
----
$ podman build -t ztp-site-generate-rhel8-custom:v4.10-custom-1 -f ztp-update.in .
----
. Push the built container image to your disconnected registry, for example:
+
[source,terminal]
----
$ podman push localhost/ztp-site-generate-rhel8-custom:v4.10-custom-1 registry.example.com:5000/ztp-site-generate-rhel8-custom:v4.10-custom-1
----
. Patch the Argo CD instance on the hub cluster to point to the newly built container image:
+
[source,terminal]
----
$ oc patch -n openshift-gitops argocd openshift-gitops --type=json -p '[{"op": "replace", "path":"/spec/repo/initContainers/0/image", "value": "registry.example.com:5000/ztp-site-generate-rhel8-custom:v4.10-custom-1"} ]'
----
+
When the Argo CD instance is patched, the `openshift-gitops-repo-server` pod automatically restarts.
.Verification
. Verify that the new `openshift-gitops-repo-server` pod has completed initialization and that the previous repo pod is terminated:
+
[source,terminal]
----
$ oc get pods -n openshift-gitops | grep openshift-gitops-repo-server
----
+
.Example output
+
[source,terminal]
----
openshift-gitops-repo-server-7df86f9774-db682   1/1     Running   1          28s
----
+
You must wait until the new `openshift-gitops-repo-server` pod has completed initialization and the previous pod is terminated before the newly added container image content is available.
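Optionally, confirm that the Argo CD instance now references your custom container image by reading back the field that you patched:
[source,terminal]
----
$ oc get argocd openshift-gitops -n openshift-gitops -o jsonpath='{.spec.repo.initContainers[0].image}'
----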

View File

@@ -8,7 +8,7 @@
The Assisted Installer Service (AIS) deploys {product-title} clusters. {rh-rhacm-first} ships with AIS. AIS is deployed when you enable the MultiClusterHub Operator on the {rh-rhacm} hub cluster.
For distributed units (DUs), {rh-rhacm} supports {product-title} deployments that run on a single bare-metal host. The single-node cluster acts as both a control plane and a worker node.
For distributed units (DUs), {rh-rhacm} supports {product-title} deployments that run on a single bare-metal host, three-node clusters, or standard clusters. In the case of single node clusters or three-node clusters, all nodes act as both control plane and worker nodes.
.Prerequisites

View File

@@ -1,160 +0,0 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-applying-source-custom-resource-policies_{context}"]
= Applying source custom resource policies
Source custom resource policies include the following:
* SR-IOV policies
* PTP policies
* Performance Add-on Operator policies
* MachineConfigPool policies
* SCTP policies
You need to define the source custom resource that generates the ACM policy with consideration of possible overlay to its metadata or spec/data.
For example, a `common-namespace-policy` contains a `Namespace` definition that exists in all managed clusters.
This `namespace` is placed under the Common category and there are no changes for its spec or data across all clusters.
.Namespace policy example
The following example shows the source custom resource for this namespace:
[source,yaml]
----
apiVersion: v1
kind: Namespace
metadata:
name: openshift-sriov-network-operator
labels:
openshift.io/run-level: "1"
----
.Example output
The generated policy that applies this `namespace` includes the `namespace` as it is defined above without any change, as shown in this example:
[source,yaml]
----
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
name: common-sriov-sub-ns-policy
namespace: common-sub
annotations:
policy.open-cluster-management.io/categories: CM Configuration Management
policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
policy.open-cluster-management.io/standards: NIST SP 800-53
spec:
remediationAction: enforce
disabled: false
policy-templates:
- objectDefinition:
apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
name: common-sriov-sub-ns-policy-config
spec:
remediationAction: enforce
severity: low
namespaceselector:
exclude:
- kube-*
include:
- '*'
object-templates:
- complianceType: musthave
objectDefinition:
apiVersion: v1
kind: Namespace
metadata:
labels:
openshift.io/run-level: "1"
name: openshift-sriov-network-operator
----
.SRIOV policy example
The following example shows a `SriovNetworkNodePolicy` definition that exists in different clusters with a different specification for each cluster.
The example also shows the source custom resource for the `SriovNetworkNodePolicy`:
[source,yaml]
----
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: sriov-nnp
namespace: openshift-sriov-network-operator
spec:
# The $ tells the policy generator to overlay/remove the spec.item in the generated policy.
deviceType: $deviceType
isRdma: false
nicSelector:
pfNames: [$pfNames]
nodeSelector:
node-role.kubernetes.io/worker: ""
numVfs: $numVfs
priority: $priority
resourceName: $resourceName
----
.Example output
The `SriovNetworkNodePolicy` name and `namespace` are the same for all clusters, so both are defined in the source `SriovNetworkNodePolicy`.
However, the generated policy requires the `$deviceType` and `$numVfs` values as input parameters to adjust the policy for each cluster.
The generated policy is shown in this example:
[source,yaml]
----
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
name: site-du-sno-1-sriov-nnp-mh-policy
namespace: sites-sub
annotations:
policy.open-cluster-management.io/categories: CM Configuration Management
policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
policy.open-cluster-management.io/standards: NIST SP 800-53
spec:
remediationAction: enforce
disabled: false
policy-templates:
- objectDefinition:
apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
name: site-du-sno-1-sriov-nnp-mh-policy-config
spec:
remediationAction: enforce
severity: low
namespaceselector:
exclude:
- kube-*
include:
- '*'
object-templates:
- complianceType: musthave
objectDefinition:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: sriov-nnp-du-mh
namespace: openshift-sriov-network-operator
spec:
deviceType: vfio-pci
isRdma: false
nicSelector:
pfNames:
- ens7f0
nodeSelector:
node-role.kubernetes.io/worker: ""
numVfs: 8
resourceName: du_mh
----
[NOTE]
====
Defining the required input parameters as `$value`, for example `$deviceType`, is not mandatory. The `$` tells the policy generator to overlay or remove the item from the generated policy. Otherwise, the value does not change.
====

View File

@@ -1,31 +0,0 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-applying-the-ran-policies-for-monitoring-cluster-activity_{context}"]
= Applying the RAN policies for monitoring cluster activity
Zero touch provisioning (ZTP) uses {rh-rhacm-first} to apply the radio access network (RAN) policies using a policy-based governance approach to automatically monitor cluster activity.
The policy generator (PolicyGen) is a Kustomize plug-in that facilitates creating ACM policies from predefined custom resources.
There are three main items: Policy Categorization, Source CR policy, and PolicyGenTemplate. PolicyGen relies on these to generate the policies and
their placement bindings and rules.
The following diagram shows how the RAN policy generator interacts with GitOps and ACM.
image::175_OpenShift_ACM_0821_1.png[RAN policy generator]
RAN policies are categorized into three main groups:
Common:: A policy that exists in the `Common` category is applied to all clusters to be represented by the site plan.
Groups:: A policy that exists in the `Groups` category is applied to a group of clusters. Every group of clusters could have their own policies that exist under the
Groups category. For example, `Groups/group1` could have its own policies that are applied to the clusters belonging to `group1`.
Sites:: A policy that exists in the `Sites` category is applied to a specific cluster. Any cluster could have its own policies that exist in the `Sites` category.
For example, `Sites/cluster1` will have its own policies applied to `cluster1`.
The following diagram shows how policies are generated.
image::175_OpenShift_ACM_0821_2.png[Generating policies]

View File

@@ -1,30 +0,0 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-checking-the-installation-status_{context}"]
= Checking the installation status
The ArgoCD pipeline detects the `SiteConfig` and `PolicyGenTemplate` custom resources (CRs) in the Git repository and syncs them to the hub cluster. In the process, it generates installation and policy CRs and applies them to the hub cluster. You can monitor the progress of this synchronization in the ArgoCD dashboard.
.Procedure
. Monitor the progress of cluster installation using the following commands:
+
[source,terminal]
----
$ export CLUSTER=<cluster_name>
----
+
[source,terminal]
----
$ oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Completed")]}' | jq
----
+
[source,terminal]
----
$ curl -sk $(oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.debugInfo.eventsURL}') | jq '.[-2,-1]'
----
. Use the {rh-rhacm-first} (ACM) dashboard to monitor the progress of policy reconciliation.

View File

@@ -1,23 +0,0 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-cluster-provisioning_{context}"]
= Cluster provisioning
Zero touch provisioning (ZTP) provisions clusters using a layered approach. The base components consist of {op-system-first}, the basic operating system
for the cluster, and {product-title}. After these components are installed, the worker node can join the existing cluster. When the node has joined the existing cluster, the 5G RAN profile Operators are applied.
The following diagram illustrates this architecture.
image::177_OpenShift_cluster_provisioning_0821.png[Cluster provisioning]
The following RAN Operators are deployed on every cluster:
* Machine Config
* Precision Time Protocol (PTP)
* Performance Addon Operator
* SR-IOV
* Local Storage Operator
* Logging Operator

View File

@@ -79,12 +79,12 @@ spec:
name: <cluster_name>
namespace: <cluster_name>
sshAuthorizedKey: <public_key>
agentLabels: <1>
location: "<label-name>"
agentLabelSelector:
matchLabels:
cluster-name: <cluster_name>
pullSecretRef:
name: assisted-deployment-pull-secret
nmStateConfigLabelSelector:
matchLabels:
sno-cluster-<cluster-name>: <cluster_name> # Match this label
----
<1> Sets a label to match. The labels apply when the agents boot.

View File

@@ -4,37 +4,13 @@
:_module-type: PROCEDURE
[id="ztp-configuring-ptp-fast-events_{context}"]
= Configuring PTP fast events using PolicyGenTemplate custom resources and GitOps ZTP
= Configuring PTP fast events using PolicyGenTemplate CRs
You can configure PTP fast events for vRAN clusters that are deployed using the GitOps Zero Touch Provisioning (ZTP) pipeline. Use `PolicyGenTemplate` custom resources (CRs) as the basis to create a hierarchy of configuration files tailored to your specific site requirements.
The `PolicyGenTemplate` CRs that are relevant to PTP events can be found in the `/home/ztp/argocd/example` folder in the `quay.io/redhat_emp1/ztp-site-generator:latest` reference architecture container image. The reference architecture has a `/policygentemplates` and `/siteconfig` folder. The `/policygentemplates` folder has common, group, and site-specific configuration CRs. Each `PolicyGenTemplate` CR refers to other CRs that are in the `/source-crs` folder of the reference architecture.
The `PolicyGenTemplate` CRs required to deploy PTP fast events are described below.
.PolicyGenTemplate CRs for vRAN deployments
[cols=2*, options="header"]
|====
|PolicyGenTemplate CR
|Description
|`common-ranGen.yaml`
|Contains the common RAN policies that get applied to all clusters. To deploy the PTP Operator to your clusters, you configure `Namespace`, `Subscription`, and `OperatorGroup` CRs.
|`group-du-3node-ranGen.yaml`
|Contains the RAN policies for three-node clusters only, including PTP fast events configuration.
|`group-du-sno-ranGen.yaml`
|Contains the RAN policies for single-node clusters only, including PTP fast events configuration.
|`group-du-standard-ranGen.yaml`
|Contains the RAN policies for standard three control-plane clusters, including PTP fast events configuration.
|====
.Prerequisites
* Create a Git repository where you manage your custom site configuration data.
* Extract the contents of the `/home/ztp` folder from the `quay.io/redhat_emp1/ztp-site-generator:latest` reference architecture container image, and review the changes.
.Procedure
@@ -44,16 +20,16 @@ The `PolicyGenTemplate` CRs required to deploy PTP fast events are described bel
----
#AMQ interconnect operator for fast events
- fileName: AmqSubscriptionNS.yaml
policyName: "amq-sub-policy"
policyName: "subscriptions-policy"
- fileName: AmqSubscriptionOperGroup.yaml
policyName: "amq-sub-policy"
policyName: "subscriptions-policy"
- fileName: AmqSubscription.yaml
policyName: "amq-sub-policy"
policyName: "subscriptions-policy"
----
. Apply the following `PolicyGenTemplate` changes to `group-du-3node-ranGen.yaml`, `group-du-sno-ranGen.yaml`, or `group-du-standard-ranGen.yaml` files according to your requirements:
.. In `.sourceFiles`, add the `PtpOperatorConfig` CR that configures the AMQ transport host to the `config-policy`:
.. In `.sourceFiles`, add the `PtpOperatorConfig` CR file that configures the AMQ transport host to the `config-policy`:
+
[source,yaml]
----
@@ -80,13 +56,21 @@ The `PolicyGenTemplate` CRs required to deploy PTP fast events are described bel
maxOffsetThreshold: 100 #nano secs
minOffsetThreshold: -100 #nano secs
----
<1> Can be one `PtpConfigMaster.yaml`, `PtpConfigSlave.yaml`, or `PtpConfigSlaveCvl.yaml` depending on your requirements. `PtpConfigSlaveCvl.yaml` configes `linuxptp` services for an Intel E810 Columbiaville NIC.
<1> Can be one of `PtpConfigMaster.yaml`, `PtpConfigSlave.yaml`, or `PtpConfigSlaveCvl.yaml`, depending on your requirements. `PtpConfigSlaveCvl.yaml` configures `linuxptp` services for an Intel E810 Columbiaville NIC. For configurations based on `group-du-sno-ranGen.yaml` or `group-du-3node-ranGen.yaml`, use `PtpConfigSlave.yaml`.
<2> Device specific interface name.
<3> You must append the `--summary_interval -4` value to `ptp4lOpts` in `.spec.sourceFiles.spec.profile` to enable PTP fast events.
<4> `ptpClockThreshold` configures how long the clock stays in the clock holdover state. Holdover state is the period between local and master clock synchronizations. Offset is the time difference between the local and master clock.
. Apply the following `PolicyGenTemplate` changes to your specific site YAML files, for example, `example-sno-site.yaml`:
.. In `.sourceFiles`, add the `Interconnect` CR file that configures the AMQ router to the `config-policy`:
+
[source,yaml]
----
- fileName: AmqInstance.yaml
policyName: "config-policy"
----
. Merge any other required changes and files with your custom site repository.
. Push the changes to your site configuration repository to deploy PTP fast events to new sites using GitOps ZTP.
//. Optional: Use the Topology-Aware Lifecycle Operator to deploy PTP events to existing sites.

View File

@@ -0,0 +1,89 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_module-type: PROCEDURE
[id="ztp-configuring-uefi-secure-boot_{context}"]
= Configuring UEFI secure boot for clusters using PolicyGenTemplate CRs
You can configure UEFI secure boot for vRAN clusters that are deployed using the
GitOps zero touch provisioning (ZTP) pipeline.
.Prerequisites
* Create a Git repository where you manage your custom site configuration data.
.Procedure
. Create the following `MachineConfig` resource and save it in the `uefi-secure-boot.yaml` file:
+
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: master
name: uefi-secure-boot
spec:
config:
ignition:
version: 3.1.0
kernelArguments:
- efi=runtime
----
. In your Git repository custom `/siteconfig` directory, create a `/sno-extra-manifest` folder and add the `uefi-secure-boot.yaml` file, for example:
+
[source,text]
----
siteconfig
├── site1-sno-du.yaml
├── site2-standard-du.yaml
└── sno-extra-manifest
└── uefi-secure-boot.yaml
----
. In your cluster `SiteConfig` CR, specify the required values for `extraManifestPath` and `bootMode`:
.. Enter the directory name in the `.spec.clusters.extraManifestPath` field, for example:
+
[source,yaml]
----
clusters:
- clusterName: "example-cluster"
extraManifestPath: sno-extra-manifest/
----
.. Set the value for `.spec.clusters.nodes.bootMode` to `UEFISecureBoot`, for example:
+
[source,yaml]
----
nodes:
- hostName: "ran.example.lab"
bootMode: "UEFISecureBoot"
----
. Deploy the cluster using the GitOps ZTP pipeline.
.Verification
. Open a remote shell to the deployed cluster, for example:
+
[source,terminal]
----
$ oc debug node/node-1.example.com
----
. Verify that the `SecureBoot` feature is enabled:
+
[source,terminal]
----
sh-4.4# mokutil --sb-state
----
+
.Example output
[source,terminal]
----
SecureBoot enabled
----
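+
Optionally, while you are still in the debug shell, you can also check that the `efi=runtime` kernel argument from the `MachineConfig` CR was applied to the node:
+
[source,terminal]
----
sh-4.4# grep efi=runtime /proc/cmdline
----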

View File

@@ -0,0 +1,94 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-creating-a-validator-inform-policy_{context}"]
= Creating a validator inform policy
Use the following procedure to create a validator inform policy that provides an indication of
when the zero touch provisioning (ZTP) installation and configuration of the deployed cluster is complete. This policy
can be used for deployments of single node clusters, three-node clusters, and standard clusters.
.Procedure
. Create a stand-alone `PolicyGenTemplate` custom resource (CR) that contains the source file
`validatorCRs/informDuValidator.yaml`.
You only need one stand-alone `PolicyGenTemplate` CR for each cluster type.
+
.Single node clusters
+
[source,yaml]
----
# group-du-sno-validator-ranGen.yaml
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: "group-du-sno-validator" <1>
namespace: "ztp-group" <2>
spec:
bindingRules:
group-du-sno: "" <3>
bindingExcludedRules:
ztp-done: "" <4>
mcp: "master" <5>
sourceFiles:
- fileName: validatorCRs/informDuValidator.yaml
remediationAction: inform <6>
policyName: "du-policy" <7>
----
+
.Three-node clusters
+
[source,yaml]
----
# group-du-3node-validator-ranGen.yaml
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: "group-du-3node-validator" <1>
namespace: "ztp-group" <2>
spec:
bindingRules:
group-du-3node: "" <3>
bindingExcludedRules:
ztp-done: "" <4>
mcp: "master" <5>
sourceFiles:
- fileName: validatorCRs/informDuValidator.yaml
remediationAction: inform <6>
policyName: "du-policy" <7>
----
+
.Standard clusters
+
[source,yaml]
----
# group-du-standard-validator-ranGen.yaml
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: "group-du-standard-validator" <1>
namespace: "ztp-group" <2>
spec:
bindingRules:
group-du-standard: "" <3>
bindingExcludedRules:
ztp-done: "" <4>
mcp: "worker" <5>
sourceFiles:
- fileName: validatorCRs/informDuValidator.yaml
remediationAction: inform <6>
policyName: "du-policy" <7>
----
<1> The name of the `PolicyGenTemplate` object. This name is also used as part of the names for the `placementBinding`, `placementRule`, and `policy` that are created in the requested `namespace`.
<2> This value should match the `namespace` used in the group `PolicyGenTemplates`.
<3> The `group-du-*` label defined in `bindingRules` must exist in the `SiteConfig` files.
<4> The label defined in `bindingExcludedRules` must be `ztp-done:`. The `ztp-done` label is used in coordination with the {cgu-operator-full}.
<5> `mcp` defines the `MachineConfigPool` object that is used in the source file `validatorCRs/informDuValidator.yaml`. It should be `master` for single node and three-node cluster deployments and `worker` for standard cluster deployments.
<6> Optional. The default value is `inform`.
<7> This value is used as part of the name for the generated {rh-rhacm} policy.
The generated validator policy for the single node example is named `group-du-sno-validator-du-policy`.
. Push the files to the ZTP Git repository.

View File

@@ -0,0 +1,32 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-creating-the-policygentemplate-cr_{context}"]
= Creating the PolicyGenTemplate CR
Use this procedure to create the `PolicyGenTemplate` custom resource (CR) for your site in your local clone of the Git repository.
.Procedure
. Choose an appropriate example from `out/argocd/example/policygentemplates`. This directory demonstrates a three-level policy framework that represents a well-supported low-latency profile tuned for the needs of 5G Telco DU deployments:
+
* A single `common-ranGen.yaml` file that should apply to all types of sites.
* A set of shared `group-du-*-ranGen.yaml` files, each of which should be common across a set of similar clusters.
* An example `example-*-site.yaml` that can be copied and updated for each individual site.
. Ensure that the labels defined in your `PolicyGenTemplate` `bindingRules` section correspond to the labels that are defined in the `SiteConfig` files of the clusters you are managing.
. Ensure that the content of the overlaid spec files matches your desired end state. As a reference, the `out/source-crs` directory contains the full list of `source-crs` available to be included and overlaid by your `PolicyGenTemplate` templates.
+
[NOTE]
====
Depending on the specific requirements of your clusters, you might need more than a single group policy per cluster type, especially considering that the example group policies each have a single `PerformancePolicy.yaml` file that can only be shared across a set of clusters if those clusters consist of identical hardware configurations.
====
. Define all the policy namespaces in a YAML file similar to the example `out/argocd/example/policygentemplates/ns.yaml` file.
. Add all the `PolicyGenTemplate` files and `ns.yaml` file to the `kustomization.yaml` file, similar to the example `out/argocd/example/policygentemplates/kustomization.yaml` file.
. Commit the `PolicyGenTemplate` CRs, `ns.yaml` file, and the associated `kustomization.yaml` file in the Git repository.
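The following is a minimal sketch of the resulting `kustomization.yaml`, assuming the example file names listed above. Check the shipped `out/argocd/example/policygentemplates/kustomization.yaml` file for the exact layout in your version:
[source,yaml]
----
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators:
- common-ranGen.yaml
- group-du-sno-ranGen.yaml
- example-sno-site.yaml
resources:
- ns.yaml
----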

View File

@@ -1,19 +0,0 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-creating-the-policygentemplates_{context}"]
= Creating the PolicyGenTemplates
Use the following procedure to create the `PolicyGenTemplates` you will need for generating policies in your Git repository for the hub cluster.
.Procedure
. Create the `PolicyGenTemplates` and save them to the zero touch provisioning (ZTP) Git repository accessible from the hub cluster and defined as a source repository of the ArgoCD application.
. ArgoCD detects that the application is out of sync. Upon sync, either automatic or manual, ArgoCD applies the new `PolicyGenTemplate` to the hub cluster and launches the associated resource hooks. These hooks are responsible for generating the policy wrapped configuration CRs that apply to the spoke cluster and perform the following actions:
.. Create the {rh-rhacm-first} (ACM) policies according to the basic distributed unit (DU) profile and required customizations.
.. Apply the generated policies to the hub cluster.
The ZTP process creates policies that direct ACM to apply the desired configuration to the cluster nodes.

View File

@@ -10,8 +10,8 @@ Add the required secrets for the site to the hub cluster. These resources must b
.Procedure
. Create a secret for authenticating to the site Baseboard Management Controller (BMC). Ensure the secret name matches the name used in the `SiteConfig`.
In this example, the secret name is `test-sno-bmh-secret`:
. Create a secret for authenticating to the site Baseboard Management Controller
(BMC). Ensure that the secret name matches the name used in the `SiteConfig`. In this example, the secret name is `test-sno-bmh-secret`:
+
[source,yaml]
----
@@ -26,7 +26,9 @@ data:
type: Opaque
----
. Create the pull secret for the site. The pull secret must contain all credentials necessary for installing OpenShift and all add-on Operators. In this example, the secret name is `assisted-deployment-pull-secret`:
. Create the pull secret for the site. The pull secret must contain all credentials necessary
for installing OpenShift and all add-on Operators. In this example, the secret name is
`assisted-deployment-pull-secret`:
+
[source,yaml]
----
@@ -42,5 +44,6 @@ data:
[NOTE]
====
The secrets are referenced from the `SiteConfig` custom resource (CR) by name. The namespace must match the `SiteConfig` namespace.
The secrets are referenced from the `SiteConfig` custom resource (CR) by name. The namespace
must match the `SiteConfig` namespace.
====

View File

@@ -1,103 +0,0 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-creating-the-siteconfig-custom-resources_{context}"]
= Creating the SiteConfig custom resources
ArgoCD acts as the engine for the GitOps method of site deployment. After completing a site plan that contains the required custom resources for the site installation, a policy generator creates the manifests and applies them to the hub cluster.
.Procedure
. Create one or more `SiteConfig` custom resources, `site-config.yaml` files, that contains the site-plan data for the
clusters. For example:
+
[source,yaml]
----
apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
name: "test-sno"
namespace: "test-sno"
spec:
baseDomain: "clus2.t5g.lab.eng.bos.redhat.com"
pullSecretRef:
name: "assisted-deployment-pull-secret"
clusterImageSetNameRef: "openshift-4.11"
sshPublicKey: "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDB3dwhI5X0ZxGBb9VK7wclcPHLc8n7WAyKjTNInFjYNP9J+Zoc/ii+l3YbGUTuqilDwZN5rVIwBux2nUyVXDfaM5kPd9kACmxWtfEWTyVRootbrNWwRfKuC2h6cOd1IlcRBM1q6IzJ4d7+JVoltAxsabqLoCbK3svxaZoKAaK7jdGG030yvJzZaNM4PiTy39VQXXkCiMDmicxEBwZx1UsA8yWQsiOQ5brod9KQRXWAAST779gbvtgXR2L+MnVNROEHf1nEjZJwjwaHxoDQYHYKERxKRHlWFtmy5dNT6BbvOpJ2e5osDFPMEd41d2mUJTfxXiC1nvyjk9Irf8YJYnqJgBIxi0IxEllUKH7mTdKykHiPrDH5D2pRlp+Donl4n+sw6qoDc/3571O93+RQ6kUSAgAsvWiXrEfB/7kGgAa/BD5FeipkFrbSEpKPVu+gue1AQeJcz9BuLqdyPUQj2VUySkSg0FuGbG7fxkKeF1h3Sga7nuDOzRxck4I/8Z7FxMF/e8DmaBpgHAUIfxXnRqAImY9TyAZUEMT5ZPSvBRZNNmLbfex1n3NLcov/GEpQOqEYcjG5y57gJ60/av4oqjcVmgtaSOOAS0kZ3y9YDhjsaOcpmRYYijJn8URAH7NrW8EZsvAoF6GUt6xHq5T258c6xSYUm5L0iKvBqrOW9EjbLw== root@cnfdc2.clus2.t5g.lab.eng.bos.redhat.com"
clusters:
- clusterName: "test-sno"
clusterType: "sno"
clusterProfile: "du"
clusterLabels:
group-du-sno: ""
common: true
sites : "test-sno"
clusterNetwork:
- cidr: 1001:db9::/48
hostPrefix: 64
machineNetwork:
- cidr: 2620:52:0:10e7::/64
serviceNetwork:
- 1001:db7::/112
additionalNTPSources:
- 2620:52:0:1310::1f6
nodes:
- hostName: "test-sno.clus2.t5g.lab.eng.bos.redhat.com"
bmcAddress: "idrac-virtualmedia+https://[2620:52::10e7:f602:70ff:fee4:f4e2]/redfish/v1/Systems/System.Embedded.1"
bmcCredentialsName:
name: "test-sno-bmh-secret"
bmcDisableCertificateVerification: true <1>
bootMACAddress: "0C:42:A1:8A:74:EC"
bootMode: "UEFI"
rootDeviceHints:
hctl: '0:1:0'
cpuset: "0-1,52-53"
nodeNetwork:
interfaces:
- name: eno1
macAddress: "0C:42:A1:8A:74:EC"
config:
interfaces:
- name: eno1
type: ethernet
state: up
macAddress: "0C:42:A1:8A:74:EC"
ipv4:
enabled: false
ipv6:
enabled: true
address:
- ip: 2620:52::10e7:e42:a1ff:fe8a:900
prefix-length: 64
dns-resolver:
config:
search:
- clus2.t5g.lab.eng.bos.redhat.com
server:
- 2620:52:0:1310::1f6
routes:
config:
- destination: ::/0
next-hop-interface: eno1
next-hop-address: 2620:52:0:10e7::fc
table-id: 254
----
<1> If you are using `UEFI SecureBoot`, add this line to prevent failures due to invalid or local certificates.
. Save the files and push them to the zero touch provisioning (ZTP) Git repository accessible from the hub cluster and defined as a source repository of the ArgoCD application.
ArgoCD detects that the application is out of sync. Upon sync, either automatic or manual, ArgoCD synchronizes the `PolicyGenTemplate` to the hub cluster and launches the associated resource hooks. These hooks are responsible for generating the policy-wrapped configuration CRs that apply to the spoke cluster. The resource hooks convert the site definitions to installation custom resources and apply them to the hub cluster:
* `Namespace` - Unique per site
* `AgentClusterInstall`
* `BareMetalHost`
* `ClusterDeployment`
* `InfraEnv`
* `NMStateConfig`
* `ExtraManifestsConfigMap` - Extra manifests. The additional manifests include workload partitioning, chronyd, mountpoint hiding, sctp enablement, and more.
* `ManagedCluster`
* `KlusterletAddonConfig`
{rh-rhacm-first} (ACM) deploys the spoke cluster.

View File

@@ -6,8 +6,8 @@
[id="ztp-creating-ztp-custom-resources-for-multiple-managed-clusters_{context}"]
= Creating ZTP custom resources for multiple managed clusters
If you are installing multiple managed clusters, zero touch provisioning (ZTP) uses ArgoCD and `SiteConfig` to manage the processes that create the custom resources (CR) and generate and apply the policies for multiple clusters, in batches of no more than 100, using the GitOps approach.
If you are installing multiple managed clusters, zero touch provisioning (ZTP) uses ArgoCD and `SiteConfig` files to manage the processes that create the CRs and generate and apply the policies for multiple clusters, in batches of no more than 100, using the GitOps approach.
Installing and deploying the clusters is a two stage process, as shown here:
image::183_OpenShift_ZTP_0921.png[GitOps approach for Installing and deploying the clusters]
image::217_OpenShift_Zero_Touch_Provisioning_updates_0222_2.png[GitOps approach for Installing and deploying the clusters]

View File

@@ -0,0 +1,44 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/ztp-deploying-disconnected.adoc
:_module-type: PROCEDURE
[id="ztp-customizing-the-install-extra-manifests_{context}"]
= Customizing extra installation manifests in the ZTP GitOps pipeline
You can define a set of extra manifests for inclusion in the installation phase of the zero touch provisioning (ZTP) GitOps pipeline. These manifests are linked to the `SiteConfig` custom resources (CRs) and are applied to the cluster during installation. Including `MachineConfig` CRs at install time makes the installation process more efficient.
.Prerequisites
* Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.
.Procedure
. Create a set of extra manifest CRs that the ZTP pipeline uses to customize the cluster installs.
. In your custom `/siteconfig` directory, create an `/extra-manifest` folder for your extra manifests. The following example illustrates a sample `/siteconfig` with `/extra-manifest` folder:
+
[source,text]
----
siteconfig
├── site1-sno-du.yaml
├── site2-standard-du.yaml
└── extra-manifest
└── 01-example-machine-config.yaml
----
. Add your custom extra manifest CRs to the `siteconfig/extra-manifest` directory.
. In your `SiteConfig` CR, enter the directory name in the `extraManifestPath` field, for example:
+
[source,yaml]
----
clusters:
- clusterName: "example-sno"
networkType: "OVNKubernetes"
extraManifestPath: extra-manifest
----
. Save the `SiteConfig` CRs and `/extra-manifest` CRs and push them to the site configuration repo.
The ZTP pipeline appends the CRs in the `/extra-manifest` directory to the default set of extra manifests during cluster provisioning.
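For illustration, a hypothetical `01-example-machine-config.yaml` extra manifest, such as the one shown in the directory listing above, might look like the following sketch. The kernel argument is an example only:
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 01-example-machine-config
spec:
  config:
    ignition:
      version: 3.1.0
  kernelArguments:
  - loglevel=4
----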

View File

@@ -0,0 +1,33 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-definition-of-done-for-ztp-installations_{context}"]
= Indication of done for ZTP installations
Zero touch provisioning (ZTP) simplifies the process of checking the ZTP installation status for a cluster. The ZTP status moves through three phases: cluster installation, cluster configuration, and ZTP done.
Cluster installation phase::
The cluster installation phase is shown by the `ManagedCluster` CR `ManagedClusterJoined` condition. If the `ManagedCluster` CR does not have this condition, or the condition is set to `False`, the cluster is still in the installation phase. Additional details about installation are available from the `AgentClusterInstall` and `ClusterDeployment` CRs. For more information, see "Troubleshooting GitOps ZTP".
Cluster configuration phase::
The cluster configuration phase is shown by a `ztp-running` label applied to the `ManagedCluster` CR for the cluster.
ZTP done::
Cluster installation and configuration is complete in the ZTP done phase. This is shown by the removal of the `ztp-running` label and addition of the `ztp-done` label to the `ManagedCluster` CR. The `ztp-done` label shows that the configuration has been applied and the baseline DU configuration has completed cluster tuning.
+
The transition to the ZTP done state is conditional on the compliant state of a {rh-rhacm-first} static validator inform policy. This policy captures the existing criteria for a completed installation and validates that it moves to a compliant state only when ZTP provisioning of the spoke cluster is complete.
+
The validator inform policy ensures the configuration of the distributed unit (DU) cluster is fully applied and
Operators have completed their initialization. The policy validates the following:
+
* The target `MachineConfigPool` contains the expected entries and has finished
updating. All nodes are available and not degraded.
* The SR-IOV Operator has completed initialization as indicated by at least one `SriovNetworkNodeState` with `syncStatus: Succeeded`.
* The PTP Operator daemon set exists.
+
The validator inform policy is included in the reference group `PolicyGenTemplate` CRs. For reliable indication of the ZTP done state, this validator inform policy must be included in the ZTP pipeline.
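A quick way to check which phase a cluster is in is to inspect the labels on its `ManagedCluster` CR on the hub cluster, for example:
[source,terminal]
----
$ oc get managedcluster <cluster_name> --show-labels
----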

View File

@@ -0,0 +1,187 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-support-for-deployment-of-multi-node-clusters.adoc
:_content-type: PROCEDURE
[id="ztp-deploying-a-site_{context}"]
= Deploying a site
Use the following procedure to prepare the hub cluster for site deployment and initiate zero touch provisioning (ZTP) by pushing custom resources (CRs) to your Git repository.
.Procedure
. Create the required secrets for the site. These resources must be in a namespace with a name matching the cluster name. In `out/argocd/example/siteconfig/example-sno.yaml`, the cluster name and namespace are `example-sno`.
+
Create the namespace for the cluster using the following commands:
+
[source,terminal]
----
$ export CLUSTERNS=example-sno
----
+
[source,terminal]
----
$ oc create namespace $CLUSTERNS
----
. Create a pull secret for the cluster. The pull secret must contain all the credentials necessary for installing {product-title} and all required Operators. In all of the example `SiteConfig` CRs, the pull secret is named `assisted-deployment-pull-secret`, as shown below:
+
[source,terminal]
----
$ oc apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
name: assisted-deployment-pull-secret
namespace: $CLUSTERNS
type: kubernetes.io/dockerconfigjson
data:
.dockerconfigjson: $(base64 <pull-secret.json)
EOF
----
. Create a BMC authentication secret for each host you are deploying:
+
[source,terminal]
----
$ oc apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
name: $(read -p 'Hostname: ' tmp; printf $tmp)-bmc-secret
namespace: $CLUSTERNS
type: Opaque
data:
username: $(read -p 'Username: ' tmp; printf $tmp | base64)
password: $(read -s -p 'Password: ' tmp; printf $tmp | base64)
EOF
----
+
[NOTE]
====
The secrets are referenced from the `SiteConfig` custom resource (CR) by name. The namespace
must match the `SiteConfig` namespace.
====
. Create a `SiteConfig` CR for your cluster in your local clone of the Git repository:
.. Choose the appropriate example for your CR from the `out/argocd/example/siteconfig/` folder.
The folder includes example files for single node, three-node, and standard clusters:
+
* `example-sno.yaml`
* `example-3node.yaml`
* `example-standard.yaml`
.. Change the cluster and host details in the example file to match the type of cluster you want. The following file is a composite of the three files that explains the configuration of each cluster type:
+
[source,yaml]
----
# example-node1-bmh-secret & assisted-deployment-pull-secret need to be created under same namespace example-sno
---
apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
name: "example-sno"
namespace: "example-sno"
spec:
baseDomain: "example.com"
pullSecretRef:
name: "assisted-deployment-pull-secret"
clusterImageSetNameRef: "openshift-4.10" <1>
sshPublicKey: "ssh-rsa AAAA..."
clusters:
- clusterName: "example-sno"
networkType: "OVNKubernetes"
clusterLabels: <2>
# These example cluster labels correspond to the bindingRules in the PolicyGenTemplate examples in ../policygentemplates:
# ../policygentemplates/common-ranGen.yaml will apply to all clusters with 'common: true'
common: true
# ../policygentemplates/group-du-sno-ranGen.yaml will apply to all clusters with 'group-du-sno: ""'
group-du-sno: ""
# ../policygentemplates/example-sno-site.yaml will apply to all clusters with 'sites: "example-sno"'
# Normally this should match or contain the cluster name so it only applies to a single cluster
sites : "example-sno"
clusterNetwork:
- cidr: 1001:1::/48
hostPrefix: 64
machineNetwork: <3>
- cidr: 1111:2222:3333:4444::/64
# For 3-node and standard clusters with static IPs, the API and Ingress IPs must be configured here
apiVIP: 1111:2222:3333:4444::1:1 <4>
ingressVIP: 1111:2222:3333:4444::1:2 <5>
serviceNetwork:
- 1001:2::/112
additionalNTPSources:
- 1111:2222:3333:4444::2
nodes:
- hostName: "example-node1.example.com" <6>
role: "master"
bmcAddress: "idrac-virtualmedia+https://[1111:2222:3333:4444::bbbb:1]/redfish/v1/Systems/System.Embedded.1" <7>
bmcCredentialsName:
name: "example-node1-bmh-secret" <8>
bootMACAddress: "AA:BB:CC:DD:EE:11"
bootMode: "UEFI"
rootDeviceHints:
hctl: '0:1:0'
cpuset: "0-1,52-53"
nodeNetwork: <9>
interfaces:
- name: eno1
macAddress: "AA:BB:CC:DD:EE:11"
config:
interfaces:
- name: eno1
type: ethernet
state: up
macAddress: "AA:BB:CC:DD:EE:11"
ipv4:
enabled: false
ipv6:
enabled: true
address:
- ip: 1111:2222:3333:4444::1:1
prefix-length: 64
dns-resolver:
config:
search:
- example.com
server:
- 1111:2222:3333:4444::2
routes:
config:
- destination: ::/0
next-hop-interface: eno1
next-hop-address: 1111:2222:3333:4444::1
table-id: 254
----
<1> Applies to all cluster types. The value must match an image set available on the hub cluster. To see the list of supported versions on your hub, run `oc get clusterimagesets`.
<2> Applies to all cluster types. These values must correspond to the `PolicyGenTemplate` labels that you define in a later step.
<3> Applies to single node clusters. The value defines the cluster network sections for a single node deployment.
<4> Applies to three-node and standard clusters. Specifies the API virtual IP (VIP) address to use when static IP addressing is configured.
<5> Applies to three-node and standard clusters. Specifies the Ingress virtual IP (VIP) address to use when static IP addressing is configured.
<6> Applies to all cluster types. For single node deployments, define one host. For three-node deployments, define three hosts. For standard deployments, define three hosts with `role: master` and two or more hosts defined with `role: worker`.
<7> Applies to all cluster types. Specifies the BMC address.
<8> Applies to all cluster types. Specifies the BMC credentials.
<9> Applies to all cluster types. Specifies the network settings for the node.
.. You can inspect the default set of extra-manifest `MachineConfig` CRs in `out/argocd/extra-manifest`. These CRs are automatically applied to the cluster when it is installed.
+
Optional: To provision additional install-time manifests on the provisioned cluster, create a directory in your Git repository, for example, `sno-extra-manifest/`, and add your custom manifest CRs to this directory. If your `SiteConfig.yaml` refers to this directory in the `extraManifestPath` field, any CRs in this referenced directory are appended to the default set of extra manifests.
. Add the `SiteConfig` CR to the `kustomization.yaml` file in the `generators` section, similar to the example shown in `out/argocd/example/siteconfig/kustomization.yaml`.
. Commit your `SiteConfig` CR and associated `kustomization.yaml` in your Git repository.
. Push your changes to the Git repository. The ArgoCD pipeline detects the changes and begins the site deployment. You can push the changes to the `SiteConfig` CR and the `PolicyGenTemplate` CR simultaneously.
+
The `SiteConfig` CR creates the following CRs on the hub cluster:
+
* `Namespace` - Unique per site
* `AgentClusterInstall`
* `BareMetalHost` - One per node
* `ClusterDeployment`
* `InfraEnv`
* `NMStateConfig` - One per node
* `ExtraManifestsConfigMap` - Extra manifests. The additional manifests include workload partitioning, chronyd, mountpoint hiding, sctp enablement, and more.
* `ManagedCluster`
* `KlusterletAddonConfig`
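After the ArgoCD synchronization completes, you can spot check that the hub cluster generated the installation CRs for the site. The namespace matches the cluster name:
[source,terminal]
----
$ oc get agentclusterinstall,baremetalhost,infraenv -n $CLUSTERNS
----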

View File

@@ -0,0 +1,32 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/ztp-deploying-disconnected.adoc
:_module-type: CONCEPT
[id="ztp-deploying-additional-changes-to-clusters_{context}"]
= Deploying additional changes to clusters
Custom resources (CRs) that are deployed through the GitOps zero touch provisioning (ZTP) pipeline support two goals:
. Deploying additional Operators to spoke clusters that are required by typical RAN DU applications running at the network far-edge.
. Customizing the {product-title} installation to provide a high performance platform capable of meeting the strict timing requirements in a minimal CPU budget.
If you require cluster configuration changes outside of the base GitOps ZTP pipeline configuration, there are three options:
Apply the additional configuration after the ZTP pipeline is complete::
When the GitOps ZTP pipeline deployment is complete, the deployed cluster is ready for application workloads. At this point, you can install additional Operators and apply configurations specific to your requirements. Ensure that additional configurations do not negatively affect the performance of the platform or allocated CPU budget.
Add content to the ZTP library::
The base source CRs that you deploy with the GitOps ZTP pipeline can be augmented with custom content as required.
Create extra manifests for the cluster installation::
Extra manifests are applied during installation and make the installation process more efficient.
[IMPORTANT]
====
Providing additional source CRs or modifying existing source CRs can significantly impact the performance or CPU profile of {product-title}.
====

View File

@@ -1,21 +0,0 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-disconnected-environment-prereqs_{context}"]
= Disconnected environment prerequisites
You must have a container image registry that supports link:https://docs.docker.com/registry/spec/manifest-v2-2/[Docker v2-2] in the location that will host the {product-title} cluster, such as one of the following registries:
* link:https://www.redhat.com/en/technologies/cloud-computing/quay[Red Hat Quay]
* link:https://jfrog.com/artifactory/[JFrog Artifactory]
* link:https://www.sonatype.com/products/repository-oss?topnav=true[Sonatype Nexus Repository]
* link:https://goharbor.io/[Harbor]
If you have an entitlement to Red Hat Quay, see the documentation on deploying Red Hat Quay link:https://access.redhat.com/documentation/en-us/red_hat_quay/3.5/html/deploy_red_hat_quay_for_proof-of-concept_non-production_purposes/[for proof-of-concept purposes] or link:https://access.redhat.com/documentation/en-us/red_hat_quay/3.5/html/deploy_red_hat_quay_on_openshift_with_the_quay_operator/[by using the Quay Operator]. If you need additional assistance selecting and installing a registry, contact your sales representative or Red Hat support.
[NOTE]
====
Red Hat does not test third party registries with {product-title}.
====

View File

@@ -9,11 +9,6 @@
Distributed unit (DU) hosts require the BIOS to be configured before the host can be provisioned. The BIOS configuration is dependent on the specific hardware that runs your DUs and the particular requirements of your installation.
[IMPORTANT]
====
In this Developer Preview release, configuration and tuning of BIOS for DU bare-metal host machines is the responsibility of the customer. Automatic setting of BIOS is not handled by the zero touch provisioning workflow.
====
.Procedure
. Set the *UEFI/BIOS Boot Mode* to `UEFI`.

View File

@@ -0,0 +1,28 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-how-to-plan-your-ran-policies_{context}"]
= How to plan your RAN policies
Zero touch provisioning (ZTP) uses {rh-rhacm-first} to apply the radio access network (RAN) configuration using a policy-based governance approach to apply the configuration.
The policy generator or `PolicyGen` is a part of the GitOps ZTP tooling that facilitates creating {rh-rhacm} policies from a set of predefined custom resources. There are three main items: policy categorization, source CR policy, and the `PolicyGenTemplate` CR. `PolicyGen` uses these to generate the policies and their placement bindings and rules.
The following diagram shows how the RAN policy generator interacts with GitOps and {rh-rhacm}.
image::217_OpenShift_Zero_Touch_Provisioning_updates_0222_3.png[RAN policy generator]
RAN policies are categorized into three main groups:
Common:: A policy that exists in the `Common` category is applied to all clusters to be represented by the site plan. Cluster types include single node, three-node, and standard clusters.
Groups:: A policy that exists in the `Groups` category is applied to a group of clusters. Every group of clusters can have its own policies under the `Groups` category. For example, `Groups/group1` can have its own policies that are applied to the clusters belonging to `group1`.
You can also define a group for each cluster type: single node, three-node, and standard clusters.
Sites:: A policy that exists in the `Sites` category is applied to a specific cluster. Any cluster can have its own policies in the `Sites` category. For example, `Sites/cluster1` has its own policies that are applied to `cluster1`.
You can also define an example site-specific configuration for each cluster type: single node, three-node, and standard clusters.
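For example, a group-level `PolicyGenTemplate` CR selects its target clusters with the `bindingRules` field, which matches labels that you assign to the clusters in their `SiteConfig` CRs. The following fragment is a sketch that assumes a `group-du-sno` cluster label; the namespace is a placeholder:
[source,yaml]
----
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
  name: "group-du-sno"
  namespace: "ztp-group" <1>
spec:
  bindingRules:
    group-du-sno: "" <2>
----
<1> Placeholder namespace for the generated policies.
<2> The generated policies are applied to every cluster that carries the `group-du-sno` label.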

View File

@@ -1,9 +0,0 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-installing-preparing-mirror_{context}"]
= Preparing your mirror host
Before you perform the mirror procedure, you must prepare the host to retrieve content and push it to the remote location.

View File

@@ -0,0 +1,18 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-installing-the-new-gitops-ztp-applications_{context}"]
= Installing the new GitOps ZTP applications
After you ensure that the applications point to your Git repository, apply the full contents of the extracted `argocd/deployment` directory. Applying the full contents of the directory ensures that all the resources that the applications need are correctly configured.
.Procedure
* Apply the contents of the `argocd/deployment` directory using the following command:
+
[source,terminal]
----
$ oc apply -k out/argocd/deployment
----

View File

@@ -0,0 +1,25 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-labeling-the-existing-clusters_{context}"]
= Labeling the existing clusters
To ensure that existing clusters remain untouched by the tooling updates, all existing managed clusters must be labeled with the `ztp-done` label.
.Procedure
. Find a label selector that lists the managed clusters that were deployed with zero touch provisioning (ZTP), such as `local-cluster!=true`:
+
[source,terminal]
----
$ oc get managedcluster -l 'local-cluster!=true'
----
. Ensure that the resulting list contains all the managed clusters that were deployed with ZTP, and then use that selector to add the `ztp-done` label:
+
[source,terminal]
----
$ oc label managedcluster -l 'local-cluster!=true' ztp-done=
----
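+
As an optional check, display the label as a column to confirm that it was applied to the expected clusters:
+
[source,terminal]
----
$ oc get managedcluster -L ztp-done
----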

View File

@@ -1,6 +1,6 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
// * scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-low-latency-for-distributed-units-dus_{context}"]
@@ -9,8 +9,7 @@
Low latency is an integral part of the development of 5G networks. Telecommunications networks require as little signal delay as possible to ensure quality of service in a variety of critical use cases.
Low latency processing is essential for any communication with timing constraints that affect functionality and
security. For example, 5G Telco applications require a guaranteed one millisecond one-way latency to meet Internet of Things (IoT) requirements. Low latency is also critical for the future development of autonomous vehicles, smart factories, and online gaming. Networks in these environments require almost a real-time flow of data.
Low latency processing is essential for any communication with timing constraints that affect functionality and security. For example, 5G Telco applications require a guaranteed one millisecond one-way latency to meet Internet of Things (IoT) requirements. Low latency is also critical for the future development of autonomous vehicles, smart factories, and online gaming. Networks in these environments require almost a real-time flow of data.
Low latency systems are about guarantees with regards to response and processing times. This includes keeping a communication protocol running smoothly, ensuring device security with fast responses to error conditions, or just making sure a system is not lagging behind when receiving a lot of data. Low latency is key for optimal synchronization of radio transmissions.

View File

@@ -1,11 +0,0 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-machine-config-operator_{context}"]
= Machine Config Operator
The Machine Config Operator enables system definitions and low-level system settings such as workload partitioning, NTP, and SCTP. This Operator is installed with {product-title}.
A performance profile and its created products are applied to a node according to an associated machine config pool (MCP). The MCP holds valuable information about the progress of applying the machine configurations created by performance addons that encompass kernel args, kube config, huge pages allocation, and deployment of the realtime kernel (rt-kernel). The performance addons controller monitors changes in the MCP and updates the performance profile status accordingly.

View File

@@ -3,21 +3,19 @@
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-creating-siteconfig-custom-resources_{context}"]
= Creating custom resources to install a single managed cluster
[id="ztp-manually-install-a-single-managed-cluster_{context}"]
= Manually install a single managed cluster
This procedure tells you how to manually create and deploy a single managed cluster. If you are creating multiple clusters, perhaps hundreds, use the `SiteConfig` method described in
“Creating ZTP custom resources for multiple managed clusters”.
This procedure tells you how to manually create and deploy a single managed cluster. If you are creating multiple clusters, perhaps hundreds, use the `SiteConfig` method described in “Creating ZTP custom resources for multiple managed clusters”.
.Prerequisites
* Enable Assisted Installer Service.
* Enable the Assisted Installer service.
* Ensure network connectivity:
** The container within the hub must be able to reach the Baseboard Management Controller (BMC) address of the target bare-metal host.
** The managed cluster must be able to resolve and reach the hubs API `hostname` and `{asterisk}.app` hostname.
Example of the hubs API and `{asterisk}.app` hostname:
** The managed cluster must be able to resolve and reach the hubs API `hostname` and `{asterisk}.app` hostname. Here is an example of the hubs API and `{asterisk}.app` hostname:
+
[source,terminal]
----
@@ -25,8 +23,7 @@ console-openshift-console.apps.hub-cluster.internal.domain.com
api.hub-cluster.internal.domain.com
----
** The hub must be able to resolve and reach the API and `{asterisk}.app` hostname of the managed cluster.
Here is an example of the managed clusters API and `{asterisk}.app` hostname:
** The hub must be able to resolve and reach the API and `{asterisk}.app` hostname of the managed cluster. Here is an example of the managed clusters API and `{asterisk}.app` hostname:
+
[source,terminal]
----
@@ -34,19 +31,19 @@ console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com
api.sno-managed-cluster-1.internal.domain.com
----
** A DNS Server that is IP reachable from the target bare-metal host.
** A DNS server that is IP reachable from the target bare-metal host.
* A target bare-metal host for the managed cluster with the following hardware minimums:
** 4 CPU or 8 vCPU
** 32 GiB RAM
** 120 GiB Disk for root filesystem
** 120 GiB disk for root file system
* When working in a disconnected environment, the release image needs to be mirrored. Use this command to mirror the release image:
* When working in a disconnected environment, the release image must be mirrored. Use this command to mirror the release image:
+
[source,terminal]
----
oc adm release mirror -a <pull_secret.json>
$ oc adm release mirror -a <pull_secret.json>
--from=quay.io/openshift-release-dev/ocp-release:{{ mirror_version_spoke_release }}
--to={{ provisioner_cluster_registry }}/ocp4 --to-release-image={{
provisioner_cluster_registry }}/ocp4:{{ mirror_version_spoke_release }}
@@ -54,7 +51,8 @@ provisioner_cluster_registry }}/ocp4:{{ mirror_version_spoke_release }}
* You mirrored the ISO and `rootfs` used to generate the spoke cluster ISO to an HTTP server and configured the settings to pull images from there.
+
The images must match the version of the `ClusterImageSet`. To deploy a 4.11.0 version, the `rootfs` and ISO need to be set at 4.11.0.
The images must match the version of the `ClusterImageSet`. To deploy a 4.9.0 version, the `rootfs` and
ISO must be set at 4.9.0.
.Procedure
@@ -66,9 +64,9 @@ The images must match the version of the `ClusterImageSet`. To deploy a 4.11.0 v
apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
name: openshift-4.11.0-rc.0 <1>
name: openshift-4.9.0-rc.0 <1>
spec:
releaseImage: quay.io/openshift-release-dev/ocp-release:4.11.0-x86_64 <2>
releaseImage: quay.io/openshift-release-dev/ocp-release:4.9.0-x86_64 <2>
----
<1> The descriptive version that you want to deploy.
<2> Specifies the `releaseImage` to deploy and determines the OS image version. The discovery ISO is based on the same OS image version as the `releaseImage`, or the latest version if the exact version is unavailable.
@@ -149,15 +147,15 @@ spec:
sshPublicKey: <public_key> <5>
----
+
<1> The name of the ClusterImageSet custom resource used to install {product-title} on the bare-metal host.
<1> The name of the `ClusterImageSet` custom resource used to install {product-title} on the bare-metal host.
<2> A block of IPv4 or IPv6 addresses in CIDR notation used for communication among cluster nodes.
<3> A block of IPv4 or IPv6 addresses in CIDR notation used for the target bare-metal host external communication. Also used to determine the API and Ingress VIP addresses when provisioning DU single-node clusters.
<4> A block of IPv4 or IPv6 addresses in CIDR notation used for cluster services internal communication.
<5> Entered as plain text. You can use the public key to SSH into the node after it has finished installing.
<5> A plain text string. You can use the public key to SSH into the node after it has finished installing.
+
[NOTE]
====
If you want to configure a static IP for the managed cluster at this point, see the procedure in this document for configuring static IP addresses for managed clusters.
If you want to configure a static IP address for the managed cluster at this point, see the procedure in this document for configuring static IP addresses for managed clusters.
====
@@ -216,7 +214,7 @@ spec:
enabled: false <1>
----
+
<1> Set to `true` to enable KlusterletAddonConfig or `false` to disable the KlusterletAddonConfig. Keep `searchCollector` disabled.
<1> Keep `searchCollector` disabled. Set to `true` to enable the `KlusterletAddonConfig` CR or `false` to disable the `KlusterletAddonConfig` CR.
. Create the `ManagedCluster` custom resource:
+
@@ -244,13 +242,13 @@ spec:
name: <cluster_name>
namespace: <cluster_name>
sshAuthorizedKey: <public_key> <1>
agentLabels: <2>
location: "<label-name>"
agentLabelSelector:
matchLabels:
cluster-name: <cluster_name>
pullSecretRef:
name: assisted-deployment-pull-secret
----
<1> Entered as plain text. You can use the public key to SSH into the target bare-metal host when it boots from the ISO.
<2> Sets a label to match. The labels apply when the agents boot.
. Create the `BareMetalHost` custom resource:
+
@@ -282,6 +280,6 @@ Optionally, you can add `bmac.agent-install.openshift.io/hostname: <host-name>`
. After you have created the custom resources, push the entire directory of generated custom resources to the Git repository you created for storing the custom resources.
.Next step
.Next steps
To provision additional clusters, repeat this procedure for each cluster.

View File

@@ -0,0 +1,55 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-monitoring-deployment-progress_{context}"]
= Monitoring deployment progress
The ArgoCD pipeline uses the `SiteConfig` and `PolicyGenTemplate` CRs in Git to generate the cluster configuration CRs and {rh-rhacm} policies and then sync them to the hub. You can monitor the progress of this synchronization in the ArgoCD dashboard.
.Procedure
When the synchronization is complete, the installation generally proceeds as follows:
. The Assisted Service Operator installs {product-title} on the cluster. You can monitor the progress of cluster installation from the {rh-rhacm} dashboard or from the command line:
+
[source,terminal]
----
$ export CLUSTER=<clusterName>
----
+
[source,terminal]
----
$ oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Completed")]}' | jq
----
+
[source,terminal]
----
$ curl -sk $(oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.debugInfo.eventsURL}') | jq '.[-2,-1]'
----
. The {cgu-operator-first} applies the configuration policies that are bound to the cluster.
+
After the cluster installation is complete and the cluster becomes `Ready`, a `ClusterGroupUpgrade` CR corresponding to this cluster, with a list of ordered policies defined by the `ran.openshift.io/ztp-deploy-wave` annotations, is automatically created by the {cgu-operator}. The cluster's policies are applied in the order listed in the `ClusterGroupUpgrade` CR. You can monitor the high-level progress of configuration policy reconciliation using the following commands:
+
[source,terminal]
----
$ export CLUSTER=<clusterName>
----
+
[source,terminal]
----
$ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
----
. You can monitor the detailed policy compliant status using the {rh-rhacm} dashboard or the command line:
+
[source,terminal]
----
$ oc get policies -n $CLUSTER
----
The final policy that becomes compliant is the one defined in the `*-du-validator-policy` policies. This policy, when compliant on a cluster, ensures that all cluster configuration, Operator installation, and Operator configuration are complete.
After all policies become compliant, the `ztp-done` label is added to the cluster, indicating that the entire ZTP pipeline is complete for the cluster.
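To verify this final state for a specific cluster, you can, for example, list its labels and check for `ztp-done`:
[source,terminal]
----
$ oc get managedcluster $CLUSTER --show-labels
----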

View File

@@ -0,0 +1,17 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_module-type: CONCEPT
[id="ztp-pgt-config-best-practices_{context}"]
= Best practices when customizing PolicyGenTemplate CRs
Consider the following best practices when customizing site configuration `PolicyGenTemplate` CRs:
* Use as few policies as necessary. Using fewer policies requires fewer resources. Each additional policy creates overhead for the hub cluster and the deployed spoke cluster. CRs are combined into policies based on the `policyName` field in the `PolicyGenTemplate` CR. CRs in the same `PolicyGenTemplate` CR that have the same value for `policyName` are managed under a single policy.
* Use a single catalog source for all Operators. In disconnected environments, configure the registry as a single index containing all Operators. Each additional `CatalogSource` on the spoke clusters increases CPU usage.
* `MachineConfig` CRs should be included as `extraManifests` in the `SiteConfig` CR so that they are applied during installation. This can reduce the overall time taken until the cluster is ready to deploy applications.
* `PolicyGenTemplate` CRs should override the channel field to explicitly identify the desired version. This ensures that changes in the source CR during upgrades do not update the generated subscription. For an example, see the sketch that follows this list.
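For example, the following `PolicyGenTemplate` fragment pins the subscription channel for an Operator. The source CR file name, policy name, and channel value are placeholders; use the values that apply to your deployment:
[source,yaml]
----
spec:
  sourceFiles:
    - fileName: SriovSubscription.yaml <1>
      policyName: "subscriptions-policy"
      spec:
        channel: "stable" <2>
----
<1> Placeholder source CR name from the `out/source-crs` directory.
<2> Setting the channel explicitly prevents a change to the source CR during an upgrade from updating the generated subscription.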

View File

@@ -0,0 +1,34 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_module-type: CONCEPT
[id="ztp-policygentemplates-for-ran_{context}"]
= PolicyGenTemplate CRs for RAN deployments
You use `PolicyGenTemplate` custom resources (CRs) to customize the configuration applied to the cluster using the GitOps zero touch provisioning (ZTP) pipeline. The baseline configuration, obtained from the GitOps ZTP container, is designed to provide a set of critical features and node tuning settings that ensure the cluster can support the stringent performance and resource utilization constraints typical of RAN distributed unit (DU) applications. Changes or omissions from the baseline configuration can affect feature availability, performance, and resource utilization. Use `PolicyGenTemplate` CRs as the basis to create a hierarchy of configuration files tailored to your specific site requirements.
The baseline `PolicyGenTemplate` CRs that are defined for RAN DU cluster configuration can be extracted from the GitOps ZTP `ztp-site-generator` container. See "Preparing the ZTP Git repository" for further details.
The `PolicyGenTemplate` CRs can be found in the `./out/argocd/example/policygentemplates` folder. The reference architecture has common, group, and site-specific configuration CRs. Each `PolicyGenTemplate` CR refers to other CRs that can be found in the `./out/source-crs` folder.
The `PolicyGenTemplate` CRs relevant to RAN cluster configuration are described below. Variants are provided for the group `PolicyGenTemplate` CRs to account for differences in single-node, three-node compact, and standard cluster configurations. Similarly, site-specific configuration variants are provided for single-node clusters and multi-node (compact or standard) clusters. Use the group and site-specific configuration variants that are relevant for your deployment.
.PolicyGenTemplate CRs for RAN deployments
[cols=2*, options="header"]
|====
|PolicyGenTemplate CR
|Description
|`common-ranGen.yaml`
|Contains a set of common RAN CRs that get applied to all clusters. These CRs subscribe to a set of operators providing cluster features typical for RAN as well as baseline cluster tuning.
|`group-du-3node-ranGen.yaml`
|Contains the RAN policies for three-node clusters only.
|`group-du-sno-ranGen.yaml`
|Contains the RAN policies for single-node clusters only.
|`group-du-standard-ranGen.yaml`
|Contains the RAN policies for standard clusters with three control-plane nodes.
|====

View File

@@ -1,9 +0,0 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-precision-time-protocol-operator_{context}"]
= Precision Time Protocol Operator
Precision Time Protocol (PTP) is used to synchronize clocks in a network. The PTP Operator discovers PTP-capable devices in the cluster and creates and manages `linuxptp` services for those devices. The PTP Operator also deploys a PTP fast events infrastructure. vDU applications use PTP fast events notifications to report on clock events that can negatively affect the performance and reliability of the application. PTP fast events are distributed over an Advanced Message Queuing Protocol (AMQP) event notification bus.

View File

@@ -0,0 +1,36 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-preparing-for-the-gitops-ztp-upgrade_{context}"]
= Preparing for the upgrade
Use the following procedure to prepare your site for the GitOps zero touch provisioning (ZTP) upgrade.
.Procedure
. Obtain the latest version of the GitOps ZTP container from which you can extract a set of custom resources (CRs) used to configure the GitOps operator on the hub cluster for use in the GitOps ZTP solution.
. Extract the `argocd/deployment` directory using the following commands:
+
[source,terminal]
----
$ mkdir -p ./out
----
+
[source,terminal]
----
$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10 extract /home/ztp --tar | tar x -C ./out
----
+
The `out` directory contains the following subdirectories:
+
* `out/extra-manifest`: contains the source CR files that the `SiteConfig` CR uses to generate the extra manifest `configMap`.
* `out/source-crs`: contains the source CR files that the `PolicyGenTemplate` CR uses to generate the {rh-rhacm-first} policies.
* `out/argocd/deployment`: contains patches and YAML files to apply on the hub cluster for use in the next step of this procedure.
* `out/argocd/example`: contains example `SiteConfig` and `PolicyGenTemplate` files that represent the recommended configuration.
. Update the `clusters-app.yaml` and `policies-app.yaml` files to reflect the name of your applications and the URL, branch, and path for your Git repository.
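+
The fields to update in each application file are similar to the following fragment. The values shown are placeholders for your own repository:
+
[source,yaml]
----
spec:
  source:
    path: <path_to_siteconfig_or_policygentemplate_directory>
    repoURL: https://repo.example.com/repo.git
    targetRevision: <git_branch_to_monitor>
----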
If the upgrade includes changes to policies that result in obsolete policies, remove the obsolete policies before performing the upgrade.

View File

@@ -8,103 +8,37 @@
You can configure your hub cluster with a set of ArgoCD applications that generate the required installation and policy custom resources (CRs) for each site based on a zero touch provisioning (ZTP) GitOps flow.
.Prerequisites
* OpenShift cluster 4.8 or 4.9 as the hub cluster
* {rh-rhacm-first} Operator 2.3 or 2.4 installed on the hub cluster
* Red Hat OpenShift GitOps Operator 1.3 on the hub cluster
.Procedure
. Install the Red Hat OpenShift GitOps Operator on your hub cluster.
. Extract the administrator password for ArgoCD:
+
[source,terminal]
----
$ oc get secret openshift-gitops-cluster -n openshift-gitops -o jsonpath='{.data.admin\.password}' | base64 -d
----
. Install the {cgu-operator-first}, which coordinates with any new sites added by ZTP and manages application of the `PolicyGenTemplate`-generated policies.
. Prepare the ArgoCD pipeline configuration:
.. Extract the ArgoCD deployment CRs from the ZTP site generator container using the latest container image version:
+
.. Create a Git repository with a directory structure similar to the example directory. For more information, see "Preparing the ZTP Git repository".
.. Configure access to the repository using the ArgoCD UI. Under *Settings* configure the following:
+
* *Repositories* - Add the connection information and credentials. The URL must end in `.git`, for example, `https://repo.example.com/repo.git`.
* *Certificates* - Add the public certificate for the repository, if needed.
.. Modify the two ArgoCD Applications, `out/argocd/deployment/clusters-app.yaml` and `out/argocd/deployment/policies-app.yaml`, based on your Git repository:
+
* Update the URL to point to the Git repository. The URL must end with `.git`, for example, `https://repo.example.com/repo.git`.
* The `targetRevision` must indicate which Git repository branch to monitor.
* The `path` must specify the path to the directory that contains the `SiteConfig` or `PolicyGenTemplate` CRs, respectively.
. Apply the pipeline configuration to your hub cluster using the following command:
+
[source,terminal]
----
$ mkdir ztp
$ podman run --rm -v `pwd`/ztp:/mnt/ztp:Z registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10.0-1 /bin/bash -c "cp -ar /usr/src/hook/ztp/* /mnt/ztp/"
----
+
The remaining steps in this section relate to the `ztp/gitops-subscriptions/argocd/` directory.
.. Modify the source values of the two ArgoCD applications, `deployment/clusters-app.yaml` and `deployment/policies-app.yaml` with appropriate URL, `targetRevision` branch, and path values. The path values must match those used in your Git repository.
+
Modify `deployment/clusters-app.yaml`:
+
[source,yaml]
----
apiVersion: v1
kind: Namespace
metadata:
name: clusters-sub
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: clusters
namespace: openshift-gitops
spec:
destination:
server: https://kubernetes.default.svc
namespace: clusters-sub
project: default
source:
path: ztp/gitops-subscriptions/argocd/resource-hook-example/siteconfig <1>
repoURL: https://github.com/openshift-kni/cnf-features-deploy <2>
targetRevision: master <3>
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
----
<1> The `ztp/gitops-subscriptions/argocd/` file path that contains the `siteconfig` CRs for the clusters.
<2> The URL of the Git repository that contains the `siteconfig` custom resources that define site configuration for installing clusters.
<3> The branch on the Git repository that contains the relevant site configuration data.
.. Modify `deployment/policies-app.yaml`:
+
[source,yaml]
----
apiVersion: v1
kind: Namespace
metadata:
name: policies-sub
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: policies
namespace: openshift-gitops
spec:
destination:
server: https://kubernetes.default.svc
namespace: policies-sub
project: default
source:
directory:
recurse: true
path: ztp/gitops-subscriptions/argocd/resource-hook-example/policygentemplates <1>
repoURL: https://github.com/openshift-kni/cnf-features-deploy <2>
targetRevision: master <3>
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
----
<1> The `ztp/gitops-subscriptions/argocd/` file path that contains the `policygentemplates` CRs for the clusters.
<2> The URL of the Git repository that contains the `policygentemplates` custom resources that specify configuration data for the site.
<3> The branch on the Git repository that contains the relevant configuration data.
. To apply the pipeline configuration to your hub cluster, enter this command:
+
[source,terminal]
----
$ oc apply -k ./deployment
$ oc apply -k out/argocd/deployment
----

View File

@@ -12,13 +12,50 @@ Create a Git repository for hosting site configuration data. The zero touch prov
. Create a directory structure with separate paths for the `SiteConfig` and `PolicyGenTemplate` custom resources (CR).
. Add `pre-sync.yaml` and `post-sync.yaml` from `resource-hook-example/<policygentemplates>/` to the path for the `PolicyGenTemplate` CRs.
. Add `pre-sync.yaml` and `post-sync.yaml` from `resource-hook-example/<siteconfig>/` to the path for the `SiteConfig` CRs.
. Export the `argocd` directory from the `ztp-site-generate` container image using the following commands:
+
[NOTE]
====
If your hub cluster operates in a disconnected environment, you must update the `image` for all four pre and post sync hook CRs.
====
[source,terminal]
----
$ podman pull registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10
----
+
[source,terminal]
----
$ mkdir -p ./out
----
+
[source,terminal]
----
$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10 extract /home/ztp --tar | tar x -C ./out
----
. Apply the `policygentemplates.ran.openshift.io` and `siteconfigs.ran.openshift.io` CR definitions.
. Check that the `out` directory contains the following subdirectories:
+
* `out/extra-manifest` contains the source CR files that `SiteConfig` uses to generate extra manifest `configMap`.
* `out/source-crs` contains the source CR files that `PolicyGenTemplate` uses to generate the {rh-rhacm-first} policies.
* `out/argocd/deployment` contains patches and YAML files to apply on the hub cluster for use in the next step of this procedure.
* `out/argocd/example` contains the examples for `SiteConfig` and `PolicyGenTemplate` files that represent the recommended configuration.
The directory structure under `out/argocd/example` serves as a reference for the structure and content of your Git repository. The example includes `SiteConfig` and `PolicyGenTemplate` reference CRs for single-node, three-node, and standard clusters. Remove references to cluster types that you are not using. The following example describes a set of CRs for a network of single-node clusters:
[source,terminal]
----
example/
├── policygentemplates
│ ├── common-ranGen.yaml
│ ├── example-sno-site.yaml
│ ├── group-du-sno-ranGen.yaml
│ ├── group-du-sno-validator-ranGen.yaml
│ ├── kustomization.yaml
│ └── ns.yaml
└── siteconfig
├── example-sno.yaml
├── KlusterletAddonConfigOverride.yaml
└── kustomization.yaml
----
Keep `SiteConfig` and `PolicyGenTemplate` CRs in separate directories. Both the `SiteConfig` and `PolicyGenTemplate` directories must contain a `kustomization.yaml` file that explicitly includes the files in that directory.
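For example, a minimal `kustomization.yaml` file for the `siteconfig` directory shown above lists the `SiteConfig` CR in the `generators` section:
[source,yaml]
----
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators:
- example-sno.yaml
----
The `policygentemplates` directory follows the same pattern, with the `PolicyGenTemplate` files listed under `generators` and the `Namespace` CR, for example `ns.yaml`, listed under `resources`.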
This directory structure and the `kustomization.yaml` files must be committed and pushed to your Git repository. The initial push to Git should include the `kustomization.yaml` files. The `SiteConfig` (`example-sno.yaml`) and `PolicyGenTemplate` (`common-ranGen.yaml`, `group-du-sno*.yaml`, and `example-sno-site.yaml`) files can be omitted and pushed at a later time as required when deploying a site.
The `KlusterletAddonConfigOverride.yaml` file is only required if one or more `SiteConfig` CRs which make reference to it are committed and pushed to Git. See `example-sno.yaml` for an example of how this is used.

View File

@@ -1,29 +0,0 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-prerequisites-for-deploying-the-ztp-pipeline_{context}"]
= Prerequisites for deploying the ZTP pipeline
* {product-title} cluster version 4.8 or higher and Red Hat GitOps Operator is installed.
* {rh-rhacm-first} version 2.3 or above is installed.
* For disconnected environments, make sure your source data Git repository and `ztp-site-generator` container image are accessible from the hub cluster.
* If you want additional custom content, such as extra install manifests or custom resources (CR) for policies, add them to the `/usr/src/hook/ztp/source-crs/extra-manifest/` directory. Similarly, you can add additional configuration CRs, as referenced from a `PolicyGenTemplate`, to the `/usr/src/hook/ztp/source-crs/` directory.
** Create a `Containerfile` that adds your additional manifests to the Red Hat provided image, for example:
+
[source,yaml]
----
FROM <registry fqdn>/ztp-site-generator:latest <1>
COPY myInstallManifest.yaml /usr/src/hook/ztp/source-crs/extra-manifest/
COPY mySourceCR.yaml /usr/src/hook/ztp/source-crs/
----
+
<1> <registry fqdn> must point to a registry containing the `ztp-site-generator` container image provided by Red Hat.
** Build a new container image that includes these additional files:
+
[source,terminal]
----
$> podman build Containerfile.example
----

View File

@@ -3,7 +3,7 @@
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="provisioning-edge-sites-at-scale_{context}"]
[id="ztp-provisioning-edge-sites-at-scale_{context}"]
= Provisioning edge sites at scale
Telco edge computing presents extraordinary challenges with managing hundreds to tens of thousands of clusters in hundreds of thousands of locations. These challenges require fully automated management solutions with as close to zero human interaction as possible.
@@ -16,4 +16,4 @@ Service providers are deploying a more distributed mobile network architecture a
The following diagram shows how ZTP works within a far edge framework.
image::176_OpenShift_zero_touch_provisioning_0821.png[ZTP in a far edge framework]
image::217_OpenShift_Zero_Touch_Provisioning_updates_0222_1.png[ZTP in a far edge framework]

View File

@@ -0,0 +1,62 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-querying-the-policy-compliance-status-for-each-cluster_{context}"]
= Querying the policy compliance status for each cluster
After you have created the validator inform policies for your clusters and pushed them to the zero touch provisioning (ZTP) Git repository, you can check the status of each cluster for policy compliance.
.Procedure
. To query the status of the spoke clusters, use either the {rh-rhacm-first} web console or the CLI:
+
* To query status from the {rh-rhacm} web console, perform the following actions:
+
.. Click *Governance* -> *Find policies*.
.. Search for *du-validator-policy*.
.. Click into the policy.
* To query status using the CLI, run the following command:
+
[source,terminal]
----
$ oc get policies du-validator-policy -n <namespace_for_common> -o jsonpath={'.status.status'} | jq
----
+
When all of the policies applied to the cluster, including the validator inform policy, become compliant, ZTP installation and configuration for this cluster is complete.
. To query the cluster violation and compliance status from the {rh-rhacm} web console, click *Governance* -> *Cluster violations*.
. Check the validator policy compliant status for a cluster using the following commands:
+
--
.. Export the cluster name:
+
[source,terminal]
----
$ export CLUSTER=<cluster_name>
----
.. Get the policy:
+
[source,terminal]
----
$ oc get policies -n $CLUSTER | grep <validator_policy_name>
----
--
+
Alternatively, you can use the following command:
+
[source,terminal]
----
$ oc get policies -n <namespace-for-group> <validatorPolicyName> -o jsonpath="{.status.status[?(@.clustername=='$CLUSTER')]}" | jq
----
+
After the `*-validator-du-policy` {rh-rhacm} policy becomes compliant for the cluster, the
validator policy is unbound for this cluster and the `ztp-done` label is added to the cluster.
This acts as a persistent indicator that the whole ZTP pipeline has completed for the cluster.

View File

@@ -0,0 +1,26 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-removing-obsolete-content_{context}"]
= Removing obsolete content
If a change to the `PolicyGenTemplate` file configuration results in obsolete policies, for example, because you rename policies, use the following procedure to remove those policies in an automated way.
.Procedure
. Remove the affected `PolicyGenTemplate` files from the Git repository, commit and push to the remote repository.
. Wait for the changes to synchronize through the application and the affected policies to be removed from the hub cluster.
. Add the updated `PolicyGenTemplate` files back to the Git repository, and then commit and push to the remote repository.
Note that removing the zero touch provisioning (ZTP) distributed unit (DU) profile policies from the Git repository, and as a result also removing them from the hub cluster, does not affect any configuration of the managed spoke clusters. Removing a policy from the hub cluster does not delete the policy or the CRs that it manages from the spoke cluster.
As an alternative, after making changes to `PolicyGenTemplate` files that result in obsolete policies, you can remove these policies from the hub cluster manually. You can delete policies from the {rh-rhacm} console using the *Governance* tab or by using the following command:
[source,terminal]
----
$ oc delete policy -n <namespace> <policyName>
----

View File

@@ -1,34 +0,0 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-removing-the-argocd-pipeline_{context}"]
= Removing the ArgoCD pipeline
Use the following procedure if you want to remove the ArgoCD pipeline and all generated artifacts.
.Procedure
. Detach all clusters from ACM.
. Delete all `SiteConfig` and `PolicyGenTemplate` custom resources (CRs) from your Git repository.
. Delete the following namespaces:
+
* All policy namespaces:
+
[source,terminal]
----
$ oc get policy -A
----
+
* `clusters-sub`
* `policies-sub`
. Process the directory using the Kustomize tool:
+
[source,terminal]
----
$ oc delete -k cnf-features-deploy/ztp/gitops-subscriptions/argocd/deployment
----

View File

@@ -0,0 +1,82 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-required-changes-to-the-git-repository_{context}"]
= Required changes to the Git repository
When you upgrade from an earlier release to {product-title} 4.10, additional requirements are placed on the contents of the Git repository. You must update existing content in the repository to reflect these changes.
* Changes to `PolicyGenTemplate` files:
+
All `PolicyGenTemplate` files must be created in a `Namespace` prefixed with `ztp`. This ensures that the GitOps zero touch provisioning (ZTP) application is able to manage the policy CRs generated by GitOps ZTP without conflicting with the way {rh-rhacm-first} manages the policies internally.
* Remove the `pre-sync.yaml` and `post-sync.yaml` files:
+
This step is optional but recommended. When the `kustomization.yaml` files are added, the `pre-sync.yaml` and `post-sync.yaml` files are no longer used. Remove them to avoid confusion; if they remain, they can cause errors if the `kustomization.yaml` files are inadvertently removed. Note that there is a set of `pre-sync.yaml` and `post-sync.yaml` files under both the `SiteConfig` and `PolicyGenTemplate` trees.
* Add the `kustomization.yaml` file to the repository:
+
All `SiteConfig` and `PolicyGenTemplate` CRs must be included in a `kustomization.yaml` file under their respective directory trees. For example:
+
[source,terminal]
----
├── policygentemplates
│ ├── site1-ns.yaml
│ ├── site1.yaml
│ ├── site2-ns.yaml
│ ├── site2.yaml
│ ├── common-ns.yaml
│ ├── common-ranGen.yaml
│ ├── group-du-sno-ranGen-ns.yaml
│ ├── group-du-sno-ranGen.yaml
│ └── kustomization.yaml
└── siteconfig
├── site1.yaml
├── site2.yaml
└── kustomization.yaml
----
+
[NOTE]
====
The files listed in the `generators` sections must contain either `SiteConfig` or `PolicyGenTemplate` CRs only. If your existing YAML files contain other CRs, for example, `Namespace` CRs, these other CRs must be pulled out into separate files and listed in the `resources` section.
====
+
The `PolicyGenTemplate` kustomization file must contain all `PolicyGenTemplate` YAML files in the `generators` section and `Namespace` CRs in the `resources` section. For example:
+
[source,yaml]
----
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators:
- common-ranGen.yaml
- group-du-sno-ranGen.yaml
- site1.yaml
- site2.yaml
resources:
- common-ns.yaml
- group-du-sno-ranGen-ns.yaml
- site1-ns.yaml
- site2-ns.yaml
----
+
The `SiteConfig` kustomization file must contain all `SiteConfig` YAML files in the `generators` section and any other CRs in the `resources` section:
+
[source,yaml]
----
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators:
- site1.yaml
- site2.yaml
----
* Review and incorporate recommended changes:
+
Each release may include additional recommended changes to the configuration applied to deployed clusters. Typically these changes result in lower CPU use by the OpenShift platform, additional features, or improved tuning of the platform.
+
Review the reference `SiteConfig` and `PolicyGenTemplate` CRs applicable to the types of cluster in your network. These examples can be found in the `argocd/example` directory extracted from the GitOps ZTP container.

View File

@@ -0,0 +1,41 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-restarting-policies-reconciliation_{context}"]
= Restarting policies reconciliation
Use the following procedure to restart policies reconciliation in the event of unexpected compliance issues. This procedure is required when the `ClusterGroupUpgrade` CR has timed out.
.Procedure
. A `ClusterGroupUpgrade` CR is generated in the namespace `ztp-install` by the {cgu-operator-full} after the managed spoke cluster becomes `Ready`:
+
[source,terminal]
----
$ export CLUSTER=<clusterName>
----
+
[source,terminal]
----
$ oc get clustergroupupgrades -n ztp-install $CLUSTER
----
. If there are unexpected issues and the policies fail to become compliant within the configured timeout (the default is 4 hours), the status of the `ClusterGroupUpgrade` CR shows `UpgradeTimedOut`:
+
[source,terminal]
----
$ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
----
. A `ClusterGroupUpgrade` CR in the `UpgradeTimedOut` state automatically restarts its policy reconciliation every hour. If you have changed your policies, you can start a retry immediately by deleting the existing `ClusterGroupUpgrade` CR. This triggers the automatic creation of a new `ClusterGroupUpgrade` CR that begins reconciling the policies immediately:
+
[source,terminal]
----
$ oc delete clustergroupupgrades -n ztp-install $CLUSTER
----
Note that when the `ClusterGroupUpgrade` CR completes with status `UpgradeCompleted` and the managed spoke cluster has the `ztp-done` label applied, you can make additional configuration changes by using `PolicyGenTemplate` CRs. Deleting the existing `ClusterGroupUpgrade` CR does not make the {cgu-operator} generate a new CR.
At this point, ZTP has completed its interaction with the cluster and any further interactions should be treated as an upgrade.

View File

@@ -0,0 +1,11 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-roll-out-the-configuration-changes_{context}"]
= Roll out the configuration changes
If you implemented recommended configuration changes as part of the upgrade, the upgrade process results in a set of policy CRs on the hub cluster in the `Non-Compliant` state. As of the {product-title} 4.10 release, these policies are set to `inform` mode and are not pushed to the spoke clusters without an additional step by the user. This ensures that potentially disruptive changes to the clusters can be managed in terms of when the changes are made, for example, during a maintenance window, and how many clusters are updated concurrently.
To roll out the changes, create one or more `ClusterGroupUpgrade` CRs as detailed in the {cgu-operator} documentation. The CR must contain the list of `Non-Compliant` policies that you want to push out to the spoke clusters as well as a list or selector of which clusters should be included in the update.
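The following `ClusterGroupUpgrade` CR is a minimal sketch. The cluster, policy, and namespace names are placeholders, and the concurrency and timeout values are assumptions that you should adjust for your maintenance window:
[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: example-config-rollout
  namespace: default
spec:
  clusters:
  - <cluster_name>
  enable: true
  managedPolicies:
  - <non_compliant_policy_name>
  remediationStrategy:
    maxConcurrency: 1 <1>
    timeout: 240 <2>
----
<1> The number of clusters that are updated concurrently.
<2> The time in minutes allowed for the update to complete.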

View File

@@ -4,7 +4,6 @@
:_content-type: CONCEPT
[id="ztp-single-node-clusters_{context}"]
= Single-node clusters
You use zero touch provisioning (ZTP) to deploy {sno} clusters to run distributed units (DUs) on small hardware footprints at disconnected

View File

@@ -2,13 +2,13 @@
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
:_content-type: PROCEDURE
[id="ztp-site-cleanup_{context}"]
= Site cleanup
To remove a site and the associated installation and policy custom resources (CRs), remove the `SiteConfig` and site-specific `PolicyGenTemplate` CRs from the Git repository. The pipeline hooks remove the generated CRs.
Remove a site and the associated installation and configuration policy CRs by removing the `SiteConfig` and `PolicyGenTemplate` file names from the `kustomization.yaml` file. When you run the ZTP pipeline again, the generated CRs are removed. If you want to permanently remove a site, you should also remove the `SiteConfig` and site-specific `PolicyGenTemplate` files from the Git repository. If you want to remove a site temporarily, for example when redeploying a site, you can leave the `SiteConfig` and site-specific `PolicyGenTemplate` CRs in the Git repository.
[NOTE]
====
Before removing a `SiteConfig` CR you must detach the cluster from ACM.
After removing the `SiteConfig` file, if the corresponding clusters remain in the detach process, check {rh-rhacm-first} for information about cleaning up the detached managed cluster.
====

View File

@@ -1,21 +0,0 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-site-planning-for-du-deployments_{context}"]
= Site planning considerations for distributed unit deployments
Site planning for distributed units (DU) deployments is complex. The following is an overview of the tasks that you complete before the DU hosts are brought online in the production environment.
* Develop a network model. The network model depends on various factors such as the size of the area of coverage, number of hosts, projected traffic load, DNS, and DHCP requirements.
* Decide how many DU radio nodes are required to provide sufficient coverage and redundancy for your network.
* Develop mechanical and electrical specifications for the DU host hardware.
* Develop a construction plan for individual DU site installations.
* Tune host BIOS settings for production, and deploy the BIOS configuration to the hosts.
* Install the equipment on-site, connect hosts to the network, and apply power.
* Configure on-site switches and routers.
* Perform basic connectivity tests for the host machines.
* Establish production network connectivity, and verify host connections to the network.
* Provision and deploy on-site DU hosts at scale.
* Test and verify on-site operations, performing load and scale testing of the DU hosts before finally bringing the DU infrastructure online in the live production environment.

View File

@@ -1,13 +0,0 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-sriov-operator_{context}"]
= SR-IOV Operator
The Single Root I/O Virtualization (SR-IOV) Network Operator manages the SR-IOV network devices and network attachments in your cluster.
The SR-IOV Operator allows network interfaces to be virtual and shared at a device level with networking functions running within the cluster.
The SR-IOV Network Operator adds the `SriovOperatorConfig.sriovnetwork.openshift.io` CustomResourceDefinition resource. The Operator automatically creates a SriovOperatorConfig custom resource named `default` in the `openshift-sriov-network-operator` namespace. The `default` custom resource contains the SR-IOV Network Operator configuration for your cluster.

View File

@@ -0,0 +1,32 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-stopping-the-existing-gitops-ztp-applications_{context}"]
= Stopping the existing GitOps ZTP applications
Removing the existing applications ensures that any changes to existing content in the Git repository are not rolled out until the new version of the tooling is available.
Use the application files from the `deployment` directory. If you used custom names for the applications, update the names in these files first.
.Procedure
. Perform a non-cascaded delete on the `clusters` application to leave all generated resources in place:
+
[source,terminal]
----
$ oc delete -f out/argocd/deployment/clusters-app.yaml
----
. Perform a cascaded delete on the `policies` application to remove all previous policies:
+
[source,terminal]
----
$ oc patch -f policies-app.yaml -p '{"metadata": {"finalizers": ["resources-finalizer.argocd.argoproj.io"]}}' --type merge
----
+
[source,terminal]
----
$ oc delete -f out/argocd/deployment/policies-app.yaml
----

View File

@@ -0,0 +1,36 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-support-for-deployment-of-multi-node-clusters_{context}"]
= ZTP support for deployment of multi-node clusters
The Telco 5G zero touch provisioning (ZTP) flow uses the Assisted Service, which is part of {rh-rhacm-first} on the hub cluster, to install clusters. This is done by generating all of the custom resources (CRs) required by the Assisted Service, including:
* `AgentClusterInstall`
* `ClusterDeployment`
* `NMStateConfig`
* `ManagedCluster` and `KlusterletAddonConfig` (integration with {rh-rhacm})
* `InfraEnv`
* `BareMetalHost`
* `ConfigMap` for extra install manifests
Extending ZTP to support three-node clusters and standard clusters requires updates to these CRs, including multiple instantiations of some of them.
ZTP provides support for deploying single node clusters, three-node clusters, and standard OpenShift clusters. This includes the installation of OpenShift and deployment of the distributed units (DUs) at scale.
The overall flow is identical to the ZTP support for single node clusters, with some differentiation in configuration depending on the type of cluster:
`SiteConfig`:
* For single node clusters, the `SiteConfig` file must have exactly one entry in the `nodes` section.
* For three-node clusters, the `SiteConfig` file must have exactly three entries defined in the `nodes` section.
* For standard clusters, the `SiteConfig` file must have exactly three entries in the `nodes` section with `role: master` and two or more additional entries with `role: worker`.
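For example, the `nodes` section of a `SiteConfig` CR for a standard cluster contains entries similar to the following sketch. The host names are placeholders, and the other required per-node fields, such as the BMC details, are omitted for brevity:
[source,yaml]
----
nodes:
  - hostName: "master-0.example.com"
    role: "master"
  - hostName: "master-1.example.com"
    role: "master"
  - hostName: "master-2.example.com"
    role: "master"
  - hostName: "worker-0.example.com"
    role: "worker"
  - hostName: "worker-1.example.com"
    role: "worker"
----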
`PolicyGenTemplate`:
* The example common `PolicyGenTemplate` is common across all types of clusters.
* There are example group `PolicyGenTemplate` files for single-node, three-node, and standard clusters.
* Site-specific `PolicyGenTemplate` files are still specific to each site.

View File

@@ -0,0 +1,53 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/ztp-deploying-disconnected.adoc
:_module-type: CONCEPT
[id="ztp-talo-integration_{context}"]
= GitOps ZTP and {cgu-operator-full}
GitOps zero touch provisioning (ZTP) generates installation and configuration CRs from manifests stored in Git. These artifacts are applied to a centralized hub cluster where {rh-rhacm-first}, assisted installer service, and the {cgu-operator-first} use the CRs to install and configure the spoke cluster. The configuration phase of the ZTP pipeline uses the {cgu-operator} to orchestrate the application of the configuration CRs to the cluster. There are several key integration points between GitOps ZTP and the {cgu-operator}.
Inform policies::
By default, GitOps ZTP creates all policies with a remediation action of `inform`. These policies cause {rh-rhacm} to report on the compliance status of clusters relevant to the policies but do not apply the desired configuration. During the ZTP installation, the {cgu-operator} steps through the created `inform` policies, creates a copy for the target spoke cluster or clusters, and changes the remediation action of the copy to `enforce`. This pushes the configuration to the spoke cluster. Outside of the ZTP phase of the cluster lifecycle, this setup allows changes to be made to policies without the risk of immediately rolling those changes out to all affected spoke clusters in the network. You can control the timing and the set of clusters that are remediated by using the {cgu-operator}.
Automatic creation of ClusterGroupUpgrade CRs::
The {cgu-operator} monitors the state of all `ManagedCluster` CRs on the hub cluster. Any `ManagedCluster` CR which does not have a `ztp-done` label applied, including newly created `ManagedCluster` CRs, causes the {cgu-operator} to automatically create a `ClusterGroupUpgrade` CR with the following characteristics:
* The `ClusterGroupUpgrade` CR is created and enabled in the `ztp-install` namespace.
* `ClusterGroupUpgrade` CR has the same name as the `ManagedCluster` CR.
* The cluster selector includes only the cluster associated with that `ManagedCluster` CR.
* The set of managed policies includes all policies that {rh-rhacm} has bound to the cluster at the time the `ClusterGroupUpgrade` is created.
* Pre-caching is disabled.
* Timeout set to 4 hours (240 minutes).
+
The automatic creation of an enabled `ClusterGroupUpgrade` ensures that initial zero-touch deployment of clusters proceeds without the need for user intervention. Additionally, the automatic creation of a `ClusterGroupUpgrade` CR for any `ManagedCluster` without the `ztp-done` label allows a failed ZTP installation to be restarted by simply deleting the `ClusterGroupUpgrade` CR for the cluster.
Waves::
Each policy generated from a `PolicyGenTemplate` CR includes a `ztp-deploy-wave` annotation. This annotation is based on the same annotation from each CR which is included in that policy. The wave annotation is used to order the policies in the auto-generated `ClusterGroupUpgrade` CR.
+
[NOTE]
====
All CRs in the same policy must have the same setting for the `ztp-deploy-wave` annotation. The default value of this annotation for each CR can be overridden in the `PolicyGenTemplate`. The wave annotation in the source CR is used for determining and setting the policy wave annotation. This annotation is removed from each built CR which is included in the generated policy at runtime.
====
+
The {cgu-operator} applies the configuration policies in the order specified by the wave annotations. The {cgu-operator} waits for each policy to be compliant before moving to the next policy. It is important to ensure that the wave annotation for each CR takes into account any prerequisites for those CRs to be applied to the cluster. For example, an Operator must be installed before or concurrently with the configuration for the Operator. Similarly, the `CatalogSource` for an Operator must be installed in a wave before or concurrently with the Operator Subscription. The default wave value for each CR takes these prerequisites into account.
+
Multiple CRs and policies can share the same wave number. Having fewer policies can result in faster deployments and lower CPU usage. It is a best practice to group many CRs into relatively few waves.
To check the default wave value in each source CR, run the following command against the `out/source-crs` directory that is extracted from the `ztp-site-generator` container image:
+
[source,terminal]
----
$ grep -r "ztp-deploy-wave" out/source-crs
----
Phase labels::
The `ClusterGroupUpgrade` CR is automatically created and includes directives to annotate the `ManagedCluster` CR with labels at the start and end of the ZTP process.
+
When ZTP configuration post-installation commences, the `ManagedCluster` has the `ztp-running` label applied. When all policies are remediated to the cluster and are fully compliant, these directives cause the {cgu-operator} to remove the `ztp-running` label and apply the `ztp-done` label.
+
For deployments which make use of the `informDuValidator` policy, the `ztp-done` label is applied when the cluster is fully ready for deployment of applications. This includes all reconciliation and resulting effects of the ZTP applied configuration CRs.
Linked CRs::
The automatically created `ClusterGroupUpgrade` CR has the owner reference set as the `ManagedCluster` from which it was derived. This reference ensures that deleting the `ManagedCluster` CR causes the instance of the `ClusterGroupUpgrade` to be deleted along with any supporting resources.

View File

@@ -0,0 +1,20 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-tearing-down-the-pipeline_{context}"]
= Tearing down the pipeline
If you need to remove the ArgoCD pipeline and all generated artifacts, follow this procedure:
.Procedure
. Detach all clusters from {rh-rhacm}.
. Delete the resources generated from the `kustomization.yaml` file in the `deployment` directory by running the following command:
+
[source,terminal]
----
$ oc delete -k out/argocd/deployment
----

View File

@@ -6,7 +6,7 @@
[id="ztp-the-gitops-approach_{context}"]
= The GitOps approach
ZTP uses the GitOps deployment set of practices for infrastructure deployment that allows developers to perform tasks that would otherwise fall under the purview of IT operations. GitOps achieves these tasks using declarative specifications stored in Git repositories, such as YAML files and other defined patterns, that provide a framework for deploying the infrastructure. The declarative output is leveraged by the Open Cluster Manager for multisite deployment.
ZTP uses the GitOps deployment set of practices for infrastructure deployment that allows developers to perform tasks that would otherwise fall under the purview of IT operations. GitOps achieves these tasks using declarative specifications stored in Git repositories, such as YAML files and other defined patterns, that provide a framework for deploying the infrastructure. The declarative output is leveraged by the Open Cluster Manager (OCM) for multisite deployment.
One of the motivators for a GitOps approach is the requirement for reliability at scale. This is a significant challenge that GitOps helps solve.

View File

@@ -2,21 +2,22 @@
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
:_content-type: REFERENCE
[id="ztp-the-policygentemplate_{context}"]
= The PolicyGenTemplate
= About the PolicyGenTemplate
The `PolicyGenTemplate.yaml` file is a Custom Resource Definition (CRD) that tells PolicyGen where to categorize the generated policies and which items need to be overlaid.
The `PolicyGenTemplate.yaml` file is a custom resource definition (CRD) that tells the `PolicyGen` policy generator what CRs to include in the configuration, how to categorize the CRs into the generated policies, and what items in those CRs need to be updated with overlay content.
The following example shows the `PolicyGenTemplate.yaml` file:
The following example shows a `PolicyGenTemplate.yaml` file:
[source,yaml]
----
---
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: "group-du-sno"
namespace: "group-du-sno"
namespace: "group-du-sno-policies"
spec:
bindingRules:
group-du-sno: ""
@@ -24,19 +25,68 @@ spec:
sourceFiles:
- fileName: ConsoleOperatorDisable.yaml
policyName: "console-policy"
- fileName: ClusterLogForwarder.yaml
policyName: "log-forwarder-policy"
spec:
outputs:
- type: "kafka"
name: kafka-open
# below url is an example
url: tcp://10.46.55.190:9092/test
pipelines:
- name: audit-logs
inputRefs:
- audit
outputRefs:
- kafka-open
- name: infrastructure-logs
inputRefs:
- infrastructure
outputRefs:
- kafka-open
- fileName: ClusterLogging.yaml
policyName: "cluster-log-policy"
policyName: "log-policy"
spec:
curation:
curator:
schedule: "30 3 * * *"
collection:
logs:
type: "fluentd"
fluentd: {}
collection:
logs:
type: "fluentd"
fluentd: {}
- fileName: MachineConfigSctp.yaml
policyName: "mc-sctp-policy"
metadata:
labels:
machineconfiguration.openshift.io/role: master
- fileName: PtpConfigSlave.yaml
policyName: "ptp-config-policy"
metadata:
name: "du-ptp-slave"
spec:
profile:
- name: "slave"
interface: "ens5f0"
ptp4lOpts: "-2 -s --summary_interval -4"
phc2sysOpts: "-a -r -n 24"
- fileName: SriovOperatorConfig.yaml
policyName: "sriov-operconfig-policy"
spec:
disableDrain: true
- fileName: MachineConfigAcceleratedStartup.yaml
policyName: "mc-accelerated-policy"
metadata:
name: 04-accelerated-container-startup-master
labels:
machineconfiguration.openshift.io/role: master
- fileName: DisableSnoNetworkDiag.yaml
policyName: "disable-network-diag"
metadata:
labels:
machineconfiguration.openshift.io/role: master
----
The `group-du-ranGen.yaml` file defines a group of policies under a group named `group-du`. This file defines a `MachineConfigPool` `worker-du` that is used as the node selector for any other policy defined in `sourceFiles`. An ACM policy is generated for every source file that exists in `sourceFiles`. And, a single placement binding and placement rule is generated to apply the cluster selection rule for `group-du` policies.
The `group-du-ranGen.yaml` file defines a group of policies under a group named `group-du`. A {rh-rhacm-first} policy is generated for every source file that exists in `sourceFiles`, and a single placement binding and placement rule are generated to apply the cluster selection rule for the `group-du` policies.
Using the source file `PtpConfigSlave.yaml` as an example, the file defines a `PtpConfig` custom resource (CR). The generated policy for the `PtpConfigSlave` example is named `group-du-ptp-config-policy`. The `PtpConfig` CR defined in the generated `group-du-ptp-config-policy` is named `du-ptp-slave`. The `spec` defined in `PtpConfigSlave.yaml` is placed under `du-ptp-slave` along with the other `spec` items defined in the source file.
@@ -71,7 +121,7 @@ spec:
include:
- '*'
object-templates:
- complianceType: musthave <1>
- complianceType: musthave
objectDefinition:
apiVersion: ptp.openshift.io/v1
kind: PtpConfig
@@ -100,4 +150,3 @@ spec:
domainNumber 24
.....
----
<1> Displays the value of the `complianceType` field. The default value is `musthave` which indicates that an object must exist with the same `name` as specified in `object-templates`. To find the exact matches to roles and objects, set the value to `mustonlyhave`. For more information about the accepted values, see link:https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html-single/governance/index#configuration-policy-yaml-table[Configuration policy YAML table].

View File

@@ -1,15 +0,0 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-things-to-consider-when-creating-custom-resource-policies_{context}"]
= Considerations when creating custom resource policies
* The custom resources used to create the ACM policies should be defined with consideration of possible overlay to its metadata and spec/data. For example, if the custom resource `metadata.name` does not change between clusters then you should set the `metadata.name` value in the custom resource file. If the custom resource will have multiple instances in the same cluster, then the custom resource `metadata.name` must be defined in the policy template file.
* To apply the node selector for a specific machine config pool, set the node selector value to `$mcp` so that the policy generator can overlay the `$mcp` value with the machine config pool defined in the policy template.
* Subscription source files do not change.
* To ensure that policy updates are applied, set the `complianceType` field to `mustonlyhave`.

View File

@@ -0,0 +1,9 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: CONCEPT
[id="ztp-topology-aware-lifecycle-manager_{context}"]
= {cgu-operator-full}
Install the {cgu-operator-first} on the hub cluster.

View File

@@ -6,4 +6,4 @@
[id="ztp-troubleshooting-gitops-ztp_{context}"]
= Troubleshooting GitOps ZTP
As noted, the ArgoCD pipeline synchronizes the `SiteConfig` and `PolicyGenTemplate` custom resources (CR) from the Git repository to the hub cluster. During this process, post-sync hooks create the installation and policy CRs that are also applied to the hub cluster. Use the following procedures to troubleshoot issues that might occur in this process.
The ArgoCD pipeline uses the `SiteConfig` and `PolicyGenTemplate` custom resources (CRs) from Git to generate the cluster configuration CRs and {rh-rhacm-first} policies. Use the following steps to troubleshoot issues that might occur during this process.

View File

@@ -0,0 +1,23 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-upgrading-gitops-ztp_{context}"]
= Upgrading GitOps ZTP
You can upgrade the GitOps zero touch provisioning (ZTP) infrastructure independently from the underlying cluster, {rh-rhacm-first}, and {product-title} version running on the spoke clusters. This procedure guides you through the upgrade process while avoiding impact on the spoke clusters. However, any changes to the content or settings of policies, including adding recommended content, result in changes that must be rolled out and reconciled to the spoke clusters.
.Prerequisites
* This procedure assumes that you have a fully operational hub cluster running the earlier version of the GitOps ZTP infrastructure.
.Procedure
At a high level, the strategy for upgrading the GitOps ZTP infrastructure is:
. Label all existing clusters with the `ztp-done` label (an example command is shown after this list).
. Stop the ArgoCD applications.
. Install the new tooling.
. Update required content and optional changes in the Git repository.
. Update and restart the application configuration.
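For example, the first step, applying the `ztp-done` label to an existing cluster, can be done with a command similar to the following sketch; the cluster name is a placeholder:
[source,terminal]
----
$ oc label managedcluster <cluster_name> ztp-done=""
----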

View File

@@ -0,0 +1,150 @@
// Module included in the following assemblies:
//
// scalability_and_performance/ztp-deploying-disconnected.adoc
:_module-type: PROCEDURE
[id="ztp-using-pgt-to-update-source-crs_{context}"]
= Using PolicyGenTemplate CRs to override source CRs content
`PolicyGenTemplate` CRs allow you to overlay additional configuration details on top of the base source CRs provided in the `ztp-site-generate` container. You can think of `PolicyGenTemplate` CRs as a logical merge or patch to the base CR. Use `PolicyGenTemplate` CRs to update a single field of the base CR, or overlay the entire contents of the base CR. You can update values and insert fields that are not in the base CR.
The following example procedure describes how to update fields in the generated `PerformanceProfile` CR for the reference configuration based on the `PolicyGenTemplate` CR in the `group-du-sno-ranGen.yaml` file. Use the procedure as a basis for modifying other parts of the `PolicyGenTemplate` based on your requirements.
.Prerequisites
* Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for Argo CD.
.Procedure
. Review the baseline source CR for existing content. You can inspect the source CRs listed in the reference `PolicyGenTemplate` CRs by extracting them from the zero touch provisioning (ZTP) container.
.. Create an `/out` folder:
+
[source,terminal]
----
$ mkdir -p ./out
----
.. Extract the source CRs:
+
[source,terminal]
----
$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10 extract /home/ztp --tar | tar x -C ./out
----
. Review the baseline `PerformanceProfile` CR in `./out/source-crs/PerformanceProfile.yaml`:
+
[source,yaml]
----
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
name: $name
annotations:
ran.openshift.io/ztp-deploy-wave: "10"
spec:
additionalKernelArgs:
- "idle=poll"
- "rcupdate.rcu_normal_after_boot=0"
cpu:
isolated: $isolated
reserved: $reserved
hugepages:
defaultHugepagesSize: $defaultHugepagesSize
pages:
- size: $size
count: $count
node: $node
machineConfigPoolSelector:
pools.operator.machineconfiguration.openshift.io/$mcp: ""
net:
userLevelNetworking: true
nodeSelector:
node-role.kubernetes.io/$mcp: ''
numa:
topologyPolicy: "restricted"
realTimeKernel:
enabled: true
----
+
[NOTE]
====
Any fields in the source CR that contain `$...` are removed from the generated CR if they are not provided in the `PolicyGenTemplate` CR.
====
. Update the `PolicyGenTemplate` entry for `PerformanceProfile` in the `group-du-sno-ranGen.yaml` reference file. The following example `PolicyGenTemplate` CR stanza supplies appropriate CPU specifications, sets the `hugepages` configuration, and adds a new field that sets `globallyDisableIrqLoadBalancing` to `false`.
+
[source,yaml]
----
- fileName: PerformanceProfile.yaml
policyName: "config-policy"
metadata:
name: openshift-node-performance-profile
spec:
cpu:
# These must be tailored for the specific hardware platform
isolated: "2-19,22-39"
reserved: "0-1,20-21"
hugepages:
defaultHugepagesSize: 1G
pages:
- size: 1G
count: 10
globallyDisableIrqLoadBalancing: false
----
. Commit the `PolicyGenTemplate` change in Git, and then push to the Git repository that is monitored by the GitOps ZTP Argo CD application.
.Example output
The ZTP application generates an ACM policy that contains the generated `PerformanceProfile` CR. The contents of that CR are derived by merging the `metadata` and `spec` contents from the `PerformanceProfile` entry in the `PolicyGenTemplate` onto the source CR. The resulting CR has the following content:
[source,yaml]
----
---
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
name: openshift-node-performance-profile
spec:
additionalKernelArgs:
- idle=poll
- rcupdate.rcu_normal_after_boot=0
cpu:
isolated: 2-19,22-39
reserved: 0-1,20-21
globallyDisableIrqLoadBalancing: false
hugepages:
defaultHugepagesSize: 1G
pages:
- count: 10
size: 1G
machineConfigPoolSelector:
pools.operator.machineconfiguration.openshift.io/master: ""
net:
userLevelNetworking: true
nodeSelector:
node-role.kubernetes.io/master: ""
numa:
topologyPolicy: restricted
realTimeKernel:
enabled: true
----
[NOTE]
====
In the `/source-crs` folder that you extract from the `ztp-site-generate` container, the `$` prefix is not used for template substitution in the way that the syntax might imply. Rather, if the `policyGen` tool sees the `$` prefix on a string and you do not specify a value for that field in the related `PolicyGenTemplate` CR, the field is omitted from the output CR entirely.
An exception to this is the `$mcp` variable in `/source-crs` YAML files, which is substituted with the value specified for `mcp` in the `PolicyGenTemplate` CR. For example, in `example/policygentemplates/group-du-standard-ranGen.yaml`, the value for `mcp` is `worker`:
[source,yaml]
----
spec:
bindingRules:
group-du-standard: ""
mcp: "worker"
----
The `policyGen` tool replaces instances of `$mcp` with `worker` in the output CRs.
====
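As an illustration of the substitution, and assuming `mcp: "worker"` as in the example above, the rendered selector fields in the output `PerformanceProfile` CR would take a form similar to the following sketch. Compare this with the `master` variants shown in the generated CR earlier in this procedure:
[source,yaml]
----
# Sketch of the selector fields after the policyGen tool replaces $mcp with "worker"
machineConfigPoolSelector:
  pools.operator.machineconfiguration.openshift.io/worker: ""
nodeSelector:
  node-role.kubernetes.io/worker: ""
----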

View File

@@ -0,0 +1,115 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-validating-the-generation-of-configuration-policy-crs_{context}"]
= Validating the generation of configuration policy CRs
Policy custom resources (CRs) are generated in the same namespace as the `PolicyGenTemplate` from which they are created. The same troubleshooting flow applies to all policy CRs generated from a `PolicyGenTemplate` regardless of whether they are `ztp-common`, `ztp-group`, or `ztp-site` based, as shown using the following commands:
[source,terminal]
----
$ export NS=<namespace>
----
[source,terminal]
----
$ oc get policy -n $NS
----
The expected set of policy-wrapped CRs should be displayed.
If the policies failed synchronization, use the following troubleshooting steps.
.Procedure
. To display detailed information about the policies, run the following command:
+
[source,terminal]
----
$ oc describe -n openshift-gitops application policies
----
. Check the `Status: Conditions:` field to view the error logs. For example, setting an invalid `sourceFile→fileName:` value generates the error shown below:
+
[source,text]
----
Status:
Conditions:
Last Transition Time: 2021-11-26T17:21:39Z
Message: rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/policies/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not find test.yaml under source-crs/: no such file or directory
Error: failure in plugin configured via /tmp/kust-plugin-config-52463179; exit status 1: exit status 1
Type: ComparisonError
----
. Check for `Status: Sync:`. If there are log errors at `Status: Conditions:`, the `Status: Sync:` shows `Unknown` or `Error`:
+
[source,text]
----
Status:
Sync:
Compared To:
Destination:
Namespace: policies-sub
Server: https://kubernetes.default.svc
Source:
Path: policies
Repo URL: https://git.com/ran-sites/policies/.git
Target Revision: master
Status: Error
----
. When {rh-rhacm-first} recognizes that policies apply to a `ManagedCluster` object, the policy CR objects are applied to the cluster namespace. Check to see if the policies were copied to the cluster namespace:
+
[source,terminal]
----
$ oc get policy -n $CLUSTER
----
+
.Example output
+
[source,terminal]
----
NAME REMEDIATION ACTION COMPLIANCE STATE AGE
ztp-common.common-config-policy inform Compliant 13d
ztp-common.common-subscriptions-policy inform Compliant 13d
ztp-group.group-du-sno-config-policy inform Compliant 13d
ztp-group.group-du-sno-validator-du-policy     inform               Compliant          13d
ztp-site.example-sno-config-policy inform Compliant 13d
----
+
{rh-rhacm} copies all applicable policies into the cluster namespace. The copied policy names have the format: `<policyGenTemplate.Namespace>.<policyGenTemplate.Name>-<policyName>`.
. Check the placement rule for any policies not copied to the cluster namespace. The `matchSelector` in the `PlacementRule` for those policies should match labels on the `ManagedCluster` object:
+
[source,terminal]
----
$ oc get placementrule -n $NS
----
. Note the name of the `PlacementRule` that corresponds to the missing policy (common, group, or site), and inspect it by running the following command:
+
[source,terminal]
----
$ oc get placementrule -n $NS <placementRuleName> -o yaml
----
+
* The `status.decisions` field should include your cluster name.
* The key-value pair of the `matchSelector` in the `spec` must match the labels on your managed cluster. See the example `PlacementRule` after this list.
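+
A `PlacementRule` for a group policy might look like the following sketch, adapted from an example elsewhere in this changeset; the `group-test1` key and the cluster name are placeholders:
+
[source,yaml]
----
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: group-test1-policies-placementrules
  namespace: group-test1-policies
spec:
  clusterSelector:
    matchExpressions:
    - key: group-test1
      operator: In
      values:
      - ""
status:
  decisions:
  - clusterName: <cluster_name>
    clusterNamespace: <cluster_name>
----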
. Check the labels on the `ManagedCluster` object using the following command:
+
[source,terminal]
----
$ oc get ManagedCluster $CLUSTER -o jsonpath='{.metadata.labels}' | jq
----
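+
The output is a JSON map of labels. The following sketch shows what it might contain for the `example-sno` cluster, assuming the `group-du-sno` binding label shown earlier in this document; the exact set of labels depends on your deployment:
+
[source,json]
----
{
  "group-du-sno": "",
  "name": "example-sno"
}
----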
. Check to see which policies are compliant using the following command:
+
[source,terminal]
----
$ oc get policy -n $CLUSTER
----
+
If the `Namespace`, `OperatorGroup`, and `Subscription` policies are compliant but the Operator configuration policies are not, it is likely that the Operators did not install on the spoke cluster. As a result, the Operator configuration policies fail to apply because the required CRDs are not yet present on the spoke cluster.
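+
To confirm whether the Operators installed, one approach is to list the cluster service versions on the spoke cluster and look for entries that are not in the `Succeeded` phase. This is a sketch; run it with a kubeconfig that targets the spoke cluster:
+
[source,terminal]
----
$ oc get csv -A | grep -v Succeeded
----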

View File

@@ -1,4 +1,4 @@
// Module included in the following assemblies:
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
@@ -6,66 +6,58 @@
[id="ztp-validating-the-generation-of-installation-crs_{context}"]
= Validating the generation of installation CRs
`SiteConfig` applies Installation custom resources (CR) to the hub cluster in a namespace with the name matching the site name. To check the status, enter the following command:
The GitOps zero touch provisioning (ZTP) infrastructure generates a set of installation CRs on the hub cluster in response to a `SiteConfig` CR pushed to your Git repository. You can check that the installation CRs were created by using the following command:
[source,terminal]
----
$ oc get AgentClusterInstall -n <cluster_name>
----
If no object is returned, use the following procedure to troubleshoot the ArgoCD pipeline flow from `SiteConfig` to the installation CRs.
If no object is returned, use the following procedure to troubleshoot the ArgoCD pipeline flow from `SiteConfig` files to the installation CRs.
.Procedure
. Check the synchronization of the `SiteConfig` to the hub cluster using either of the following commands:
. Verify that the `ManagedCluster` CR was generated from the `SiteConfig` CR on the hub cluster:
+
[source,terminal]
----
$ oc get siteconfig -A
$ oc get managedcluster
----
+
or
+
[source,terminal]
----
$ oc get siteconfig -n clusters-sub
----
+
If the `SiteConfig` is missing, one of the following situations has occurred:
* The *clusters* application failed to synchronize the CR from the Git repository to the hub. Use the following command to verify this:
. If the `ManagedCluster` CR is missing, check whether the `clusters` application failed to synchronize the files from the Git repository to the hub cluster:
+
[source,terminal]
----
$ oc describe -n openshift-gitops application clusters
----
+
Check for `Status: Synced` and that the `Revision:` is the SHA of the commit you pushed to the subscribed repository.
+
* The pre-sync hook failed, possibly due to a failure to pull the container image. Check the ArgoCD dashboard for the status of the pre-sync job in the *clusters* application.
. Verify the post hook job ran:
. Check the `Status: Conditions:` field to view the error logs. For example, setting an invalid value for `extraManifestPath:` in the `SiteConfig` file raises an error as shown below:
+
[source,terminal]
[source,text]
----
$ oc describe job -n clusters-sub siteconfig-post
Status:
Conditions:
Last Transition Time: 2021-11-26T17:21:39Z
Message: rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/siteconfigs/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not create extra-manifest ranSite1.extra-manifest3 stat extra-manifest3: no such file or directory
2021/11/26 17:21:40 Error: could not build the entire SiteConfig defined by /tmp/kust-plugin-config-913473579: stat extra-manifest3: no such file or directory
Error: failure in plugin configured via /tmp/kust-plugin-config-913473579; exit status 1: exit status 1
Type: ComparisonError
----
+
* If successful, the returned output indicates `succeeded: 1`.
* If the job fails, ArgoCD retries it. In some cases, the first pass will fail and the second pass will indicate that the job passed.
. Check for errors in the post hook job:
. Check for `Status: Sync:`. If there are log errors, `Status: Sync:` could indicate an `Unknown` error:
+
[source,terminal]
[source,text]
----
$ oc get pod -n clusters-sub
Status:
Sync:
Compared To:
Destination:
Namespace: clusters-sub
Server: https://kubernetes.default.svc
Source:
Path: sites-config
Repo URL: https://git.com/ran-sites/siteconfigs/.git
Target Revision: master
Status: Unknown
----
+
Note the name of the `siteconfig-post-xxxxx` pod:
+
[source,terminal]
----
$ oc logs -n clusters-sub siteconfig-post-xxxxx
----
+
If the logs indicate errors, correct the conditions and push the corrected `SiteConfig` or `PolicyGenTemplate` to the Git repository.

View File

@@ -1,112 +0,0 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/ztp-deploying-disconnected.adoc
:_content-type: PROCEDURE
[id="ztp-validating-the-generation-of-policy-crs_{context}"]
= Validating the generation of policy CRs
ArgoCD generates the policy custom resources (CRs) in the same namespace as the `PolicyGenTemplate` from which they were created. The same troubleshooting flow applies to all policy CRs generated from `PolicyGenTemplates` regardless of whether they are common, group, or site based.
To check the status of the policy CRs, enter the following commands:
[source,terminal]
----
$ export NS=<namespace>
----
[source,terminal]
----
$ oc get policy -n $NS
----
The returned output displays the expected set of policy wrapped CRs. If no object is returned, use the following procedure to troubleshoot the ArgoCD pipeline flow from `SiteConfig` to the policy CRs.
.Procedure
. Check the synchronization of the `PolicyGenTemplate` to the hub cluster:
+
[source,terminal]
----
$ oc get policygentemplate -A
----
or
+
[source,terminal]
----
$ oc get policygentemplate -n $NS
----
+
If the `PolicyGenTemplate` is not synchronized, one of the following situations has occurred:
+
* The clusters application failed to synchronize the CR from the Git repository to the hub. Use the following command to verify this:
+
[source,terminal]
----
$ oc describe -n openshift-gitops application clusters
----
+
Check for `Status: Synced` and that the `Revision:` is the SHA of the commit you pushed to the subscribed repository.
+
* The pre-sync hook failed, possibly due to a failure to pull the container image. Check the ArgoCD dashboard for the status of the pre-sync job in the *clusters* application.
. Ensure the policies were copied to the cluster namespace. When ACM recognizes that policies apply to a `ManagedCluster`, ACM applies the policy CR objects to the cluster namespace:
+
[source,terminal]
----
$ oc get policy -n <cluster_name>
----
ACM copies all applicable common, group, and site policies here. The policy names are `<policyNamespace>` and `<policyName>`.
. Check the placement rule for any policies not copied to the cluster namespace. The `matchSelector` in the `PlacementRule` for those policies should match the labels on the `ManagedCluster`:
+
[source,terminal]
----
$ oc get placementrule -n $NS
----
. Make a note of the `PlacementRule` name for the missing common, group, or site policy:
+
[source,terminal]
----
oc get placementrule -n $NS <placementRuleName> -o yaml
----
+
* The `status decisions` value should include your cluster name.
* The `key value` of the `matchSelector` in the spec should match the labels on your managed cluster. Check the labels on `ManagedCluster`:
+
[source,terminal]
----
oc get ManagedCluster $CLUSTER -o jsonpath='{.metadata.labels}' | jq
----
+
.Example
[source,yaml]
----
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
name: group-test1-policies-placementrules
namespace: group-test1-policies
spec:
clusterSelector:
matchExpressions:
- key: group-test1
operator: In
values:
- ""
status:
decisions:
- clusterName: <cluster_name>
clusterNamespace: <cluster_name>
----
. Ensure all policies are compliant:
+
[source,terminal]
----
oc get policy -n $CLUSTER
----
+
If the Namespace, OperatorGroup, and Subscription policies are compliant but the Operator configuration policies are not, it is likely that the Operators did not install.

Some files were not shown because too many files have changed in this diff.