From 89368c6d855c811712bfcb590850610e55f3cb2f Mon Sep 17 00:00:00 2001
From: Jeana Routh
Date: Wed, 21 May 2025 15:02:08 -0400
Subject: [PATCH] OSDOCS-14739: Troubleshooting MAPI-CAPI conversion, minus deletion behavior

---
 .../cluster-api-disabling.adoc                |  9 +-
 .../cluster-api-getting-started.adoc          | 11 ++-
 .../cluster-api-troubleshooting.adoc          | 31 ++++++-
 ...c => capi-to-mapi-migration-overview.adoc} |  2 +-
 ...achine-set-authoritative-api-machines.adoc | 39 +++++++++
 ...c => mapi-to-capi-migration-overview.adoc} |  2 +-
 modules/migrating-between-capi-mapi.adoc      |  2 +-
 modules/ts-capi-migrate-aws-creds.adoc        | 12 +++
 ...ts-capi-migrate-sync-label-annotation.adoc | 30 +++++++
 ...ate-unexpected-machine-counts-scaling.adoc | 46 ++++++++++
 .../ts-capi-migrate-unsupported-features.adoc | 83 +++++++++++++++++++
 ...ts-capi-sync-list-duplicate-resources.adoc | 55 ++++++++++++
 12 files changed, 311 insertions(+), 11 deletions(-)
 rename modules/{mapi-capi-migration-overview.adoc => capi-to-mapi-migration-overview.adoc} (96%)
 create mode 100644 modules/machine-set-authoritative-api-machines.adoc
 rename modules/{capi-mapi-migration-overview.adoc => mapi-to-capi-migration-overview.adoc} (98%)
 create mode 100644 modules/ts-capi-migrate-aws-creds.adoc
 create mode 100644 modules/ts-capi-migrate-sync-label-annotation.adoc
 create mode 100644 modules/ts-capi-migrate-unexpected-machine-counts-scaling.adoc
 create mode 100644 modules/ts-capi-migrate-unsupported-features.adoc
 create mode 100644 modules/ts-capi-sync-list-duplicate-resources.adoc

diff --git a/machine_management/cluster_api_machine_management/cluster-api-disabling.adoc b/machine_management/cluster_api_machine_management/cluster-api-disabling.adoc
index 126593070e..884ee91ad7 100644
--- a/machine_management/cluster_api_machine_management/cluster-api-disabling.adoc
+++ b/machine_management/cluster_api_machine_management/cluster-api-disabling.adoc
@@ -12,12 +12,15 @@ To stop using the Cluster API to automate the management of infrastructure resou
 include::snippets/technology-preview.adoc[]
 
 //Migrating Cluster API resources to Machine API resources
-include::modules/mapi-capi-migration-overview.adoc[leveloffset=+1]
+include::modules/capi-to-mapi-migration-overview.adoc[leveloffset=+1]
 
 //Migrating a Cluster API resource to use the Machine API
 include::modules/migrating-between-capi-mapi.adoc[leveloffset=+2]
 
+//Authoritative API types of compute machines
+include::modules/machine-set-authoritative-api-machines.adoc[leveloffset=+2]
+
 [role="_additional-resources"]
 .Additional resources
-// * xr3f:../../machine_management/cluster_api_machine_management/cluster-api-troubleshooting.adoc#ts-capi-resource-migration_cluster-api-troubleshooting[Troubleshooting resource migration]
-* xref:../../machine_management/cluster_api_machine_management/cluster-api-getting-started.adoc#capi-mapi-migration-overview_cluster-api-getting-started[Migrating Machine API resources to Cluster API resources]
\ No newline at end of file
+* xref:../../machine_management/cluster_api_machine_management/cluster-api-troubleshooting.adoc#ts-capi-resource-migration_cluster-api-troubleshooting[Troubleshooting resource migration]
+* xref:../../machine_management/cluster_api_machine_management/cluster-api-getting-started.adoc#mapi-to-capi-migration-overview_cluster-api-getting-started[Migrating Machine API resources to Cluster API resources]
\ No newline at end of file
diff --git a/machine_management/cluster_api_machine_management/cluster-api-getting-started.adoc b/machine_management/cluster_api_machine_management/cluster-api-getting-started.adoc
index d3bd2bb953..2d44bd79ce 100644
--- a/machine_management/cluster_api_machine_management/cluster-api-getting-started.adoc
+++ b/machine_management/cluster_api_machine_management/cluster-api-getting-started.adoc
@@ -24,7 +24,7 @@ When you install a cluster that supports managing infrastructure resources with
 * One provider-specific infrastructure cluster resource.
 
 On clusters that support migrating Machine API resources to Cluster API resources, a two-way synchronization controller creates these primary resources automatically.
-For more information, see xref:../../machine_management/cluster_api_machine_management/cluster-api-getting-started.adoc#capi-mapi-migration-overview_cluster-api-getting-started[Migrating Machine API resources to Cluster API resources].
+For more information, see xref:../../machine_management/cluster_api_machine_management/cluster-api-getting-started.adoc#mapi-to-capi-migration-overview_cluster-api-getting-started[Migrating Machine API resources to Cluster API resources].
 
 [id="creating-primary-resources_{context}"]
 == Creating the Cluster API primary resources
@@ -57,7 +57,10 @@ include::modules/capi-creating-machine-set.adoc[leveloffset=+2]
 * xref:../../machine_management/cluster_api_machine_management/cluster_api_provider_configurations/cluster-api-config-options-bare-metal.adoc#capi-yaml-machine-set-bare-metal_cluster-api-config-options-bare-metal[Sample YAML for a Cluster API compute machine set resource on bare metal]
 
 //Migrating Machine API resources to Cluster API resources
-include::modules/capi-mapi-migration-overview.adoc[leveloffset=+1]
+include::modules/mapi-to-capi-migration-overview.adoc[leveloffset=+1]
+
+//Authoritative API types of compute machines
+include::modules/machine-set-authoritative-api-machines.adoc[leveloffset=+2]
 
 //Migrating a Machine API resource to use the Cluster API
 include::modules/migrating-between-capi-mapi.adoc[leveloffset=+2]
@@ -67,5 +70,5 @@ include::modules/deploying-capi-machines-via-mapi-machine-sets.adoc[leveloffset=
 
 [role="_additional-resources"]
 .Additional resources
-//* xr3f:../../machine_management/cluster_api_machine_management/cluster-api-troubleshooting.adoc#ts-capi-resource-migration_cluster-api-troubleshooting[Troubleshooting resource migration]
-* xref:../../machine_management/cluster_api_machine_management/cluster-api-disabling.adoc#mapi-capi-migration-overview_cluster-api-disabling[Migrating Cluster API resources to Machine API resources]
+* xref:../../machine_management/cluster_api_machine_management/cluster-api-troubleshooting.adoc#ts-capi-resource-migration_cluster-api-troubleshooting[Troubleshooting resource migration]
+* xref:../../machine_management/cluster_api_machine_management/cluster-api-disabling.adoc#capi-to-mapi-migration-overview_cluster-api-disabling[Migrating Cluster API resources to Machine API resources]
diff --git a/machine_management/cluster_api_machine_management/cluster-api-troubleshooting.adoc b/machine_management/cluster_api_machine_management/cluster-api-troubleshooting.adoc
index 91eb85cf1c..5f10c54d68 100644
--- a/machine_management/cluster_api_machine_management/cluster-api-troubleshooting.adoc
+++ b/machine_management/cluster_api_machine_management/cluster-api-troubleshooting.adoc
@@ -15,4 +15,33 @@ Generally, troubleshooting steps for problems with the Cluster API are similar t
 The {cluster-capi-operator} and its operands are provisioned in the `openshift-cluster-api` namespace, whereas the Machine API uses the `openshift-machine-api` namespace. When using `oc` commands that reference a namespace, be sure to reference the correct one.
 
 //Returning the intended machines when using the CLI
-include::modules/ts-capi-cli-reference-intended-objects.adoc[leveloffset=+1]
\ No newline at end of file
+include::modules/ts-capi-cli-reference-intended-objects.adoc[leveloffset=+1]
+
+//Duplicated machine set and machine resources
+include::modules/ts-capi-sync-list-duplicate-resources.adoc[leveloffset=+1]
+
+//Unexpected resource deletion behavior
+//Draft to be completed
+//include::modules/ts-capi-migrate-unexpected-deletion-behavior.adoc[leveloffset=+1]
+
+[id="ts-capi-resource-migration_{context}"]
+== Troubleshooting resource migration
+
+When you migrate a resource to use a different authoritative API, you might encounter issues during the migration process.
+You might also notice unexpected behavior due to differences between the Cluster API and the Machine API.
+
+//Authoritative API types of compute machines
+include::modules/machine-set-authoritative-api-machines.adoc[leveloffset=+2]
+
+//Unexpected machine counts after scaling
+include::modules/ts-capi-migrate-unexpected-machine-counts-scaling.adoc[leveloffset=+2]
+
+//Incomplete synchronization of labels and annotations
+include::modules/ts-capi-migrate-sync-label-annotation.adoc[leveloffset=+2]
+
+//Migrating {aws-short} cloud credentials
+//KCS draft not ready for publication
+//include::modules/ts-capi-migrate-aws-creds.adoc[leveloffset=+2]
+
+//Unsupported configuration options
+include::modules/ts-capi-migrate-unsupported-features.adoc[leveloffset=+2]
diff --git a/modules/mapi-capi-migration-overview.adoc b/modules/capi-to-mapi-migration-overview.adoc
similarity index 96%
rename from modules/mapi-capi-migration-overview.adoc
rename to modules/capi-to-mapi-migration-overview.adoc
index 4ad60f3f6f..40ca50c85f 100644
--- a/modules/mapi-capi-migration-overview.adoc
+++ b/modules/capi-to-mapi-migration-overview.adoc
@@ -3,7 +3,7 @@
 // * machine_management/cluster_api_machine_management/cluster-api-disabling.adoc
 
 :_mod-docs-content-type: CONCEPT
-[id="mapi-capi-migration-overview_{context}"]
+[id="capi-to-mapi-migration-overview_{context}"]
 = Migrating Cluster API resources to Machine API resources
 
 On clusters that support migrating between Machine API and Cluster API resources, the two-way synchronization controller supports converting a Cluster API resource to a Machine API resource.
diff --git a/modules/machine-set-authoritative-api-machines.adoc b/modules/machine-set-authoritative-api-machines.adoc
new file mode 100644
index 0000000000..a2c68af279
--- /dev/null
+++ b/modules/machine-set-authoritative-api-machines.adoc
@@ -0,0 +1,39 @@
+// Module included in the following assemblies:
+//
+// * machine_management/cluster_api_machine_management/cluster-api-disabling.adoc
+// * machine_management/cluster_api_machine_management/cluster-api-getting-started.adoc
+// * machine_management/cluster_api_machine_management/cluster-api-troubleshooting.adoc
+
+:_mod-docs-content-type: REFERENCE
+[id="machine-set-authoritative-api-machines_{context}"]
+= Authoritative API types of compute machines
+
+The authoritative API of a compute machine depends on the values of the `.spec.authoritativeAPI` and `.spec.template.spec.authoritativeAPI` fields in the Machine API compute machine set that creates it.
+
+.Interaction of `authoritativeAPI` fields when creating compute machines
+[cols="h,1,1,1,1"]
+|===
+|`.spec.authoritativeAPI` value
+|`ClusterAPI`
+|`ClusterAPI`
+|`MachineAPI`
+|`MachineAPI`
+
+|`.spec.template.spec.authoritativeAPI` value
+|`ClusterAPI`
+|`MachineAPI`
+|`MachineAPI`
+|`ClusterAPI`
+
+|`authoritativeAPI` value for new compute machines
+|`ClusterAPI`
+|`ClusterAPI`
+|`MachineAPI`
+|`ClusterAPI`
+|===
+
+[NOTE]
+====
+When the `.spec.authoritativeAPI` value is `ClusterAPI`, the Machine API machine set is not authoritative and the `.spec.template.spec.authoritativeAPI` value is not used.
+As a result, the only combination that creates a compute machine with the Machine API as authoritative is where the `.spec.authoritativeAPI` and `.spec.template.spec.authoritativeAPI` values are `MachineAPI`.
+====
diff --git a/modules/capi-mapi-migration-overview.adoc b/modules/mapi-to-capi-migration-overview.adoc
similarity index 98%
rename from modules/capi-mapi-migration-overview.adoc
rename to modules/mapi-to-capi-migration-overview.adoc
index 93c1fda2f5..75b669a1cd 100644
--- a/modules/capi-mapi-migration-overview.adoc
+++ b/modules/mapi-to-capi-migration-overview.adoc
@@ -3,7 +3,7 @@
 // * machine_management/cluster_api_machine_management/cluster-api-getting-started.adoc
 
 :_mod-docs-content-type: CONCEPT
-[id="capi-mapi-migration-overview_{context}"]
+[id="mapi-to-capi-migration-overview_{context}"]
 = Migrating Machine API resources to Cluster API resources
 
 On clusters that support migrating Machine API resources to Cluster API resources, a two-way synchronization controller creates the following Cluster API resources in the `openshift-cluster-api` namespace:
diff --git a/modules/migrating-between-capi-mapi.adoc b/modules/migrating-between-capi-mapi.adoc
index 8524a11d21..d06495db30 100644
--- a/modules/migrating-between-capi-mapi.adoc
+++ b/modules/migrating-between-capi-mapi.adoc
@@ -147,7 +147,7 @@ ifdef::cluster-to-machine[]
 Do not delete any nonauthoritative resource that does not use the current authoritative API unless you want to delete the corresponding resource that does use the current authoritative API.
 
 When you delete a nonauthoritative resource that does not use the current authoritative API, the synchronization controller deletes the corresponding resource that does use the current authoritative API.
-//For more information, see "Unexpected resource deletion behavior" in the _Troubleshooting resource conversion_ content.
+For more information, see "Unexpected resource deletion behavior" in the _Troubleshooting resource migration_ content.
 ====
 
 endif::[]
diff --git a/modules/ts-capi-migrate-aws-creds.adoc b/modules/ts-capi-migrate-aws-creds.adoc
new file mode 100644
index 0000000000..3c4322305a
--- /dev/null
+++ b/modules/ts-capi-migrate-aws-creds.adoc
@@ -0,0 +1,12 @@
+// Module included in the following assemblies:
+//
+// * machine_management/cluster_api_machine_management/cluster-api-troubleshooting.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="ts-capi-migrate-aws-creds_{context}"]
+= Migrating {aws-short} cloud credentials
+
+//KCS draft not ready for publication
+
+The two-way synchronization controller that maintains changes between Machine API and Cluster API resources does not automatically copy {aws-first} credential secrets.
+For more information, see the Red{nbsp}Hat Knowledgebase article link:https://access.redhat.com/articles/7116313[Migrate AWS cloud credentials between Machine API and Cluster API].
\ No newline at end of file
diff --git a/modules/ts-capi-migrate-sync-label-annotation.adoc b/modules/ts-capi-migrate-sync-label-annotation.adoc
new file mode 100644
index 0000000000..c6fc00f4c2
--- /dev/null
+++ b/modules/ts-capi-migrate-sync-label-annotation.adoc
@@ -0,0 +1,30 @@
+// Module included in the following assemblies:
+//
+// * machine_management/cluster_api_machine_management/cluster-api-troubleshooting.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="ts-capi-migrate-sync-label-annotation_{context}"]
+= Incomplete synchronization of labels and annotations
+
+The label and annotation synchronization behavior differs between the Machine API and the Cluster API.
+In some cases, these differences cause the two-way synchronization controller to overwrite labels on a Cluster API machine during migration.
+
+Cause::
+
+With the Machine API, changes to machine set labels and annotations do not propagate to existing machines and nodes.
+These changes only apply to machines deployed after the update.
++
+With the Cluster API, changes to machine set labels and annotations propagate to existing machines and nodes.
+When the authoritative API for a machine set changes from Machine API to Cluster API, its labels propagate to the Cluster API machines that it manages.
+The propagation happens before the Cluster API machine is marked as authoritative.
+
+Consequence::
+
+The two-way synchronization controller overwrites any propagated labels and annotations with the earlier value, leading to an inconsistency.
+This outcome only occurs when removing a label or annotation.
+Updates and additional labels or annotations do not cause this inconsistency.
+
+Workaround::
+
+There is no workaround for this issue.
+For more information, see link:https://issues.redhat.com/browse/OCPBUGS-54333[OCPBUGS-54333].
\ No newline at end of file
diff --git a/modules/ts-capi-migrate-unexpected-machine-counts-scaling.adoc b/modules/ts-capi-migrate-unexpected-machine-counts-scaling.adoc
new file mode 100644
index 0000000000..fd361a94b1
--- /dev/null
+++ b/modules/ts-capi-migrate-unexpected-machine-counts-scaling.adoc
@@ -0,0 +1,46 @@
+// Module included in the following assemblies:
+//
+// * machine_management/cluster_api_machine_management/cluster-api-troubleshooting.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="ts-capi-migrate-unexpected-machine-counts-scaling_{context}"]
+= Unexpected machine counts after scaling
+
+On clusters that support migrating resources between the Machine API and the Cluster API, users might experience unexpected behavior when scaling the number of compute machines.
+The output of the `oc get` command for a compute machine set that does not use the authoritative API might contain inaccurate values in the `CURRENT`, `READY`, and `AVAILABLE` columns.
+
+Cause::
+
+The values that populate the `CURRENT`, `READY`, and `AVAILABLE` columns originate in the `.status` stanza of a compute machine set.
+The two-way synchronization controller that handles resource conversion between authoritative API types does not currently synchronize values in the `.status` stanza.
++
+The value in the `DESIRED` column reflects the `.spec.replicas` value of a compute machine set.
+The two-way synchronization controller synchronizes values in the `.spec` stanza.
+
+Consequence::
+
+Users can expect to see the following behavior when scaling migrated machine sets:
++
+--
+. Start with a compute machine set with existing machines.
+. Migrate the machine set to use a different authoritative API.
+. Scale the now authoritative machine set up by setting a larger value in the `.spec.replicas` field.
+. The machine set creates machines with the current authoritative API to satisfy the number of requested replicas.
+. Scale the authoritative machine set down such that one of the following conditions causes the deletion of machines that do not use the current authoritative API:
+** The total number of replicas requested is fewer than the number of machines that do not use the current authoritative API.
+** The machine deletion policy for the machine set selects machines that do not use the current authoritative API.
+. Check the status of the nonauthoritative compute machine set by running the `oc get` command.
+** The value in the `DESIRED` column in the output reflects the `.spec.replicas` value.
+** The values in the `CURRENT`, `READY`, and `AVAILABLE` columns reflect the original number of replicas that existed before scaling the machine set.
+--
+
+Workaround::
+
+To verify that a scale-down operation successfully deleted the compute machines that do not use the current authoritative API, run the `oc get` command that lists the nonauthoritative compute machines.
+
+Result::
+
+If the scale-down operation succeeded, the count in the output of the `oc get` command for the nonauthoritative compute machines reflects the `.spec.replicas` value of the machine set.
+
+//OCPCLOUD-2994
+//OCPCLOUD-2995
\ No newline at end of file
diff --git a/modules/ts-capi-migrate-unsupported-features.adoc b/modules/ts-capi-migrate-unsupported-features.adoc
new file mode 100644
index 0000000000..31e10018a2
--- /dev/null
+++ b/modules/ts-capi-migrate-unsupported-features.adoc
@@ -0,0 +1,83 @@
+// Module included in the following assemblies:
+//
+// * machine_management/cluster_api_machine_management/cluster-api-troubleshooting.adoc
+
+:_mod-docs-content-type: REFERENCE
+[id="ts-capi-migrate-unsupported-features_{context}"]
+= Unsupported configuration options
+
+The Machine API does not support all configuration options for the Cluster API.
+Some Machine API configurations cannot migrate to the Cluster API.
+Additional configuration options might be supported in a future release.
+
+Attempting to use the following configurations might cause a migration to fail or result in errors.
+
+[NOTE]
+====
+This list might not be exhaustive.
+====
+
+.General limitations
+
+* Machine API compute machines cannot migrate to the Cluster API unless the `NodeDeletionTimeout` field uses the Cluster API default value of `10s`.
+
+* {product-title} does not support using the following Cluster API fields in the `spec.template.spec` stanza of a machine set or the `spec` stanza of a machine:
+
+** `version`
+** `readinessGates`
+//OCPCLOUD-2714
+
+* The Machine API does not support using the following Cluster API drain configuration options:
+
+** `nodeDrainTimeout`
+** `nodeVolumeDetachTimeout`
+** `nodeDeletionTimeout`
+//OCPCLOUD-2715
+
+* The Cluster API does not support propagating labels or taints from machines to nodes.
+//OCPCLOUD-2861
+
+.{aws-first} limitations
+
+* Machine API compute machines cannot use {aws-short} load balancers.
+//OCPCLOUD-2709
+
+* The Machine API does not support using the following Amazon EC2 Instance Metadata Service (IMDS) configuration options:
++
+--
+** `httpEndpoint`
+** `httpPutResponseHopLimit`
+** `instanceMetadataTags`
+--
++
+If you migrate a Cluster API machine template that uses IMDS configuration options to a Machine API compute machine set, expect the following behaviors:
++
+--
+** Any machines that the migrated Machine API machine set creates will not have these fields.
+The underlying instances will not use these settings.
+** Any existing machines that the migrated machine set manages will retain these fields.
+The underlying instances will continue to use these settings.
+--
+//OCPCLOUD-2710
+
+* {product-title} does not support using the following {aws-short} machine template fields:
+
+** `spec.ami.eksLookupType`
+** `spec.cloudInit`
+** `spec.ignition.proxy`
+** `spec.ignition.tls`
+** `spec.imageLookupBaseOS`
+** `spec.imageLookupFormat`
+** `spec.imageLookupOrg`
+** `spec.networkInterfaces`
+** `spec.privateDNSName`
+** `spec.securityGroupOverrides`
+** `spec.uncompressedUserData`
+//OCPCLOUD-2711
+
+* The Cluster API does not support orphaning a nonroot EBS volume when its underlying {aws-short} EC2 instance is removed.
+When an instance is terminated, the Cluster API removes all dependent volumes.
+//OCPCLOUD-2717
+
+* When migrating a Machine API resource to the Cluster API, the ignition version is hard-coded and might not match the user data secret that is passed through.
+//OCPCLOUD-2719
diff --git a/modules/ts-capi-sync-list-duplicate-resources.adoc b/modules/ts-capi-sync-list-duplicate-resources.adoc
new file mode 100644
index 0000000000..d2be1a1add
--- /dev/null
+++ b/modules/ts-capi-sync-list-duplicate-resources.adoc
@@ -0,0 +1,55 @@
+// Module included in the following assemblies:
+//
+// * machine_management/cluster_api_machine_management/cluster-api-troubleshooting.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="ts-capi-sync-list-duplicate-resources_{context}"]
+= Duplicated machine set and machine resources
+
+On clusters that support migrating Machine API resources to Cluster API resources, some resources seem to have duplicate instances in the output of {oc-first} commands that list resources and in the {product-title} web console.
+
+Cause::
+
+When you install an {product-title} cluster that uses the default configuration options, the installation program provisions the following infrastructure resources in the `openshift-machine-api` namespace:
++
+--
+* One control plane machine set that manages three control plane machines.
+* One or more compute machine sets that manage three compute machines.
+* One machine health check that manages spot instances.
+* Compute machines that are created according to the compute machine set specifications.
+--
++
+On clusters that support migrating Machine API resources to Cluster API resources, a two-way synchronization controller creates the following Cluster API resources in the `openshift-cluster-api` namespace:
++
+--
+* One cluster resource.
+* One provider-specific infrastructure cluster resource.
+* One or more machine templates that correspond to compute machine sets.
+* One or more compute machine sets that manage three compute machines.
+* Compute machines that are created according to the machine template and compute machine set specifications.
+* Infrastructure machines that correspond to compute machines.
+--
++
+These Cluster API resources have the same names as their counterparts in the `openshift-machine-api` namespace.
+
+Consequence::
+
+Due to this behavior, instances of machine set and machine resources that seem to be duplicates appear in the output of `oc` commands that list resources and in the {product-title} web console.
+
+Workaround::
+
+Although the resources have the same names as their counterparts in the other namespace, only the resources that use the current authoritative API are active.
+The synchronization controller creates and maintains the corresponding resources that do not use the current authoritative API in an unprovisioned (`Paused`) state to prevent unintended reconciliation.
+
+Result::
+
+Only one of each resource that seems to be a duplicate is active at a time.
+The inactive nonauthoritative resources do not impact functionality.
++
+[IMPORTANT]
+====
+Do not delete any nonauthoritative resource that does not use the current authoritative API unless you want to delete the corresponding resource that does use the current authoritative API.
+
+When you delete a nonauthoritative resource that does not use the current authoritative API, the synchronization controller deletes the corresponding resource that does use the current authoritative API.
+For more information, see "Unexpected resource deletion behavior".
+====
\ No newline at end of file
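
As an illustration of the `authoritativeAPI` interaction table that this patch adds in `modules/machine-set-authoritative-api-machines.adoc`, the following Python sketch encodes the same four combinations. The function name `new_machine_authoritative_api` is hypothetical and exists only for this example; it is not part of the product, the Machine API, or the Cluster API, and it only restates the documented table rather than describing how the synchronization controller is implemented.

```python
# Hypothetical helper that restates the documented table; not a real API.
VALID_APIS = {"MachineAPI", "ClusterAPI"}


def new_machine_authoritative_api(machine_set_api: str, template_api: str) -> str:
    """Return the authoritativeAPI value for machines created by a Machine API
    compute machine set, given `.spec.authoritativeAPI` (machine_set_api) and
    `.spec.template.spec.authoritativeAPI` (template_api)."""
    for value in (machine_set_api, template_api):
        if value not in VALID_APIS:
            raise ValueError(f"unexpected authoritativeAPI value: {value!r}")
    # Per the table, the Machine API is authoritative for new machines only
    # when both fields are set to MachineAPI.
    if machine_set_api == "MachineAPI" and template_api == "MachineAPI":
        return "MachineAPI"
    return "ClusterAPI"


if __name__ == "__main__":
    for ms in ("ClusterAPI", "MachineAPI"):
        for tpl in ("ClusterAPI", "MachineAPI"):
            print(ms, tpl, "->", new_machine_authoritative_api(ms, tpl))
```

Writing the rule as a single condition mirrors the note in the module: every combination other than `MachineAPI` in both fields produces compute machines with the Cluster API as authoritative.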