From 8a067897c215ce0eddfe8889fc640cb7327e7b6f Mon Sep 17 00:00:00 2001
From: cbippley
Date: Wed, 25 Sep 2024 15:41:28 -0400
Subject: [PATCH] OSDOCS-11208 Oc & bare metal bugs for 4.17 RNs

---
 release_notes/ocp-4-17-release-notes.adoc | 80 ++++++++++++++++++++---
 1 file changed, 71 insertions(+), 9 deletions(-)

diff --git a/release_notes/ocp-4-17-release-notes.adoc b/release_notes/ocp-4-17-release-notes.adoc
index f1392f8b26..ab0266667e 100644
--- a/release_notes/ocp-4-17-release-notes.adoc
+++ b/release_notes/ocp-4-17-release-notes.adoc
@@ -1271,7 +1271,7 @@ In the following tables, features are marked with the following statuses:
 === Deprecated features
 
 [id="ocp-4-17-preserveBootstrapIgnition-deprecated_{context}"]
-==== The `preserveBootstrapIgnition` parameter for {aws-short} 
+==== The `preserveBootstrapIgnition` parameter for {aws-short}
 
 The `preserveBootstrapIgnition` parameter for {aws-short} in the `install-config.yaml` file has been deprecated. You can use the `bestEffortDeleteIgnition` parameter instead. (link:https://issues.redhat.com/browse/OCPBUGS-33661[*OCPBUGS-33661*])
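+
+For example, a minimal `install-config.yaml` excerpt that sets the replacement parameter might look like the following sketch; the placement under `platform.aws` and the values shown are illustrative assumptions:
+
+[source,yaml]
+----
+# Illustrative sketch: placement under platform.aws is assumed
+platform:
+  aws:
+    region: us-east-1
+    bestEffortDeleteIgnition: true
+----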
@@ -1331,6 +1331,10 @@ Starting in {product-title} 4.17, RukPak is now removed and relevant functionali
 [id="ocp-4-17-bare-metal-hardware-bug-fixes_{context}"]
 ==== Bare Metal Hardware Provisioning
 
+* Previously, attempting to configure RAID on specific hardware models by using Redfish might have resulted in the following error: "The attribute StorageControllers/Name is missing from the resource." With this update, the validation logic no longer requires the `Name` field, because it is not mandated by the Redfish standard. (link:https://issues.redhat.com/browse/OCPBUGS-38465[*OCPBUGS-38465*])
+
+* Previously, the management interface for iDRAC9 in the Redfish Bare Metal Operator (BMO) module was incorrectly set to iPXE. This caused the error "Could not find the following interface in the 'ironic.hardware.interfaces.management' entrypoint: ipxe." and deployments failed on Dell Remote Access Controller (iDRAC)-based servers. With this release, the issue is resolved. (link:https://issues.redhat.com/browse/OCPBUGS-37261[*OCPBUGS-37261*])
+
 [discrete]
 [id="ocp-4-17-builds-bug-fixes_{context}"]
 ==== Builds
@@ -1411,13 +1415,13 @@ Starting in {product-title} 4.17, RukPak is now removed and relevant functionali
 * Previously, the AWS cloud controller manager within a hosted control plane that was running on a proxied management cluster would not use the proxy for cloud API communication. With this release, the issue is fixed. (link:https://issues.redhat.com/browse/OCPBUGS-37832[*OCPBUGS-37832*])
 
-* Previously, proxying for Operators that run in the control plane of a hosted cluster was performed through proxy settings on the Konnectivity agent pod that runs in the data plane. It was not possible to distinguish if proxying was needed based on application protocol. 
+* Previously, proxying for Operators that run in the control plane of a hosted cluster was performed through proxy settings on the Konnectivity agent pod that runs in the data plane. It was not possible to distinguish whether proxying was needed based on the application protocol.
 +
-For parity with {product-title}, IDP communication via HTTPS or HTTP should be proxied, but LDAP communication should not be proxied. This type of proxying also ignores `NO_PROXY` entries that rely on host names because by the time traffic reaches the Konnectivity agent, only the destination IP address is available. 
+For parity with {product-title}, IDP communication over HTTPS or HTTP should be proxied, but LDAP communication should not be proxied. This type of proxying also ignores `NO_PROXY` entries that rely on host names, because by the time traffic reaches the Konnectivity agent, only the destination IP address is available.
 +
 With this release, in hosted clusters, the proxy is invoked in the control plane through `konnectivity-https-proxy` and `konnectivity-socks5-proxy`, and proxying traffic is stopped from the Konnectivity agent. As a result, traffic that is destined for LDAP servers is no longer proxied. Other HTTP or HTTPS traffic is proxied correctly. The `NO_PROXY` setting is honored when you specify hostnames. (link:https://issues.redhat.com/browse/OCPBUGS-37052[*OCPBUGS-37052*])
 
-* Previously, proxying for IDP communication occurred in the Konnectivity agent. By the time traffic reached Konnectivity, its protocol and hostname were no longer available. As a consequence, proxying was not done correctly for the OAUTH server pod. It did not distinguish between protocols that require proxying (http/s) and protocols that do not (ldap://). In addition, it did not honor the `no_proxy` variable that is configured in the `HostedCluster.spec.configuration.proxy` spec. 
+* Previously, proxying for IDP communication occurred in the Konnectivity agent. By the time traffic reached Konnectivity, its protocol and hostname were no longer available. As a consequence, proxying was not done correctly for the OAuth server pod. It did not distinguish between protocols that require proxying (HTTP and HTTPS) and protocols that do not (LDAP). In addition, it did not honor the `no_proxy` variable that is configured in the `HostedCluster.spec.configuration.proxy` spec.
 +
 With this release, you can configure the proxy on the Konnectivity sidecar of the OAuth server so that traffic is routed appropriately, honoring your `no_proxy` settings. As a result, the OAuth server can communicate properly with identity providers when a proxy is configured for the hosted cluster. (link:https://issues.redhat.com/browse/OCPBUGS-36932[*OCPBUGS-36932*])
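++
+For context, a hosted cluster's proxy settings are declared in the `HostedCluster` resource. The following minimal sketch shows a `no_proxy` entry that keeps LDAP traffic direct; the names and endpoints are illustrative:
++
+[source,yaml]
+----
+apiVersion: hypershift.openshift.io/v1beta1
+kind: HostedCluster
+metadata:
+  name: example            # illustrative
+  namespace: clusters
+spec:
+  configuration:
+    proxy:
+      httpProxy: http://proxy.example.com:3128
+      httpsProxy: http://proxy.example.com:3128
+      noProxy: ldap.example.com   # LDAP host bypasses the proxy
+----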
@@ -1457,7 +1461,7 @@ link:https://issues.redhat.com/browse/OCPBUGS-39428[(*OCPBUGS-39428*)]
 * Previously, the installation program attempted to download the OVA on {vmw-first} whether the template field was defined or not. With this update, the issue is resolved. The installation program now verifies whether the template field is defined. If it is not defined, the OVA is downloaded; if it is defined, the OVA is not downloaded. (link:https://issues.redhat.com/browse/OCPBUGS-39240[*OCPBUGS-39240*])
 
 * Previously, enabling custom feature gates sometimes caused installation on an AWS cluster to fail if the feature gate `ClusterAPIInstallAWS=true` was not enabled. With this release, the `ClusterAPIInstallAWS=true` feature gate is not required.
-(link:https://issues.redhat.com/browse/OCPBUGS-34708[*OCPBUGS-34708*]) 
+(link:https://issues.redhat.com/browse/OCPBUGS-34708[*OCPBUGS-34708*])
 
 * Previously, some processes could be left running if the installation program exited due to infrastructure provisioning failures. With this update, all installation-related processes are terminated when the installation program terminates. (link:https://issues.redhat.com/browse/OCPBUGS-36378[*OCPBUGS-36378*])
 
-* Previously, long cluster names were trimmed without warning the user. 
+* Previously, long cluster names were trimmed without warning the user.
 With this update, the installation program warns the user when trimming long cluster names. (link:https://issues.redhat.com/browse/OCPBUGS-33840[*OCPBUGS-33840*])
 
-* Previously, the `openshift-install` CLI sometimes failed to connect to the bootstrap node when collecting bootstrap gather logs. The installation program reported an error message such as `The bootstrap machine did not execute the release-image.service systemd unit`. With this release and after the bootstrap gather logs issue occurs, the installation program now reports `Invalid log bundle or the bootstrap machine could not be reached and bootstrap logs were not collected`, which is a more accurate error message. (link:https://issues.redhat.com/browse/OCPBUGS-34953[*OCPBUGS-34953*]) 
+* Previously, the `openshift-install` CLI sometimes failed to connect to the bootstrap node when collecting bootstrap gather logs. The installation program reported an error message such as `The bootstrap machine did not execute the release-image.service systemd unit`. With this release, when this issue occurs, the installation program reports `Invalid log bundle or the bootstrap machine could not be reached and bootstrap logs were not collected`, which is a more accurate error message. (link:https://issues.redhat.com/browse/OCPBUGS-34953[*OCPBUGS-34953*])
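++
+For reference, bootstrap log bundles are collected with the `gather bootstrap` subcommand. A minimal example, assuming the installation assets are in `./install-dir`:
++
+[source,terminal]
+----
+$ openshift-install gather bootstrap --dir ./install-dir
+----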
 
 * Previously, when installing a cluster on {aws-short}, subnets that the installation program created were incorrectly tagged with the `kubernetes.io/cluster/<cluster_id>: shared` tag. With this update, these subnets are correctly tagged with the `kubernetes.io/cluster/<cluster_id>: owned` tag. (link:https://issues.redhat.com/browse/OCPBUGS-36904[*OCPBUGS-36904*])
@@ -1494,7 +1498,7 @@ link:https://issues.redhat.com/browse/OCPBUGS-39428[(*OCPBUGS-39428*)]
 * Previously, when installing a cluster with the Agent-based installer, the assisted-installer process could time out when attempting to add control plane nodes to the cluster. With this update, the assisted-installer process loads fresh data from the assisted-service process, preventing the timeout. (link:https://issues.redhat.com/browse/OCPBUGS-36779[*OCPBUGS-36779*])
 
 * Previously, when the {vmw-full} vCenter cluster contained an ESXi host that did not have a standard port group defined and the installation program tried to select that host to import the OVA, the import failed and the error “Invalid Configuration for device '0'” was presented.
-With this release, the installation program verifies whether a standard port group for an ESXi host is defined and, if not, continues until it locates an ESXi host with a defined standard port group or presents an error message if it fails to locate one, resolving the issue. 
+With this release, the installation program verifies whether a standard port group is defined for an ESXi host. If it is not, the installation program continues until it locates an ESXi host with a defined standard port group, or presents an error message if it fails to locate one, resolving the issue.
 (link:https://issues.redhat.com/browse/OCPBUGS-38560[*OCPBUGS-38560*])
 
 * Previously, extracting the IP address from the Cluster API Machine object returned only a single IP address. On {vmw-first}, the returned address would always be an IPv6 address and this caused issues with the `must-gather` implementation if the address was non-routable. With this release, the Cluster API Machine object returns all IP addresses, including IPv4, so that the `must-gather` issue no longer occurs on {vmw-full}.
 (link:https://issues.redhat.com/browse/OCPBUGS-37607[*OCPBUGS-37607*])
@@ -1509,7 +1513,7 @@
 * Previously, when setting `platform.openstack.controlPlanePort.network` without a `fixedIPs` value, the installation program would output a misleading error message about the network missing subnets. With this release, the installation program validates that the `install-config` field `controlPlanePort` has a valid subnet filter set because it is a required value. (link:https://issues.redhat.com/browse/OCPBUGS-37104[*OCPBUGS-37104*])
 
-* Previously, adding IPv6 support for user-provisioned installation platforms caused an issue with naming {rh-openstack-first} resources, especially when you run two user-provisioned installation clusters on the same {rh-openstack-first} platform. This happened because the two clusters share the same names for network, subnets, and router resources. With this release, all the resources names for a cluster remain unique for that cluster so no interfere occurs. (link:https://issues.redhat.com/browse/OCPBUGS-33973[*OCPBUGS-33973*]) 
+* Previously, adding IPv6 support for user-provisioned installation platforms caused an issue with naming {rh-openstack-first} resources, especially when you run two user-provisioned installation clusters on the same {rh-openstack-first} platform. This happened because the two clusters shared the same names for network, subnet, and router resources. With this release, all the resource names for a cluster remain unique to that cluster, so no interference occurs. (link:https://issues.redhat.com/browse/OCPBUGS-33973[*OCPBUGS-33973*])
 
 * Previously, when installing a cluster on {ibm-power-server-name} with installer-provisioned infrastructure, the installation could fail due to load balancer timeouts. With this update, the installation program waits for the load balancer to be available instead of timing out. (link:https://issues.redhat.com/browse/OCPBUGS-34869[*OCPBUGS-34869*])
@@ -1563,7 +1567,7 @@
 * Previously, if machine config pools (MCP) had a higher `maxUnavailable` value than the cluster's number of unavailable nodes, cordoned nodes could be erroneously selected as update candidates. This fix adds a node readiness check in the node controller so that cordoned nodes are queued for an update. (link:https://issues.redhat.com/browse/OCPBUGS-33397[*OCPBUGS-33397*])
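++
+For reference, `maxUnavailable` is set in the `MachineConfigPool` spec. A minimal sketch, with an illustrative pool name and value:
++
+[source,yaml]
+----
+apiVersion: machineconfiguration.openshift.io/v1
+kind: MachineConfigPool
+metadata:
+  name: worker
+spec:
+  maxUnavailable: 2   # illustrative value
+----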
 
 * Previously, nodes could be drained twice if the node was queued multiple times in the drain controller. This behavior might have been due to increased activity on the node object by on-cluster layering functionality. With this fix, a node queued for drain only drains once. (link:https://issues.redhat.com/browse/OCPBUGS-33134[*OCPBUGS-33134*])
- 
+
 * Previously, a potential panic could occur in the Machine Config Controller and Machine Build Controller if a dereference read the build status from an accidentally deleted `MachineOSConfig` or `MachineOSBuild` object. The panic is now controlled with additional error conditions that warn about allowed `MachineOSConfig` deletions. (link:https://issues.redhat.com/browse/OCPBUGS-33129[*OCPBUGS-33129*])
 
 * Previously, after upgrading from {product-title} 4.1 or 4.2 to version 4.15, some machines could get stuck during provisioning and never became available.
 This was because the `machine-config-daemon-firstboot` service was failing due to an incompatible `machine-config-daemon` binary on those nodes. With this release, the correct `machine-config-daemon` binary is copied to nodes before booting. (link:https://issues.redhat.com/browse/OCPBUGS-28974[*OCPBUGS-28974*])
@@ -1714,6 +1718,64 @@ metadata:
 [id="ocp-4-17-openshift-cli-bug-fixes_{context}"]
 ==== OpenShift CLI (oc)
 
+* Previously, when using oc-mirror plugin v2 with the `--delete` flag to remove Operator catalogs from mirror registries, the process failed with the following error:
++
+[source,terminal]
+----
+2024/08/02 12:18:03 [ERROR]: [OperatorImageCollector] pinging container registry localhost:55000: Get "https://localhost:55000/v2/": http: server gave HTTP response to HTTPS client.
+----
++
+This occurred because oc-mirror plugin v2 was querying the local cache by using HTTPS instead of HTTP. With this update, the HTTP client is properly configured before the query, resolving the issue. (link:https://issues.redhat.com/browse/OCPBUGS-41503[*OCPBUGS-41503*])
+
+* Previously, when using the oc-mirror plugin v2 in mirror-to-disk mode, catalog images and contents were stored in subfolders under `working-dir`, based on the image digest. During the disk-to-mirror process in fully disconnected environments, the plugin tried to resolve the catalog image tag through the source registry, which was unavailable, leading to errors such as the following:
++
+[source,terminal]
+----
+[ERROR] : [OperatorImageCollector] pinging container registry registry.redhat.io: Get "http://registry.redhat.io/v2/": dial tcp 23.217.255.152:80: i/o timeout
+----
++
+With this update, the plugin checks the local cache during the disk-to-mirror process to determine the digest, avoiding the need to query the registry. (link:https://issues.redhat.com/browse/OCPBUGS-36214[*OCPBUGS-36214*])
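++
+For context, the mirror-to-disk and disk-to-mirror phases correspond to the following oc-mirror plugin v2 invocations; the configuration file, local path, and target registry are illustrative:
++
+[source,terminal]
+----
+# mirror-to-disk
+$ oc mirror -c imageset-config.yaml file:///mirror --v2
+# disk-to-mirror
+$ oc mirror -c imageset-config.yaml --from file:///mirror docker://registry.example.com:5000 --v2
+----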
+---- ++ +(link:https://issues.redhat.com/browse/OCPBUGS-33121[*OCPBUGS-33121*]) + +* Previously, oc-mirror plugin v2 did not automatically perform retries when it encountered errors on registries, such as timeouts, expired authentication tokens, HTTP 500 errors, and so on. With this update, retries for these errors are implemented, and users can configure retry behavior with the following flags: ++ +** `--retry-times`: Specifies the number of retry attempts. Default is 2. +** `--retry-delay`: Sets the delay between retries. Default is 1 second. +** `--image-timeout`: Defines the timeout period for mirroring an image. Default is 10 minutes. +** `--max-parallel-downloads`: Controls the maximum number of layers to pull simultaneously during a single copy operation. Default is 6. ++ +(link:https://issues.redhat.com/browse/OCPBUGS-34021[*OCPBUGS-34021*]) + +* Previously, when using the oc-mirror plugin v2 with the `--rebuild-catalogs` flag, the catalog cache was regenerated locally, which caused failures either due to compatibility issues with the `opm` binary and the platform or due to cache integrity problems on the cluster. With this update, the `--rebuild-catalogs` flag defaults to true, so catalogs are rebuilt without regenerating the internal cache. Additionally, the image command has been modified to generate the cache during pod startup, which may delay pod initialization. (link:https://issues.redhat.com/browse/OCPBUGS-37667[*OCPBUGS-37667*]) + +* Previously, the oc-mirror plugin v2 did not use the system proxy configuration to recover signatures for releases when running behind a proxy with system proxy settings. With this release, the system proxy settings are now applied during the signature recovery process. (link:https://issues.redhat.com/browse/OCPBUGS-37055[*OCPBUGS-37055*]) + +* Previously, oc-mirror plugin v2 would stop the mirroring process when it encountered Operators using bundle versions that were not compliant with semantic versioning, which also prevented the creation of cluster resources like IDMS, ITMS, and `CatalogSource` objects. With this fix, the plugin now skips these problematic images instead of halting the process. If an image uses incorrect semantic versioning, a warning message is displayed in the console with the relevant image details. (link:https://issues.redhat.com/browse/OCPBUGS-33081[*OCPBUGS-33081*]) + +* Previously, oc-mirror plugin v2 did not generate `ImageDigestMirrorSet` (IDMS) or `ImageTagMirrorSet` (ITMS) files when mirroring failed due to network issues or invalid Operator catalogs. With this update, the `oc-mirror` continues mirroring other images when Operator or additional images fail, and stops only when release images fail. Cluster resources are generated based on successfully mirrored images, and all errors are collected in a log file for review. (link:https://issues.redhat.com/browse/OCPBUGS-34020[*OCPBUGS-34020*]) + +* Previously, {product-title} release images were not visible in certain registries, such as {quay}. This prevented users from installing {product-title} due to the missing release images. With this update, release images are always tagged to ensure they appear in registries like {quay}, enabling proper installation. (link:https://issues.redhat.com/browse/OCPBUGS-36410[*OCPBUGS-36410*]) + +* Previously, the `oc adm must-gather` command took a long time to gather CPU-related performance data in large clusters. 
++
+(link:https://issues.redhat.com/browse/OCPBUGS-34021[*OCPBUGS-34021*])
+
+* Previously, when using the oc-mirror plugin v2 with the `--rebuild-catalogs` flag, the catalog cache was regenerated locally, which caused failures either due to compatibility issues between the `opm` binary and the platform or due to cache integrity problems on the cluster. With this update, the `--rebuild-catalogs` flag defaults to true, so catalogs are rebuilt without regenerating the internal cache. Additionally, the image command has been modified to generate the cache during pod startup, which might delay pod initialization. (link:https://issues.redhat.com/browse/OCPBUGS-37667[*OCPBUGS-37667*])
+
+* Previously, the oc-mirror plugin v2 did not use the system proxy configuration to recover signatures for releases when running behind a proxy. With this release, the system proxy settings are now applied during the signature recovery process. (link:https://issues.redhat.com/browse/OCPBUGS-37055[*OCPBUGS-37055*])
+
+* Previously, oc-mirror plugin v2 stopped the mirroring process when it encountered Operators with bundle versions that were not compliant with semantic versioning, which also prevented the creation of cluster resources such as IDMS, ITMS, and `CatalogSource` objects. With this fix, the plugin now skips these problematic images instead of halting the process. If an image uses incorrect semantic versioning, a warning message with the relevant image details is displayed in the console. (link:https://issues.redhat.com/browse/OCPBUGS-33081[*OCPBUGS-33081*])
+
+* Previously, oc-mirror plugin v2 did not generate `ImageDigestMirrorSet` (IDMS) or `ImageTagMirrorSet` (ITMS) files when mirroring failed due to network issues or invalid Operator catalogs. With this update, `oc-mirror` continues mirroring other images when Operator or additional images fail, and stops only when release images fail. Cluster resources are generated based on successfully mirrored images, and all errors are collected in a log file for review. (link:https://issues.redhat.com/browse/OCPBUGS-34020[*OCPBUGS-34020*])
+
+* Previously, {product-title} release images were not visible in certain registries, such as {quay}. This prevented users from installing {product-title} because of the missing release images. With this update, release images are always tagged to ensure that they appear in registries such as {quay}, enabling proper installation. (link:https://issues.redhat.com/browse/OCPBUGS-36410[*OCPBUGS-36410*])
+
+* Previously, the `oc adm must-gather` command took a long time to gather CPU-related performance data in large clusters. With this release, the data is gathered in parallel instead of sequentially, which shortens the data collection time. (link:https://issues.redhat.com/browse/OCPBUGS-34360[*OCPBUGS-34360*])
+
+* Previously, the `oc set env` command incorrectly changed the API version of `Route` and `DeploymentConfig` objects; for example, `apps.openshift.io/v1` became `v1`. This caused the command to exit with `unable to recognize no matches for kind` errors. With this release, the error is fixed so that the `oc set env` command keeps the correct API version in `Route` and `DeploymentConfig` objects. (link:https://issues.redhat.com/browse/OCPBUGS-32108[*OCPBUGS-32108*])
+
+* Previously, when a `must-gather` operation failed for any reason and the user manually deleted the leftover namespace, a cluster role binding created by the `must-gather` command would remain in the cluster. With this release, when the temporary `must-gather` namespace is deleted, the associated cluster role binding is automatically deleted with it. (link:https://issues.redhat.com/browse/OCPBUGS-31848[*OCPBUGS-31848*])
+
+* Previously, when using the `--v2` flag with the oc-mirror plugin v2, if no images were mirrored and some were skipped, empty `idms.yaml` and `itms.yaml` files were generated. With this release, custom resource generation is triggered only when at least one image is successfully mirrored, preventing the creation of empty files. (link:https://issues.redhat.com/browse/OCPBUGS-33775[*OCPBUGS-33775*])
+
 [discrete]
 [id="ocp-4-17-olm-bug-fixes_{context}"]
 ==== Operator Lifecycle Manager (OLM)