When a clusternetwork entry has an invalid hostPrefix that is <= the
CIDR mask length, and a custom IPv4 join subnet is provided in the
install-config, the installer fails with a runtime panic: negative
shift amount.
This introduces a check that returns a more user-friendly and
descriptive error message instead of the runtime panic.
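As a rough illustration of the kind of guard involved, here is a minimal Go sketch (the function name and signature are hypothetical, not the installer's actual code) that rejects a hostPrefix that is not strictly larger than the CIDR mask length before any shift is computed:

```go
package main

import (
	"fmt"
	"net"
)

// validateClusterNetwork is an illustrative guard: the hostPrefix must be
// strictly greater than the CIDR's mask length, or computing the node-subnet
// count via a shift of (hostPrefix - maskLen) would panic at runtime with
// "negative shift amount".
func validateClusterNetwork(cidr string, hostPrefix int) error {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return fmt.Errorf("invalid CIDR %q: %w", cidr, err)
	}
	maskLen, _ := ipnet.Mask.Size()
	if hostPrefix <= maskLen {
		return fmt.Errorf("hostPrefix %d must be larger than the mask length %d of %s",
			hostPrefix, maskLen, cidr)
	}
	// Safe now: number of /hostPrefix node subnets inside the cluster CIDR.
	subnets := 1 << (hostPrefix - maskLen)
	fmt.Printf("%s provides %d node subnets of size /%d\n", cidr, subnets, hostPrefix)
	return nil
}

func main() {
	if err := validateClusterNetwork("10.128.0.0/14", 23); err != nil {
		fmt.Println("error:", err)
	}
	// An invalid entry now yields an error instead of a panic.
	if err := validateClusterNetwork("10.128.0.0/23", 23); err != nil {
		fmt.Println("error:", err)
	}
}
```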
Previously, clusters created on AWS with minimum permissions in an
existing VPC (unmanaged or BYO VPC) and with a BYO Public IPv4 Pool
address (BYO IP) failed to de-provision without permission to release
the EIP address (ec2:ReleaseAddress).
This change ensures the ec2:ReleaseAddress permission is exported to
the install-generated IAM policy when deploying a cluster on AWS with
a BYO VPC and a BYO Public IPv4 Pool.
When multiple clusternetwork CIDRs are present, the hostPrefix fields
must be specified with the same value; otherwise, traffic between
pods in different subnets can be impacted.
The patch applies this validation for IPv4 CIDRs. For IPv6, the only
option for hostPrefix is 64, which naturally satisfies the requirement.
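A minimal sketch of such a cross-entry check, with illustrative type and function names rather than the installer's actual API:

```go
package main

import (
	"fmt"
	"net"
)

// clusterNetwork loosely mirrors an install-config clusterNetwork entry
// (illustrative, not the installer's real type).
type clusterNetwork struct {
	CIDR       string
	HostPrefix int
}

// validateHostPrefixesMatch checks that every IPv4 clusterNetwork entry
// uses the same hostPrefix. IPv6 entries are skipped because their only
// allowed hostPrefix is 64, which satisfies the requirement by construction.
func validateHostPrefixesMatch(networks []clusterNetwork) error {
	first := -1
	for _, n := range networks {
		ip, _, err := net.ParseCIDR(n.CIDR)
		if err != nil {
			return fmt.Errorf("invalid CIDR %q: %w", n.CIDR, err)
		}
		if ip.To4() == nil {
			continue // IPv6: hostPrefix is fixed at 64
		}
		if first == -1 {
			first = n.HostPrefix
		} else if n.HostPrefix != first {
			return fmt.Errorf("hostPrefix %d for %s does not match %d; all IPv4 clusterNetwork entries must use the same value",
				n.HostPrefix, n.CIDR, first)
		}
	}
	return nil
}

func main() {
	nets := []clusterNetwork{
		{CIDR: "10.128.0.0/14", HostPrefix: 23},
		{CIDR: "10.132.0.0/14", HostPrefix: 24},
	}
	if err := validateHostPrefixesMatch(nets); err != nil {
		fmt.Println("error:", err)
	}
}
```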
References:
[0] https://issues.redhat.com/browse/OCPBUGS-46514 (debug steps and explanation for root cause)
[1] https://access.redhat.com/solutions/7100460 (temporary solution for existing cluster)
Previously, subnets created by the user (BYO VPC) in edge zones (Local
or Wavelength Zones) were not tagged with
kubernetes.io/cluster/<InfraID>:shared.
This change ensures the installer also sets the same cluster tag as it
does for regular zones.
When the host that runs the OpenShift installer is configured with
IPv6 only, the kube-apiserver created with envtest would fail, because
the service-cluster-ip-range would be configured with a default IPv4
CIDR while the public address family (the host address) would be IPv6.
This commit fixes the issue by setting a default IPv6 CIDR for
service-cluster-ip-range when the host has no IPv4 address available.
Since CI uses the `latest` tag, we can use the same tag here. There
might be a case where the latest mirrored image in CI points to an
older version; in that case, we can adjust the local version
accordingly.
Current version: golangci-lint v1.63.4
In OCPBUGS-47489, we see that some resources, particularly global
backend services, are being leaked during the destroy process.
Analysis of the creation timestamps for the leaked resources shows
that the resources are clustered together, suggesting the leaks
may occur during periods of heavy load.
During periods of heavy load, deletions may take longer to
process. This commit addresses the issue by adding waits for all
resource deletions, ensuring ample time to complete destroy calls.
The GCP destroy code repeated a lot of boilerplate operation handling.
This refactors all of it into a single function for improved
maintainability.
As per https://github.com/openshift/enhancements/pull/1637, we're trying
to get rid of all OpenShift-versioned components from the bootimages.
This means that there will no longer be `oc`, `kubelet`, or `crio`
binaries for example, which bootstrapping obviously relies on.
Instead, we now change things up so that early on when booting the
bootstrap node, we pull down the node image, unencapsulate it (this
just means converting it back to an OSTree commit), then mount over
its `/usr` and import new `/etc` content.
This is done by isolating to a different systemd target to only bring
up the minimum number of services to do the pivot and then carry on
with bootstrapping.
This does not incur additional reboots and should be compatible
with AI/ABI/SNO. But it is, of course, a huge conceptual shift in how
bootstrapping works. With this, we would now always be sure that we're
using the same binaries as the target version as part of bootstrapping,
which should alleviate some issues such as AI late-binding (see e.g.
https://issues.redhat.com/browse/MGMT-16705).
The big exception, of course, is the kernel. Relatedly, note that we
persist `/usr/lib/modules` from the booted system so that loading
kernel modules still works.
To be conservative, the new logic only kicks in when using bootimages
which do not have `oc`. This will allow us to ratchet this in more
easily.
Down the line, we should be able to replace some of this with
`bootc apply-live` once that's available (and also works in a live
environment). (See https://github.com/containers/bootc/issues/76.)
For full context, see the linked enhancement and discussions there.
- Added a quota constraint for server groups with a default of 2, reducing to 1 when no worker nodes are provisioned.
- Added a quota constraint for server group members, equal to the number of instances provisioned.
When UserProvisionedDNS is enabled, in addition to the
machine-config-server cert file, also update the individual cert and
key files within the bootstrap Ignition.
ec2:DescribeInstanceTypeOfferings is used by machine pools to discover
supported instance types in the region and zones when an instance type
isn't set in the pool (control plane, compute, or edge).
The discovery falls back to m6i, which is supported in most regions,
although some regions (e.g. ap-southeast-4 and eu-south-2) will fail
because that type isn't supported there. For the discovery mechanism to
work properly globally, the permission must be added by default.
This permission has been missing since minimum permissions were
introduced; currently only the edge pool includes it. This change moves
that requirement to the default create group.