mirror of https://github.com/openshift/installer.git synced 2026-02-05 15:47:14 +01:00
Commit Graph

16122 Commits

Author SHA1 Message Date
Rafael Fonseca
f5aeff2cfa OCPBUGS-36327: capi/aws: bump provider for instance register fix
This bump includes the following fix to only register instances to the
LB when they are in a running state:
* https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/5040

This should avoid unnecessary AWS API calls and red herring error
messages in the log output.
2024-08-15 11:20:10 +02:00
openshift-merge-bot[bot]
4f78aec1c8 Merge pull request #8794 from openshift-cherrypick-robot/cherry-pick-8787-to-release-4.16
[release-4.16] OCPBUGS-37838: fix bogus analyze message when gather fails
2024-08-06 14:15:13 +00:00
openshift-merge-bot[bot]
38b9a82f07 Merge pull request #8767 from shiftstack/OCPBUGS-37492
OCPBUGS-37492: openstack: Fix security group tagging
2024-08-06 08:50:44 +00:00
Rafael Fonseca
fb6ca447ac CORS-2775: analyze: replace deprecated github.com/pkg/errors 2024-08-01 18:25:04 +00:00
Rafael Fonseca
5edecaff96 OCPBUGS-34953: analyze: improve error msg when gather fails
Give a better error message when the gather logs are not collected
rather than
```
time="2024-06-05T08:34:45-04:00" level=error msg="The bootstrap machine did not execute the release-image.service systemd unit"
```
The release-image service could have been executed, but we cannot tell
when the installer cannot connect to the bootstrap node (e.g., in a
private install).
2024-08-01 18:25:04 +00:00
openshift-merge-bot[bot]
c9dc88b24d Merge pull request #8778 from openshift-cherrypick-robot/cherry-pick-8759-to-release-4.16
[release-4.16] OCPBUGS-37607: bootstrap gather fails in vsphere, only ipv6 address used
2024-08-01 16:21:31 +00:00
openshift-merge-bot[bot]
ade29f4fbb Merge pull request #8768 from openshift-cherrypick-robot/cherry-pick-8688-to-release-4.16
[release-4.16] OCPBUGS-37494: aws: do not require create permissions when BYO IAM role
2024-08-01 00:51:59 +00:00
openshift-merge-bot[bot]
a6bd634265 Merge pull request #8772 from r4f4/aws-subnet-tags-fix-4.16
OCPBUGS-37510: [release-4.16] aws: bump CAPA for subnet tagging fix
2024-07-31 03:32:35 +00:00
Joseph Callen
7b3b84f59f bootstrap gather fails in vsphere, only ipv6 address used
The machine manifests from CAPI have multiple addresses, including
IPv4 and IPv6. In vSphere CI specifically, IPv6 is non-routed,
and since it is the first address in the list, it is
used by default. This causes bootstrap gather failures.

This PR returns all available addresses from the machine
manifest and, for vSphere only, prioritizes the IPv4
address.

returned wrong var
2024-07-26 11:47:28 +00:00
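The address prioritization described in this commit can be sketched roughly as follows. This is a minimal illustration, not the installer's code: the function names `prioritizeIPv4` and `isIPv4` and the sample addresses are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net"
	"sort"
)

// isIPv4 reports whether s parses as an IPv4 address.
// Strings that do not parse as an IP at all are treated as non-IPv4.
func isIPv4(s string) bool {
	ip := net.ParseIP(s)
	return ip != nil && ip.To4() != nil
}

// prioritizeIPv4 (hypothetical helper) returns a copy of addrs with IPv4
// addresses moved ahead of the others, keeping the original relative
// order within each group.
func prioritizeIPv4(addrs []string) []string {
	out := append([]string(nil), addrs...)
	sort.SliceStable(out, func(i, j int) bool {
		return isIPv4(out[i]) && !isIPv4(out[j])
	})
	return out
}

func main() {
	// Sample addresses are made up; the IPv6 entry comes first,
	// as in the failure described above.
	addrs := []string{"fd65:a1a8:60ad::10", "192.168.10.5", "fe80::1"}
	fmt.Println(prioritizeIPv4(addrs))
}
```

A stable sort keeps every address available to the caller, so a later fallback to IPv6 remains possible if the IPv4 address is unreachable.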
openshift-merge-bot[bot]
86cb1053da Merge pull request #8769 from openshift-cherrypick-robot/cherry-pick-8744-to-release-4.16
[release-4.16] OCPBUGS-37180: ic: fix typo in warning message
2024-07-25 07:19:55 +00:00
Rafael Fonseca
64c9d060e0 capi/aws: update vendor 2024-07-24 14:36:42 +02:00
Rafael Fonseca
956047a357 OCPBUGS-36904: aws: bump CAPA for subnet tagging fix
This bump includes the following fix:
 * https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/5051
2024-07-24 14:34:17 +02:00
Rafael Fonseca
fb88aebf19 OCPBUGS-36780: ic: fix typo in warning message
The field name is ImageDigestSources.
2024-07-24 12:13:37 +00:00
openshift-merge-bot[bot]
940458c4fe Merge pull request #8734 from r4f4/aws-ingress-rules-fix-4.16
OCPBUGS-36968: [release-4.16]: capi/aws: bump provider for ingress rules fix
2024-07-24 11:58:50 +00:00
Rafael Fonseca
cd0788453a aws/permissions: add delete ignition permission unit tests. 2024-07-24 07:59:41 +00:00
Rafael Fonseca
1c40221166 aws/permissions: add create/delete base permission unit tests. 2024-07-24 07:59:41 +00:00
Rafael Fonseca
3f9b509bd8 aws/permissions: add PublicIPv4Pool permission unit tests. 2024-07-24 07:59:40 +00:00
Rafael Fonseca
8f6d5c25b5 aws/permissions: add Hosted Zone permission unit tests. 2024-07-24 07:59:40 +00:00
Rafael Fonseca
6bc545aa0d aws/permissions: add VPC permissions unit tests. 2024-07-24 07:59:40 +00:00
Rafael Fonseca
f9f7c355b7 aws/permissions: add KMS key permission unit tests. 2024-07-24 07:59:40 +00:00
Rafael Fonseca
89dbae1d26 aws/permissions: add IAM role permissions unit tests
This should help with making sure that create/delete* permissions for
IAM roles are required only when needed.
2024-07-24 07:59:40 +00:00
Rafael Fonseca
3b016287a0 aws: move permission list generation to its own function.
This will allow for it to be reused without code duplication and for it
to be unit tested.
2024-07-24 07:59:39 +00:00
Rafael Fonseca
6408878a71 OCPBUGS-36390: aws: do not require create permissions when BYO IAM role
The Installer is unconditionally requiring permissions needed to create
IAM roles even when the users use existing roles. They should be
included only when needed.
2024-07-24 07:59:39 +00:00
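The idea of requiring role-creation permissions only when needed might look like the sketch below. The function name `requiredIAMActions`, the `byoRole` flag, and the specific action list are illustrative assumptions; the installer's real permission groups are larger and structured differently.

```go
package main

import "fmt"

// requiredIAMActions (hypothetical) returns the IAM actions to validate.
// byoRole is true when the user supplies an existing IAM role, in which
// case the create/delete actions are left out.
func requiredIAMActions(byoRole bool) []string {
	// Actions assumed to be needed in both cases (illustrative subset).
	actions := []string{"iam:GetRole", "iam:PassRole"}
	if !byoRole {
		// Only needed when the installer creates the role itself.
		actions = append(actions, "iam:CreateRole", "iam:DeleteRole")
	}
	return actions
}

func main() {
	fmt.Println("installer-created role:", requiredIAMActions(false))
	fmt.Println("existing (BYO) role:   ", requiredIAMActions(true))
}
```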
Rafael Fonseca
19f2231a91 cluster/aws: tag existing IAM instance roles with "shared".
This change partially reverts
https://github.com/openshift/installer/pull/5286. IAM roles created by
the Installer are now consistently tagged with "owned". We should also
tag BYO roles so we know which clusters are using them, and so that they
are not deleted by the installer during cluster destroy.
2024-07-24 07:59:39 +00:00
Pierre Prinetti
819024655b openstack: Fix security group tagging
Before this patch, we used the Neutron call to add tags to the newly
created security groups. However, that API doesn't accept tags
containing special characters such as slash (`/`), even when
url-encoded.

With this change, the security groups are tagged with an alternative API
call (replace-all-tags) which accepts the tags in a JSON object.
Apparently, Neutron accepts special characters (including slash) when
they come in a JSON object.
2024-07-24 09:44:41 +02:00
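To illustrate the difference between the two tagging approaches, the sketch below contrasts a tag placed in a URL path segment with the same tag carried in a JSON body. The tag value and the `replaceAllTagsBody` helper are assumptions for the example, and no request is actually sent to Neutron.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// tagsBody mirrors the shape of a replace-all-tags request body:
// a JSON object holding the full list of tags.
type tagsBody struct {
	Tags []string `json:"tags"`
}

// replaceAllTagsBody (hypothetical helper) encodes tags as a JSON object.
func replaceAllTagsBody(tags []string) (string, error) {
	b, err := json.Marshal(tagsBody{Tags: tags})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	tag := "team/platform" // made-up tag containing a slash

	// Per-tag call: the tag becomes part of the URL path and must be escaped.
	fmt.Println(url.PathEscape(tag)) // team%2Fplatform

	// Replace-all call: the tag travels unescaped inside the JSON body.
	body, err := replaceAllTagsBody([]string{tag})
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // {"tags":["team/platform"]}
}
```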
openshift-merge-bot[bot]
38d2f1de5d Merge pull request #8733 from openshift-cherrypick-robot/cherry-pick-8649-to-release-4.16
[release-4.16] OCPBUGS-36965: destroy/gcp: set value for DiscardLocalSsd
2024-07-23 07:00:45 +00:00
openshift-merge-bot[bot]
41969e2919 Merge pull request #8716 from openshift-cherrypick-robot/cherry-pick-8703-to-release-4.16
[release-4.16] OCPBUGS-36720: CORS-3582: capi: remove unused feature gates
2024-07-16 10:35:44 +00:00
openshift-merge-bot[bot]
9798164f23 Merge pull request #8726 from r4f4/capi-run-controller-fail-fix-4.16
OCPBUGS-36890: [release-4.16] capi: start controllers after WaitGroup is created
2024-07-15 07:59:39 +00:00
Rafael Fonseca
4c06fd88f9 data: capi/aws: update infra CRD 2024-07-14 22:41:00 +02:00
Rafael Fonseca
76afad8c06 capi/aws: update vendor 2024-07-14 22:40:46 +02:00
Rafael Fonseca
defd47f6be OCPBUGS-35440: capi/aws: bump provider for ingress rules fix
* Brings in this fix
  https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/5024
  to avoid unnecessary revoke-authorize of ingress rules.
2024-07-14 22:39:13 +02:00
Rafael Fonseca
1bc40f376e destroy/gcp: set value for DiscardLocalSsd
For instance types where `OnHostMaintenance` is set to `Terminate`, GCP
requires the `DiscardLocalSsd` value to be defined; otherwise we get the
following error when destroying a cluster:
```
WARNING failed to stop instance jiwei-0530b-q9t8w-worker-c-ck6s8 in zone us-central1-c: googleapi: Error 400: VM has a Local SSD attached but an undefined value for `discard-local-ssd`. If using gcloud, please add `--discard-local-ssd=false` or `--discard-local-ssd=true` to your command., badRequest
```

We are setting the value to `true` because we are about to destroy the
cluster, which means destroying the instances and all cluster-owned
resources.
2024-07-13 15:25:10 +00:00
openshift-merge-bot[bot]
e7483e6e3b Merge pull request #8654 from r4f4/cve-go-retryablehttp-4.16
OCPBUGS-36091: [release-4.16] bump go-retryablehttp for CVE fix
2024-07-12 10:09:04 +00:00
openshift-merge-bot[bot]
69099f638a Merge pull request #8719 from openshift-cherrypick-robot/cherry-pick-8599-to-release-4.16
[release-4.16] OCPBUGS-36777: Cleanup capi artifacts
2024-07-12 05:39:34 +00:00
Rafael Fonseca
66845adc18 capi: shutdown system even when capi failed to run
`clusterapi.System().Run()` is not atomic and it can fail after local
controlplane, kube-apiserver, etcd or some controllers are already
running. To make sure the capi system is properly shut down and the etcd
data is cleaned up on errors, we need to `defer` the cleanup before we
even attempt to run the capi system.
2024-07-11 18:58:40 +02:00
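The `defer`-before-run pattern from this commit can be shown with a small stand-in. The `system` type, its fields, and `createCluster` are hypothetical; they only imitate a `Run()` that fails after some components have already started.

```go
package main

import (
	"errors"
	"fmt"
)

// system is a hypothetical stand-in for the local capi system.
type system struct {
	started []string // components currently running
}

// Run is deliberately non-atomic: it starts some components and then fails.
func (s *system) Run() error {
	s.started = append(s.started, "local control plane")
	s.started = append(s.started, "Cluster API controller")
	return errors.New("infrastructure provider failed to start")
}

// Teardown stops whatever is running.
func (s *system) Teardown() {
	s.started = nil
}

// createCluster defers the cleanup before attempting Run, so a partial
// start is still torn down when Run returns an error.
func createCluster(s *system) error {
	defer s.Teardown()
	if err := s.Run(); err != nil {
		return fmt.Errorf("failed to run cluster api system: %w", err)
	}
	return nil
}

func main() {
	s := &system{}
	err := createCluster(s)
	fmt.Println("error:", err)
	fmt.Println("still running:", len(s.started))
}
```

Had the `defer` been placed after the `Run()` error check, the early return would skip it and the two started components would keep running.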
Rafael Fonseca
ce46bbfbc9 capi: always stop local control plane in capi Teardown
Instead of doing the control plane shutdown as part of the controllers'
shutdown process, it should be done at Teardown time. This makes
sure that local control plane binaries are stopped even when we fail to
create controllers, for example when creating a cloud session for
controller setup.
2024-07-11 18:58:40 +02:00
Rafael Fonseca
7097ca340f OCPBUGS-36378: capi: start controllers after WaitGroup is created
Some providers like Azure require 2 controllers to run. If a controller
fails to be spawned (e.g., cluster-api-provider-azureaso), we were not
stopping controllers that were already running (e.g., the cluster-api,
cluster-api-provider-azure), resulting in leaked processes even though the
Installer reported it had stopped the capi system:

```
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to run cluster api system: failed to run controller "azureaso infrastructure provider": failed to start controller "azureaso infrastructure provider": timeout waiting for process cluster-api-provider-azureaso to start successfully (it may have failed to start, or stopped unexpectedly before becoming ready)
INFO Shutting down local Cluster API control plane...
INFO Local Cluster API system has completed operations
```

By just changing the order of operations to run the controller *after*
the WaitGroup is created, we are able to properly shut down all running
controllers and the local control plane in case of error:

```
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to run cluster api system: failed to run controller "aws infrastructure provider": failed to extract provider "aws infrastructure provider": fake error
INFO Shutting down local Cluster API control plane...
INFO Stopped controller: Cluster API
INFO Local Cluster API system has completed operations
```
2024-07-11 18:58:40 +02:00
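One way the ordering could matter is sketched below. This is an assumption-laden illustration, not the installer's implementation: the `system` type, the nil check in `Teardown`, and the `failAt` parameter are all invented for the example. The point is that the WaitGroup and cancel function exist before the first controller starts, so a failure midway still lets `Teardown` stop the controllers already running.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// system is a hypothetical stand-in for the local capi system.
type system struct {
	wg      *sync.WaitGroup
	cancel  context.CancelFunc
	mu      sync.Mutex
	stopped []string
}

// Run creates the WaitGroup first, then starts controllers one by one.
// failAt simulates a controller that fails to start.
func (s *system) Run(names []string, failAt string) error {
	ctx, cancel := context.WithCancel(context.Background())
	s.cancel = cancel
	s.wg = &sync.WaitGroup{} // created before any controller is started

	for _, name := range names {
		if name == failAt {
			return fmt.Errorf("failed to run controller %q", name)
		}
		s.wg.Add(1)
		go func(name string) {
			defer s.wg.Done()
			<-ctx.Done() // run until asked to stop
			s.mu.Lock()
			s.stopped = append(s.stopped, name)
			s.mu.Unlock()
		}(name)
	}
	return nil
}

// Teardown stops every tracked controller. If the WaitGroup had never
// been set, there would be nothing to wait on and controllers would leak.
func (s *system) Teardown() {
	if s.wg == nil {
		return
	}
	s.cancel()
	s.wg.Wait()
}

func main() {
	s := &system{}
	err := s.Run([]string{"cluster-api", "azure", "azureaso"}, "azureaso")
	fmt.Println("error:", err)
	s.Teardown()
	fmt.Println("controllers stopped:", len(s.stopped))
}
```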
Rafael Fonseca
18c2497d68 capi: do not exit if controller fails to extract.
Doing `logrus.Fatal` when a controller fails to be extracted means that
we abort the installer process without giving it a chance to stop the
capi-related processes that are still running.

Let's just return an error instead and let the Installer go through the
normal capi shutdown procedure.
2024-07-11 18:58:40 +02:00
Brent Barbachem
85ebd2ec16 OCPBUGS-35370: Cleanup capi artifacts
The .cluster_api directory and the auth directory contain extra artifacts that should be
removed on cluster destroy.
2024-07-09 22:24:11 +00:00
Rafael Fonseca
21aa2394c7 CORS-3582: capi: remove unused feature gates
Remove feature gate conditionals for platforms where CAPI is the
default.
2024-07-09 07:26:28 +00:00
openshift-merge-bot[bot]
e1f9f057ce Merge pull request #8705 from openshift-cherrypick-robot/cherry-pick-8557-to-release-4.16
[release-4.16] OCPBUGS-36607: aws: remove terraform configs
2024-07-09 04:02:23 +00:00
openshift-merge-bot[bot]
37233fc5e2 Merge pull request #8692 from mike-nguyen/416_bib
OCPBUGS-36324: update RHCOS 4.16 bootimage metadata to 416.94.202406282145-0
2024-07-08 21:25:48 +00:00
Michael Nguyen
2a633b88d3 update RHCOS 4.16 bootimage metadata to 416.94.202406251923-0
These changes will update the RHCOS 4.16 boot image metadata. The notable
change in this update is:

OCPBUGS-36147 - Resizing LUKS on 512e disk causes ignition-ostree-growfs
    to fail with "Device size is not aligned to requested sector size."

This change was generated using:

```
    plume cosa2stream --target data/data/coreos/rhcos.json                \
        --distro rhcos --no-signatures --name 4.16-9.4                   \
        --url https://rhcos.mirror.openshift.com/art/storage/prod/streams \
        x86_64=416.94.202406251923-0                                      \
        aarch64=416.94.202406251923-0                                     \
        s390x=416.94.202406251923-0                                       \
        ppc64le=416.94.202406251923-0

```
2024-07-08 13:12:25 -04:00
openshift-merge-bot[bot]
1c7338c4c7 Merge pull request #8694 from openshift-cherrypick-robot/cherry-pick-8628-to-release-4.16
[release-4.16] OCPBUGS-36447: capi/aws: disable EKS controller in CAPA
2024-07-07 15:47:19 +00:00
Rafael Fonseca
91477e8826 OCPBUGS-35188: aws: remove terraform configs
With CAPI being the default, these configs are not used anymore and as
such are prone to be unmaintained. Users who still wish to use the
configs can access them in the 4.15 branch where they are still
maintained.
2024-07-05 07:26:46 +00:00
openshift-merge-bot[bot]
9c913a54e0 Merge pull request #8683 from openshift-cherrypick-robot/cherry-pick-8671-to-release-4.16
[release-4.16] OCPBUGS-36351: vSphere - If the folder pre-exists do not tag
2024-07-04 19:32:31 +00:00
Rafael Fonseca
07b4634909 OCPBUGS-35752: capi/aws: disable EKS controller in CAPA
The EKS controller feature gate is enabled by default in CAPA, which
causes the following lines to show up in the logs:

```
time="2024-06-18T11:43:59Z" level=debug msg="I0618 11:43:59.613409     349 logger.go:75] \"enabling EKS controllers and webhooks\" logger=\"setup\""
time="2024-06-18T11:43:59Z" level=debug msg="I0618 11:43:59.613416     349 logger.go:81] \"EKS IAM role creation\" logger=\"setup\" enabled=false"
time="2024-06-18T11:43:59Z" level=debug msg="I0618 11:43:59.613420     349 logger.go:81] \"EKS IAM additional roles\" logger=\"setup\" enabled=false"
time="2024-06-18T11:43:59Z" level=debug msg="I0618 11:43:59.613425     349 logger.go:81] \"enabling EKS control plane controller\" logger=\"setup\""
time="2024-06-18T11:43:59Z" level=debug msg="I0618 11:43:59.613449     349 logger.go:81] \"enabling EKS bootstrap controller\" logger=\"setup\""
time="2024-06-18T11:43:59Z" level=debug msg="I0618 11:43:59.613464     349 logger.go:81] \"enabling EKS managed cluster controller\" logger=\"setup\""
time="2024-06-18T11:43:59Z" level=debug msg="I0618 11:43:59.613496     349 logger.go:81] \"enabling EKS managed machine pool controller\" logger=\"setup\""
```

Although harmless, they can be confusing for users. This change
disables the feature so the lines are gone and we are not running
controllers unnecessarily.
2024-07-02 16:54:49 +00:00
openshift-merge-bot[bot]
9dfd90d9da Merge pull request #8672 from openshift-cherrypick-robot/cherry-pick-8661-to-release-4.16
[release-4.16] OCPBUGS-36286: PowerVS: Add ibmcloud plugins
2024-07-01 14:21:50 +00:00
openshift-merge-bot[bot]
c67b70315c Merge pull request #8632 from dtantsur/proxy-icc-4.16
OCPBUGS-35818: baremetal: bootstrap: bind icc to localhost
2024-07-01 04:51:42 +00:00
Joseph Callen
489117b20a folder unit test changes
- remove duplicate test
- since the CAPV infrastructure can and will create custom
folders, the check that makes sure the folder pre-exists is no
longer valid. Changed the test to pass if there is no expected
error.
2024-06-29 07:27:32 +00:00