mirror of https://github.com/openshift/installer.git synced 2026-02-06 09:47:02 +01:00

99 Commits

Author SHA1 Message Date
Roman Dobosz
c476567887 Refactor removing loadbalancer in OpenStack.
Currently, the approach for removing OpenStack loadbalancers is to look
for the appropriate tag (i.e. openshiftClusterID=<cluster_id>) and then
delete the matching resources. The issue with this approach is that no
such tags are applied to the loadbalancer resources, and there is no such
tag in the description field either. Hence, the deleteLoadBalancer
function returns 0 results for existing loadbalancers, as they carry the
tag neither in their tags nor in their description.

With this patch, deleteLoadBalancer has been refactored to get all the
loadbalancers and keep only those resources which have the ClusterID in
their description, so that they can be safely deleted.
2025-06-04 12:09:14 +02:00
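
The description-based filtering described in this commit could look roughly like the sketch below. It is an illustration under stated assumptions, not the installer's actual code: the `loadBalancer` type and the `filterByClusterID` helper are hypothetical, and real code would list the resources through the OpenStack client.

```go
package main

import (
	"fmt"
	"strings"
)

// loadBalancer is a hypothetical stand-in for the Octavia resource type;
// only the fields needed for the illustration are modeled.
type loadBalancer struct {
	ID          string
	Description string
}

// filterByClusterID returns the load balancers whose description mentions
// clusterID. An empty clusterID matches nothing, so that a missing ID can
// never select every load balancer in the project.
func filterByClusterID(lbs []loadBalancer, clusterID string) []loadBalancer {
	var matched []loadBalancer
	if clusterID == "" {
		return matched
	}
	for _, lb := range lbs {
		if strings.Contains(lb.Description, clusterID) {
			matched = append(matched, lb)
		}
	}
	return matched
}

func main() {
	// Sample data invented for the example.
	lbs := []loadBalancer{
		{ID: "lb-1", Description: "service default/web from cluster mycluster-x7k2p"},
		{ID: "lb-2", Description: "unrelated user load balancer"},
	}
	for _, lb := range filterByClusterID(lbs, "mycluster-x7k2p") {
		fmt.Println("would delete", lb.ID)
	}
}
```

Guarding against an empty cluster ID is a design choice of the sketch: `strings.Contains` reports true for an empty substring, which would otherwise mark every load balancer for deletion.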
Pierre Prinetti
42e471ee57 openstack: Update Gophercloud to v2 2024-06-18 10:39:56 +02:00
Michał Dulko
c781171412 Remove support for Kuryr
In 4.15 Kuryr is no longer a supported NetworkType, following its
deprecation in 4.12. This commit removes mentions of Kuryr from the
documentation and code, but also adds validation to prevent
installations from being executed when `networkType` is set to `Kuryr`.
2023-11-14 15:06:19 +01:00
Stephen Finucane
a979a0edd1 openstack: Use centralised OpenStack client
Avoid the duplication of configuring the client in multiple locations.
It also gives us a single point to start configuring a user agent for
the installer.

Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2023-10-03 11:34:02 +01:00
Michał Dulko
c89d3ac9d4 OpenStack: Remove SGS created by CPO on destroy
cloud-provider-openstack can be configured to create security groups for
the NodePorts of the load balancers it is creating. These SGs are then
attached to the nodes. On `cluster destroy` we're orphaning them. This
commit makes sure that we're looking for them.

As they aren't tagged and don't have a proper cluster ID in the name, we
will look at each of the ports, list its SGs and compare their names
with the pattern. If a name matches, `destroy` will attempt to delete
that SG.
2023-08-01 16:12:47 +02:00
OpenShift Merge Robot
15403a5ae7 Merge pull request #7133 from shiftstack/include-provided-port
OpenStack: support user provided dual-stack api and ingress Port
2023-06-12 14:30:49 -04:00
Maysa Macedo
229feb0e72 OpenStack: support user provided dual-stack api and ingress Port
When using dual-stack installations the user needs to pre-create
the api and ingress port given OpenStack does not allow direct
assignment of addresses when using slaac/stateless, consequently
the installer can't create those. This commit adds support to tag
those Ports, assign security groups to them, attach the Floating IP
when needed and allow clean up of resources.
2023-05-15 16:35:33 +02:00
Pierre Prinetti
2aee86869e openstack destroy: account for BULK DELETE limits on object-storage
Some object-storage instances may be configured with a limit on the LIST
operation that is higher than the limit on the BULK DELETE operation. On
those clouds, objects in the BULK DELETE call beyond the limit are
silently ignored. As a consequence, the call to destroy the container
fails and object deletion is re-queued after a growing waiting time,
potentially delaying deletion by hours.

With this change, object bulk deletion is put in a loop. After checking
that no errors were encountered, we reduce the BULK DELETE list by the
number of processed objects, and send it back to the server. As a
consequence, the object deletion routines should only complete when the
container is empty, thus avoiding the 409 error that causes a retry.
2023-05-08 10:05:23 +02:00
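
The loop described in this commit can be sketched as follows. This is a minimal illustration, not the installer's implementation: `bulkDeleteFunc` is a hypothetical stand-in for the object-storage call, and it is assumed to report how many of the leading names the server actually processed.

```go
package main

import (
	"errors"
	"fmt"
)

// bulkDeleteFunc is a hypothetical stand-in for the BULK DELETE call. It
// returns how many of the leading names the server actually processed.
type bulkDeleteFunc func(names []string) (processed int, err error)

// deleteAll keeps resubmitting the unprocessed tail of names until the list
// is empty, so objects beyond the server-side limit are not silently skipped.
func deleteAll(names []string, bulkDelete bulkDeleteFunc) error {
	for len(names) > 0 {
		n, err := bulkDelete(names)
		if err != nil {
			return err
		}
		if n <= 0 || n > len(names) {
			// Without this guard a server that processes nothing would
			// keep the loop spinning forever.
			return errors.New("bulk delete reported an implausible count")
		}
		names = names[n:] // drop what was processed, retry the rest
	}
	return nil
}

func main() {
	const serverLimit = 2 // hypothetical BULK DELETE limit
	calls := 0
	fake := func(names []string) (int, error) {
		calls++
		if len(names) > serverLimit {
			return serverLimit, nil
		}
		return len(names), nil
	}
	err := deleteAll([]string{"a", "b", "c", "d", "e"}, fake)
	fmt.Println("calls:", calls, "err:", err)
}
```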
Pierre Prinetti
c1af83a2f6 openstack destroy: Limit Swift workers to 3
down from 10. More than that killed RabbitMQ on a standalone OpenStack
cloud.

Also add a smidge of logging information.
2023-05-05 12:25:47 +02:00
Pierre Prinetti
e876e320c7 openstack/destroy: BulkDelete more objects at once
With this change:
* listing of container object is no longer limited to 50, but left to
  the default (which is 10000 on a standard Swift configuration);
* the object deletion calls are issued in concurrent goroutines rather
  than serially, giving Swift a chance to work them in parallel. The
  limit is set to 10 concurrent goroutines.

The goal of this change is to tackle waiting times on OCP destroy on
clusters with massive amounts of data stored in OpenStack object
storage.
2023-03-24 15:09:06 +01:00
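
A bounded pool of goroutines like the one this commit describes might be sketched as below. The `runLimited` helper and the `errDeleteFailed` sentinel are hypothetical names introduced for the illustration; the concurrency limit is a parameter rather than a fixed value.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errDeleteFailed is a hypothetical sentinel used only by this illustration.
var errDeleteFailed = errors.New("delete failed")

// runLimited calls work once per item, running at most limit calls at a
// time, and returns every non-nil error once all calls have finished.
func runLimited(items []string, limit int, work func(string) error) []error {
	if limit < 1 {
		limit = 1
	}
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	sem := make(chan struct{}, limit) // one slot per concurrent worker
	for _, item := range items {
		wg.Add(1)
		sem <- struct{}{} // blocks while all slots are taken
		go func(item string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot
			if err := work(item); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(item)
	}
	wg.Wait()
	return errs
}

func main() {
	batches := []string{"batch-1", "batch-2", "bad", "batch-4"}
	errs := runLimited(batches, 3, func(name string) error {
		if name == "bad" {
			return fmt.Errorf("%s: %w", name, errDeleteFailed)
		}
		return nil
	})
	fmt.Println("failed batches:", len(errs))
}
```

A buffered channel used as a semaphore keeps the limit easy to tune, which matters when the backing cloud cannot absorb many parallel requests.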
OpenShift Merge Robot
fee01a8b0d Merge pull request #6663 from shiftstack/bump_gophercloud
openstack: Revert Gophercloud workaround
2023-01-06 23:46:11 -05:00
OpenShift Merge Robot
7f980fc9a2 Merge pull request #6656 from shiftstack/refactor_deps
openstack: Rely on Go's stdlib for errors
2022-12-21 18:56:39 -05:00
Pierre Prinetti
a6d7a4b29e openstack: Revert Gophercloud workaround
With the bump to Gophercloud v1.1.1, the library should be able to
handle HTTP status 204 responses without `content-type` without
erroring. The workaround that was in place to force contentful responses
can then be removed.
2022-12-21 17:46:09 +01:00
OpenShift Merge Robot
1e05df75d1 Merge pull request #6707 from shiftstack/204_nocontent_objects
OCPBUGS-4941: OpenStack: Force JSON content-type in Swift object listing
2022-12-19 07:04:53 -05:00
Pierre Prinetti
3b6dbeba17 OpenStack: Force JSON content-type in Swift object listing
Some OpenStack object storages respond with `204 No Content` to list
requests when there are no containers or objects to list. In these
cases, when responding to requests with an `Accept: text/plain` or no
`Accept` header, some object storages omit the `content-type` header in
their status-204 responses.

Now, Gophercloud throws an error when the response does not contain a
`content-type` header.

With this change, we work around the issue by forcing Gophercloud to
request a JSON response from the object storage when listing objects.
When passed an `Accept: application/json` header, the server responds
with `200 Ok` and a `content-type` header in our tests.

This solution gives us a fix that is easily backportable because it
doesn't require any dependency bump.
2022-12-18 21:49:59 +01:00
Pierre Prinetti
0b796af553 openstack: Rely on Go's stdlib for errors
The package github.com/pkg/errors has been archived and its
functionality integrated in Go's standard library.
2022-12-18 21:48:30 +01:00
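
As a rough illustration of what relying on the standard library means in practice, the sketch below wraps an error with `fmt.Errorf` and `%w` and inspects it with `errors.Is`. The `errNotFound` sentinel and both helper functions are hypothetical and not taken from the installer.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel error for the illustration.
var errNotFound = errors.New("resource not found")

// deleteContainer always fails in this sketch. It wraps the cause with %w,
// the standard-library counterpart of errors.Wrap from github.com/pkg/errors.
func deleteContainer(name string) error {
	return fmt.Errorf("deleting container %q: %w", name, errNotFound)
}

// isNotFound reports whether err, or any error it wraps, is errNotFound.
func isNotFound(err error) bool {
	return errors.Is(err, errNotFound)
}

func main() {
	err := deleteContainer("images")
	fmt.Println(err)
	fmt.Println("not found:", isNotFound(err))
}
```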
Rafael Fonseca
ef95c1bcd4 linter: fix issues since rev 75173a17cf 2022-12-16 18:14:21 +01:00
Rafael Fonseca
80e02a974d chore: fix import order 2022-12-13 15:40:58 +01:00
Pierre Prinetti
f7b763187c OCPBUGS-3933: OpenStack: Force JSON content-type in Swift
Some OpenStack object storages respond with `204 No Content` to list
requests when there are no containers or objects to list. In these
cases, when responding to requests with an `Accept: text/plain` or no
`Accept` header, some object storages omit the `content-type` header in
their status-204 responses.

Now, Gophercloud throws an error when the response does not contain a
`content-type` header.

With this change, we work around the issue by forcing Gophercloud to
request a JSON response from the object storage when listing containers.
When passed an `Accept: application/json` header, the server responds
with `200 Ok` and a `content-type` header in our tests.

This solution gives us a fix that is easily backportable because it
doesn't require any dependency bump.

Note that in my local tests, I didn't find the 'full' listing to take
more time than the short, name-only response we were requesting Swift
prior to this change.
2022-11-23 14:26:16 +01:00
Michał Dulko
deb173b6a1 OpenStack: Fix LoadBalancer FIP deletion on destroy
d2630f2995 implemented deleting ports from
networks even if they're untagged. The motivation was to not block
destroy when some untagged ports were orphaned on the network (in
Neutron tagging is a separate operation that can fail).

As this was always done before deleting the network, we also deleted the
LoadBalancer Services VIP ports (untagged). This means that we couldn't
track down FIPs created for these Services and these were orphaned.

This commit makes sure that we only attempt to delete untagged ports on
409 failure to delete the network. We also do that only after successful
deletion of the LBs to make sure all the Service FIPs are already
tracked down and taken care of.
2022-06-03 16:59:28 +02:00
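
The ordering described in this commit can be sketched as follows, assuming a hypothetical `conflictError` type standing in for an HTTP 409 response and two hypothetical callbacks for the Neutron calls. It is an illustration of the control flow only.

```go
package main

import (
	"errors"
	"fmt"
)

// conflictError is a hypothetical stand-in for an HTTP 409 response.
type conflictError struct{ resource string }

func (e *conflictError) Error() string { return "409 conflict deleting " + e.resource }

// deleteNetwork reports whether the network is gone. Untagged ports are
// purged only after a 409 on the network and only once the load balancers
// have been deleted, so Service FIPs can still be traced through their
// VIP ports. A false result with a nil error means "retry later".
func deleteNetwork(id string, lbsDeleted bool, del, purgeUntaggedPorts func(string) error) (bool, error) {
	err := del(id)
	if err == nil {
		return true, nil
	}
	var conflict *conflictError
	if !errors.As(err, &conflict) {
		return false, err // not a conflict: report it
	}
	if !lbsDeleted {
		return false, nil // leave ports alone until the LBs are gone
	}
	if err := purgeUntaggedPorts(id); err != nil {
		return false, err
	}
	return false, nil // ports purged; the next attempt should succeed
}

func main() {
	purged := 0
	alwaysConflict := func(id string) error { return &conflictError{resource: id} }
	purge := func(string) error { purged++; return nil }

	for _, lbsDeleted := range []bool{false, true} {
		done, err := deleteNetwork("net-1", lbsDeleted, alwaysConflict, purge)
		fmt.Println("lbsDeleted:", lbsDeleted, "done:", done, "err:", err, "purges:", purged)
	}
}
```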
Stephen Finucane
1512de72ab Bug 1965468: Revert "Bug 1909136: OpenStack: delete volume snapshots"
This reverts commit a272e59b99. As noted
in the corresponding bug [1], the expectation was always to revert this
once cloud-provider-openstack started providing cluster ID information
in snapshot metadata, which has been the case for some time now [2].

Conflicts:
  pkg/destroy/openstack/openstack.go

Changes:
  pkg/destroy/openstack/openstack.go

NOTE(stephenfin): Conflicts are due to commit 375fe6f389 ("OpenStack:
Optimize cluster deletion") which changed two blocks so that we now
continue in a loop rather than returning early. We introduce this same
logic into a newly restored block inside 'deleteSnapshots' to prevent a
regression here.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1965468
[2] https://github.com/kubernetes/cloud-provider-openstack/pull/1544

Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2022-05-17 17:55:12 +01:00
Roman Dobosz
d2630f2995 Delete all the ports from tagged Neutron networks.
Currently, we only try to remove ports that are tagged, but sometimes
a port is created and tagging then fails, or we don't get a response
from the Neutron API, resulting in an untagged port.

Since the network is already tagged, we can use its ID to select all the
ports within that network, so that all the ports, not only the tagged
ones, are deleted.
2022-04-29 10:22:46 +02:00
Michał Dulko
39dce19e06 OpenStack: Parallelize port deletion
OpenStack Neutron is slow but it's sometimes able to handle quite a few
requests in parallel. This commit splits the operation of Neutron port
deletion into 10 concurrent goroutines to speed it up. This mostly aids
Kuryr case when we can have hundreds of ports created that block
deletion of other resources.
2021-12-01 11:00:27 +01:00
Martin André
a53ddd5206 Validate OpenStack supports resource tagging
Networking resource tagging is a hard requirement for OpenShift on
OpenStack and we should refuse to run the installer when the
underlying OpenStack platform does not support it.

Also, the destroy script may delete unmanaged resources when network
tagging is not available. With this patch, the destroy script will
refuse to work when network tagging is not available.

Fixes Bug 2013877

Co-authored-by: Martin André <m.andre@redhat.com>
Co-authored-by: Pierre Prinetti <pierreprinetti@redhat.com>
2021-10-26 15:43:12 +02:00
Steve Kuznetsov
f8ba68a263 pkg/destroy/gcp: report cluster footprint in quota
When we destroy an entity in the public cloud relating to a cluster, we
now record the impact that the item had to the quota in the account that
the cluster was provisioned in. This will allow for downstream users of
the installer to reason about the footprint of clusters they run,
allowing for more automated reasoning about how many clusters of a type
can fit into an account.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
2021-09-07 09:58:32 -07:00
Maysa Macedo
37df7484fa Consider all Networks on Router clean up
When performing a cluster destroy we should
look for all cluster tagged Networks that might be
connected to the router, regardless if it was created
by CAPO. This commit updates the filtering for the
lookup of Networks and ensures the Subnet is skipped
when no Gateway is set.
2021-09-06 17:38:22 +02:00
Maysa Macedo
1d87b0858b Bug 1993364: openstack/destroy: fix Kuryr/BYON
When using byon with Kuryr the Router would be identified based
on the Primary Network used by the Servers, which would be filtered
by the device-owner compute:nova. However, when using azs the
device-owner name can change. This commit fixes the issue by
identifying the router by looking for any tagged network that
has a subnet connected to the router. The goroutine for this
approach can't be run before other clean ups because the CNO
would always attempt to re-connect the service subnet to the
router.

Co-Authored-By: Maysa Macedo <maysa.macedo95@gmail.com>
Co-Authored-By: Emilien Macchi <emilien@redhat.com>
2021-09-06 12:31:46 +02:00
Michał Dulko
fdf0c8336b OpenStack: Remove FIPs of LBs created by cloud-provider
To implement LoadBalancer Services OpenStack Cloud Provider creates an
LB in Octavia for each of them. Those LBs have floating IPs associated
to the LB VIPs. As this is handled by the Cloud Provider itself,
Octavia's cascade delete will not remove those FIPs and we need to take
care of that ourselves.

This commit adds deleting of the FIPs associated to the LBs on
destroying cluster. In order to do that correctly it was required to
split deleting routers into two separate functions and make sure that
FIP detach and actual removal of the routers is happening after all the
delete functions finished (meaning that LBs are gone too). This is
because if FIP detach happens before we lose information about which FIP
was attached to which LB effectively preventing us from handling
deletion of them.

Only the function detaching the subnets from the routers is running as
part of main deletion step now.
2021-07-02 17:38:33 +02:00
Martin André
375fe6f389 OpenStack: Optimize cluster deletion
Previously each destroy function would exit on the first conflict and
it was expected to retry on the next iteration, hoping that in the
meantime the conflict that prevented removal of the resource was fixed.
While this strategy works, it is also slower than necessary. A better
strategy is to try deleting all resources and ignore the ones that have
conflicts. On the next iteration there will be fewer conflicts.

This patch was tested on an OpenStack cluster with openshiftSDN, and
we observed a 37% faster cluster deletion. I expect the performance
boost to be significantly higher for clusters using Kuryr.
2021-06-22 09:10:21 +02:00
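
The strategy described in this commit, trying every resource and carrying conflicts over to the next iteration, might be sketched like this. The `errConflict` sentinel and the `deletePass` helper are hypothetical names chosen for the illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict is a hypothetical sentinel standing in for a 409 response.
var errConflict = errors.New("conflict")

// deletePass attempts to delete every ID instead of stopping at the first
// conflict. It returns the IDs that hit a conflict so the caller can retry
// them on the next iteration. Any other error aborts the pass.
func deletePass(ids []string, del func(string) error) ([]string, error) {
	var remaining []string
	for _, id := range ids {
		err := del(id)
		switch {
		case err == nil:
			// deleted; nothing to carry over
		case errors.Is(err, errConflict):
			remaining = append(remaining, id)
		default:
			return nil, err
		}
	}
	return remaining, nil
}

func main() {
	// port-b is blocked during the first pass only (invented sample data).
	blocked := map[string]bool{"port-b": true}
	del := func(id string) error {
		if blocked[id] {
			return fmt.Errorf("deleting %s: %w", id, errConflict)
		}
		return nil
	}

	remaining, err := deletePass([]string{"port-a", "port-b", "port-c"}, del)
	fmt.Println("after pass 1:", remaining, err)

	blocked["port-b"] = false // the blocking resource went away meanwhile
	remaining, err = deletePass(remaining, del)
	fmt.Println("after pass 2:", len(remaining), err)
}
```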
Martin André
aa13fa6ae4 Bug 1971518: Try deleting associated trunk after port delete failure
There could be cases where the trunk is not properly tagged, for
example the UPI scripts do not set tags to trunk since the openstack
client doesn't support it.

Failure to delete trunks could result in the destroy command stuck in
a loop until it hits the timeout.

In these cases, the cluster destroy should be smart enough to try
deleting trunks for which the tagged port is a parent.
2021-06-15 14:52:52 +02:00
Mike Fedosin
0eed570ead Bug 1820238: delete manila shares and snapshots along with the cluster
This commit adds a function that deletes manila shares and their
snapshots when the cluster is being destroyed.
2021-06-03 10:24:31 +02:00
Mike Fedosin
a272e59b99 Bug 1909136: OpenStack: delete volume snapshots
It turned out that Cinder CSI driver doesn't attach cluster id to
volume snapshot metadata. To work around this we delete snapshots
based on their volume IDs.
2021-05-26 20:13:01 +02:00
Maysa Macedo
ef7e922f75 Fix Routers destroy with BYO
When using your own Network for the Machines
connected to a tagged Router created by CNO
on an installation with Kuryr, the Subnet of
that Network is not tagged and consequently
not identified by installer when destroying
the Router. This commit fixes the issue by
ensuring all Subnets connected to a Router
created by Installer or CNO are removed.
2021-05-14 12:19:54 +02:00
Mike Fedosin
fb8f8fb243 Bug 1909136: destroy volumes and snapshots created by Cinder CSI driver
Currently the installer deletes only volumes created by the in-tree
provisioner, omitting those created by the CSI driver, which leads to
resource leakage.
This commit starts deleting CSI volumes and snapshots as well.
2021-04-20 16:23:53 +02:00
OpenShift Merge Robot
0bc7d5bb51 Merge pull request #4561 from shiftstack/bz-1786314
Bug 1786314: Bump dependencies
2021-02-17 02:09:27 -05:00
Pierre Prinetti
567f66058c Upgrade OpenStack dependencies
* github.com/gophercloud/gophercloud
* github.com/gophercloud/utils
* github.com/terraform-provider-openstack/terraform-provider-openstack

Also adjust a call to servergroup.List to comply with the new function
signature[1].

[1]: gophercloud/gophercloud#2070

Signed-off-by: Emilien Macchi <emilien@redhat.com>
2021-02-16 11:47:55 -05:00
Maysa Macedo
d0ca8925a5 Fix FIP detach from Router
During a FIP less installation VMs that were not
created by the installer can get a FIP detached when
the cluster is being destroyed. This commit fixes the
issue by detaching only FIPs that were created by
installer or Kuryr during a FIP less installation.
2021-02-09 16:36:39 -03:00
Maysa Macedo
0225c5e810 Fix cluster destroy when byo is used with Kuryr
When using byo the cluster destroy was relying on the
primary network tag to identify the router used and remove
the extra interfaces. As the provided network is not tagged
anymore the cluster is not able to get cleaned up. This commit
fixes the issue by relying on the network used by the Servers
to identify the router used.
2021-02-04 17:29:08 +00:00
Martin André
003baff185 Bug 1916692: OpenStack: Delete leftover LBs when destroying cluster
If the user destroyed a cluster without removing all the associated
service LBs, the `destroy` command would fail to remove the network
and loop until it hits the timeout.

The destroy command now checks for any leftover LBs whose
`VipNetworkID` matches the network ID and deprovisions them. We filter on
services LBs created by the openstack cloud provider, matching the
`Kubernetes external service` string in the description [1], to ensure
we're not destroying a user-created resource by mistake.

[1] https://github.com/openshift/kubernetes/blob/442a69c/staging/src/k8s.io/legacy-cloud-providers/openstack/openstack_loadbalancer.go#L446
2021-02-03 15:15:46 +01:00
Mike Fedosin
05453ef0df Bug 1813949: ignore local env variables when we create a service client
This commit explicitly disables reading auth data from env variables
by setting an invalid EnvPrefix. By doing this, we make sure that the
data from clouds.yaml is enough to authenticate.

After this change we don't have to unset OS_CLOUD env variable explicitly
anymore.

Ref https://issues.redhat.com/browse/OSASINFRA-2152
2021-01-11 13:00:11 +01:00
Mike Fedosin
9162bd29bf Code cleanup and optimizations
This commit fixes issues that were found by:
`golangci-lint run pkg/... --disable-all -E gosimple -E unused`
2020-10-15 15:40:31 +02:00
Mike Fedosin
86f6896913 Bug 1876815: OpenStack: unset OS_CLOUD
We should unset OS_CLOUD env variable during cloudinfo and session
generation, and cluster destruction. We have to do it because the
real cloud name is defined by the user in the install-config. OS_CLOUD
takes precedence, so the user-defined value will be ignored if
OS_CLOUD is set.

/label platform/openstack
2020-10-01 18:40:09 +02:00
Maysa Macedo
81c1a0fbaa Fix cluster destroy on a FIP less installation
In case the machines Subnet is not connected to a Router
there is not a need to clean-up the interfaces from the
custom Router as no additional interfaces would have
been created on it. Also, when using Kuryr if a Service
of LoadBalancer type was created, a floating ip would get
created for the load balancer and the removal of the service
subnet from the router would be blocked.

This commit fixes both issues by ignoring the custom Router
clean-up when no Router is found and moving the custom
Router clean-up to after the load balancers removal.
2020-09-18 13:15:42 +02:00
Maysa Macedo
ac6814f171 Add support to clean custom router
To support a fip less installation and bring your
own network when using Kuryr, new interfaces are
added to a custom Router that may exist, enabling
traffic between Pods, Services and VMs. As the
Router is not created by the installer its interfaces
must be cleaned up upon cluster Destroy.

This commit solves the issue by discovering the
Router through the Primary Network and the gateway
interface attached to the Router.
2020-09-11 18:01:28 +02:00
Mike Fedosin
ae13c97ee6 Bug 1862044: deleting servers using metadata-based filtering
In https://github.com/openshift/installer/pull/3818 we introduced
tag-based server deletion. Unfortunately, it turned out we can't destroy
servers created by previous versions, as no tags were set there.

To fix this situation we go back to the previous solution: deleting
servers by metadata.
2020-08-10 13:47:41 +02:00
Mike Fedosin
3af06a67ab OpenStack: replace error type assertions with errors.As() 2020-07-29 15:59:51 +02:00
OpenShift Merge Robot
9944c1ac13 Merge pull request #3818 from Fedosin/delete_servers
OpenStack: Deleting servers using tag-based filtering
2020-07-16 02:47:39 -04:00
Mike Fedosin
286b09fa1f Bug 1857188: OpenStack: skip container deletion if it was removed
During deletion of containers, we get a list of available containers
first, and then we iterate through them to find the ones we need,
based on their metadata.
Since this is not an atomic operation, it may happen that the containers
can be removed at runtime. We should ignore these cases and continue
to iterate through the remaining ones.
2020-07-15 14:35:21 +02:00
Mike Fedosin
013c689805 [OpenStack] Introduce bulk deletion of Swift objects
Currently we delete Swift objects sequentially, which, in the case of a
large number of objects, can take a very long time.
Gophercloud 0.11.0 added the possibility of objects bulk deletion:
https://github.com/gophercloud/gophercloud/blob/master/CHANGELOG.md#0110-may-14-2020

This commit starts using this feature to delete multiple objects
with one request. This should significantly improve cluster removal
time.

Implements: https://issues.redhat.com/browse/OSASINFRA-1059
2020-07-13 12:24:05 +02:00
Mike Fedosin
e7bf767b96 OpenStack: Deleting servers using tag-based filtering
Now to delete servers we first get a list of all available servers
from Nova, and then we iterate through them to find those with
required metadata. In the case of a large number of servers, this
can take a very long time.

Fortunately gophercloud introduced filtering by tags, so we can
start using this feature to get only servers with the required tag.
https://github.com/gophercloud/gophercloud/pull/1759
2020-07-01 15:13:02 +02:00