mirror of https://github.com/openshift/installer.git synced 2026-02-05 15:47:14 +01:00
18336 Commits

Author SHA1 Message Date
Pablo Fontanilla
5797d192d6 OCPEDGE-1517: add-tnf-agent-based-installer (#9946)
* agent/installconfig: Add two-node-with-fencing topology and refactor
two-node validation

* feat: add override for control plane fencing creds

Signed-off-by: ehila <ehila@redhat.com>

* Add TNF fencing credentials override test

* Update integration test with new validation result

* Update installer verification and tests to allow only URLs containing redfish for the Two Nodes with Fencing topology

* Update validation check for redfish

* Remove simultaneous dual replica feature set restriction

* Update fencing address validation to include port

* Update validation to disallow http

* Update and expand url validation tests

* Revert "Update validation to disallow http"

This reverts commit e9595a8d4f.

* Update variable name

* Update tests

* Add YAML tags to Credential struct for fencing

Add explicit yaml struct tags to the Credential type to ensure proper
YAML serialization with lowercase field names (e.g., 'hostname' instead
of 'hostName'). This is required for the assisted-service to correctly
parse the fencing credentials file.
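The effect of explicit struct tags can be illustrated with a minimal sketch. The `Credential` type below is a hypothetical stand-in, not the installer's actual definition: the Go field names are assumptions, and only the serialized key names (hostname, username, password, address, certificateVerification) come from this commit log. To stay within the standard library, the sketch reads the tags with `reflect` instead of calling a YAML library.

```go
package main

import (
	"fmt"
	"reflect"
)

// Credential is a hypothetical stand-in for the fencing credential type.
// The Go field names are assumptions; the yaml keys follow the commit log.
type Credential struct {
	HostName                string `yaml:"hostname"`
	Username                string `yaml:"username"`
	Password                string `yaml:"password"`
	Address                 string `yaml:"address"`
	CertificateVerification string `yaml:"certificateVerification"`
}

// yamlKeys returns the value of the yaml struct tag for every field of v,
// in declaration order. v must be a struct value.
func yamlKeys(v any) []string {
	t := reflect.TypeOf(v)
	keys := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		keys = append(keys, t.Field(i).Tag.Get("yaml"))
	}
	return keys
}

func main() {
	// Prints the keys a tag-aware YAML encoder would be told to use.
	fmt.Println(yamlKeys(Credential{}))
}
```

A tag-aware encoder would emit `hostname` for the `HostName` field here, which is the lowercase form the commit message says assisted-service needs.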

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Add fencing credentials file generation for TNF clusters

Generate /etc/assisted/hostconfig/fencing-credentials.yaml containing
all fencing credentials from controlPlane.fencing.credentials[]. This
file is embedded in the agent ISO and consumed by assisted-service
during TNF cluster installation.

Key changes:
- Add OptionalInstallConfig to Ignition Dependencies()
- Add addFencingCredentials() function to generate the YAML file
- Call addFencingCredentials() in Generate() after NTP sources
- Add comprehensive unit tests for the new function

The single-file approach avoids directory naming collisions between
MAC-based host directories and hostname-based fencing credentials.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Revert fencing credentials override

The fencing credentials are now passed to assisted-service via the
hostconfig/fencing-credentials.yaml file embedded in the ISO, making
the install-config annotation override unnecessary.

This reverts commits:
- 105b3c95c9 Add TNF fencing credentials override test
- a06d1a766b feat: add override for control plane fencing creds

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Improve fencing credentials test coverage

Enhance TestIgnition_addFencingCredentials with:
- File owner verification (assert root ownership)
- Append behavior test with pre-existing files
- Fix misleading test name and add second credential to match
  valid TNF configuration (2 credentials required)
- Remove unused expectError field from test struct

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Support vendor-specific redfish schemes in fencing validation

Vendor-specific redfish schemes like idrac-redfish:// and ilo5-redfish://
use HTTPS (port 443) by default, so they should be valid without an
explicit port number.
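A rough sketch of this rule, using only the standard library. The helper name `effectivePort` and the "scheme contains redfish" test are assumptions made for illustration; the installer's real validation may differ, in particular in how it treats other schemes that omit a port.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// effectivePort is a hypothetical helper. It accepts only addresses whose
// scheme mentions redfish and reports the port that would be contacted.
func effectivePort(address string) (string, error) {
	u, err := url.Parse(address)
	if err != nil {
		return "", fmt.Errorf("invalid fencing address %q: %w", address, err)
	}
	if !strings.Contains(u.Scheme, "redfish") {
		return "", fmt.Errorf("fencing address %q must use a redfish scheme", address)
	}
	if u.Hostname() == "" {
		return "", fmt.Errorf("fencing address %q has no host", address)
	}
	if p := u.Port(); p != "" {
		return p, nil // an explicit port always wins
	}
	// Assumption taken from the commit message: vendor-specific redfish
	// schemes default to HTTPS, so a missing port means 443.
	return "443", nil
}

func main() {
	for _, addr := range []string{
		"idrac-redfish://192.168.111.1/redfish/v1/Systems/1",
		"redfish+https://192.168.111.1:8000/redfish/v1/Systems/1",
		"ipmi://192.168.111.1",
	} {
		port, err := effectivePort(addr)
		fmt.Println(addr, "->", port, err)
	}
}
```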

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* unit tests: Add missing OptionalInstallConfig dependency in ignition test

The TestIgnition_Generate test was panicking because the
OptionalInstallConfig asset was missing from the test dependencies.
This caused dependencies.Get() to return a nil value when the
addFencingCredentials function tried to access it.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* agent: Refactor fencing credentials into standalone asset

Move fencing credentials generation from inline ignition.go code into a
proper FencingCredentials asset following the installer's asset pattern.

This refactor:
- Creates pkg/asset/agent/manifests/fencingcredentials.go as a
  WritableAsset with Dependencies, Generate, Files, and Load methods
- Adds comprehensive unit tests in fencingcredentials_test.go
- Integrates FencingCredentials into AgentManifests dependency graph
- Removes addFencingCredentials() from ignition.go
- Adds positive integration test for TNF with fencing credentials
- Changes output path from /etc/assisted/hostconfig/ to
  /etc/assisted/manifests/ (standard manifests location)

The asset automatically returns empty Files() for non-TNF clusters,
so no fencing-credentials.yaml is generated unless fencing is configured.
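The "empty Files() when nothing is configured" behaviour can be sketched as follows. The types here are simplified, hypothetical stand-ins rather than the installer's real asset interfaces, and the `Generate` signature is an assumption chosen to keep the example self-contained.

```go
package main

import "fmt"

// File is a simplified, hypothetical version of a generated asset file.
type File struct {
	Filename string
	Data     []byte
}

// FencingCredentials is a simplified stand-in for the asset described above.
type FencingCredentials struct {
	file *File // nil when no fencing credentials were configured
}

// Generate records a file only when at least one credential is present.
// rendered is the already-serialized YAML; count is the number of credentials.
func (f *FencingCredentials) Generate(rendered []byte, count int) {
	if count == 0 {
		f.file = nil
		return
	}
	f.file = &File{Filename: "fencing-credentials.yaml", Data: rendered}
}

// Files returns an empty slice for clusters without fencing, so nothing is
// written to disk or embedded for them.
func (f *FencingCredentials) Files() []*File {
	if f.file == nil {
		return []*File{}
	}
	return []*File{f.file}
}

func main() {
	var nonTNF, tnf FencingCredentials
	nonTNF.Generate(nil, 0)
	tnf.Generate([]byte("credentials: []\n"), 2)
	fmt.Println(len(nonTNF.Files()), len(tnf.Files()))
}
```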

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* agent: Improve fencing credentials code quality

- Add explicit YAML library aliasing for clarity (goyaml for marshal,
  k8syaml for unmarshal) with documentation explaining why different
  libraries are used for each operation
- Improve error message to include credential count for debugging
- Add test case for empty fencing credentials array

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* agent: Fix CI failures for gofmt and integration tests

- Add blank line between k8s.io and github.com/openshift import groups
  in ignition_test.go to satisfy gci formatting requirements
- Add featureSet: TechPreviewNoUpgrade to tnf_with_fencing_credentials
  integration test to enable the DualReplica feature gate required for
  TNF fencing configuration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* agent: Move fencing credentials to agentconfig package

FencingCredentials is a host-scoped configuration asset, not a
cluster-scoped manifest. Moving it from manifests/ to agentconfig/
aligns with the package's purpose and follows the pattern used by
other host configuration assets like AgentHosts.

This change also updates ignition.go to import from the new location
and removes the now-unused fencing credentials from agent.go manifests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* agent: Add FencingCredentials to ignition test dependencies

The TestIgnition_Generate test was failing with a panic because the
FencingCredentials asset was added as a dependency to Ignition.Generate()
but wasn't included in the test's buildIgnitionAssetDefaultDependencies()
helper function.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* agent: Fix fencing credentials path in integration test

The integration test expected the fencing credentials file at
/etc/assisted/manifests/ but assisted-service reads it from
/etc/assisted/hostconfig/ (HOST_CONFIG_DIR default). The installer
correctly embeds the file at hostconfig/, so the test expectation
was wrong.

Changed test path from manifests to hostconfig to match both the
installer implementation and assisted-service expectations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* agent: Add nolint directive for gosec G101 false positive

The gosec linter flags fencingCredentialsFilename as "potential
hardcoded credentials" (G101) because the variable name contains
"credentials". This is a false positive - the variable contains
a filename string, not actual credentials.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* agent: Fix expected YAML field order in TNF integration test

The expected fencing-credentials.yaml had fields in a different order
than the actual YAML serialization output. Updated the expected file
to match the actual field order: hostname, username, password, address,
certificateVerification.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Signed-off-by: ehila <ehila@redhat.com>
Co-authored-by: ehila <ehila@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-02-05 12:30:34 +00:00
openshift-merge-bot[bot]
5503756564 Merge pull request #10286 from vimauro/add-tna-platform-external
OCPEDGE-2276: Add support for platform None and External in TNA clusters
2026-02-04 19:59:49 +00:00
Vincenzo Mauro
ef751a9235 Add support for platform External in TNA clusters 2026-02-04 15:15:38 +01:00
openshift-merge-bot[bot]
2e77e61a96 Merge pull request #10236 from vimauro/tna-platform-none-support
OCPEDGE-2276: Add support for platform None in TNA (Two-Node Arbiter) clusters
2026-02-03 23:49:36 +00:00
openshift-merge-bot[bot]
9351917279 Merge pull request #10261 from tthvo/CORS-4055-cred-source
CORS-4055: migrate credential provider check to AWS SDK v2
2026-02-03 08:15:44 +00:00
openshift-merge-bot[bot]
1a283806d7 Merge pull request #10257 from tthvo/CORS-4328
CORS-4328: configure AWS CCM NodeIPFamilies for dual-stack support
2026-02-03 04:50:00 +00:00
openshift-merge-bot[bot]
e10bd06ca9 Merge pull request #10258 from tthvo/CORS-4055-elb
CORS-4055, CORS-4078: migrate ELB/ELBv2 API calls to AWS SDK v2
2026-02-03 00:09:45 +00:00
openshift-merge-bot[bot]
2a7b94f45c Merge pull request #10279 from jinyunma/fix-OCPBUGS-74631
OCPBUGS-74631: Add validation to reject userProvisionedDNS on Azure Stack Hub
2026-01-31 10:48:33 +00:00
openshift-merge-bot[bot]
8253a91853 Merge pull request #10267 from tthvo/CORS-4055-region
CORS-4055: migrate default region check to AWS SDK v2
2026-01-31 07:15:54 +00:00
Thuan Vo
3330f83b95 CORS-4055: migrate default region check to AWS SDK v2
This commit is an incremental step to migrate AWS API calls
to AWS SDK v2. It focuses only on the logic that gets the default region
from the loaded config for the survey.
2026-01-30 14:25:27 -08:00
openshift-merge-bot[bot]
53147365d5 Merge pull request #10269 from patrickdillon/gcp-filter-ai-zone
OCPBUGS-74625: gcp: skip AI zones
2026-01-30 04:20:33 +00:00
Jinyun Ma
de51668ab4 Add validation to reject userProvisionedDNS on Azure Stack Hub
Custom DNS (userProvisionedDNS) is not supported on Azure Stack Hub. This
change adds validation to prevent users from setting userProvisionedDNS on
Azure Stack Hub.
2026-01-30 11:46:50 +08:00
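A check like the one described in commit de51668ab4 might look roughly like this. The `Platform` struct, its field names, and the string values `"AzureStackCloud"` and `"Enabled"` are assumptions for illustration only; they are not taken from this log.

```go
package main

import (
	"errors"
	"fmt"
)

// Platform is a hypothetical, trimmed-down view of the Azure platform config.
type Platform struct {
	CloudName          string // assumed value for Azure Stack Hub: "AzureStackCloud"
	UserProvisionedDNS string // assumed values: "Enabled" or "Disabled"
}

// validateUserProvisionedDNS rejects custom DNS on Azure Stack Hub and
// accepts every other combination.
func validateUserProvisionedDNS(p Platform) error {
	if p.CloudName == "AzureStackCloud" && p.UserProvisionedDNS == "Enabled" {
		return errors.New("userProvisionedDNS is not supported on Azure Stack Hub")
	}
	return nil
}

func main() {
	fmt.Println(validateUserProvisionedDNS(Platform{CloudName: "AzureStackCloud", UserProvisionedDNS: "Enabled"}))
	fmt.Println(validateUserProvisionedDNS(Platform{CloudName: "AzurePublicCloud", UserProvisionedDNS: "Enabled"}))
}
```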
Thuan Vo
08ad0f8617 CORS-4055, CORS-4078: migrate ELB/ELBv2 API calls to AWS SDK v2
The commit is an incremental step to migrate AWS API calls to AWS SDK
v2. This focuses on ELB/ELBv2 clients in the pkg/asset and dependent
pkg(s).

The ELB and ELBv2 clients now use SDK v2 with custom endpoint resolvers
that maintain backwards compatibility with SDK v1 service endpoint
configurations. Special handling is included for the fact that SDK v1 used
the same endpoint identifier ("elasticloadbalancing") for both ELB classic
and ELBv2, while SDK v2 uses distinct service IDs.
2026-01-29 13:07:08 -08:00
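The backwards-compatibility handling in commit 08ad0f8617 can be pictured as a lookup from the SDK v2 service ID back to the single v1 identifier. This is a standalone sketch, not the installer's resolver: `ServiceEndpoint` and `resolveEndpoint` are hypothetical names, and the two v2 service ID strings are assumed values that should be checked against the SDK.

```go
package main

import "fmt"

// ServiceEndpoint is a hypothetical, simplified user-supplied override.
type ServiceEndpoint struct {
	Name string // SDK v1 style identifier, e.g. "elasticloadbalancing"
	URL  string
}

// v1ELBName is the identifier SDK v1 used for both ELB classic and ELBv2.
const v1ELBName = "elasticloadbalancing"

// v2ToV1Name maps SDK v2 service IDs to the shared v1 identifier.
// The v2 ID strings are assumptions for this sketch.
var v2ToV1Name = map[string]string{
	"Elastic Load Balancing":    v1ELBName,
	"Elastic Load Balancing v2": v1ELBName,
}

// resolveEndpoint returns the override URL for a v2 service ID, if the user
// configured one under the v1 identifier.
func resolveEndpoint(v2ServiceID string, overrides []ServiceEndpoint) (string, bool) {
	v1Name, known := v2ToV1Name[v2ServiceID]
	if !known {
		return "", false
	}
	for _, o := range overrides {
		if o.Name == v1Name {
			return o.URL, true
		}
	}
	return "", false
}

func main() {
	overrides := []ServiceEndpoint{{Name: "elasticloadbalancing", URL: "https://elb.example.internal"}}
	for _, id := range []string{"Elastic Load Balancing", "Elastic Load Balancing v2", "S3"} {
		u, ok := resolveEndpoint(id, overrides)
		fmt.Printf("%q -> %q (override found: %t)\n", id, u, ok)
	}
}
```

With this shape, one `elasticloadbalancing` entry in an existing configuration keeps applying to both clients after the migration.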
Patrick Dillon
fee6f94711 GCP: skip AI zones
Filter out AI zones when discovering zones in the region. AI zones
do not have quota for general compute resources, so we should not provision
nodes there by default.
2026-01-29 10:36:08 -05:00
openshift-merge-bot[bot]
277456d55f Merge pull request #10245 from tthvo/CORS-4055-iam
CORS-4055: migrate IAM API calls to AWS SDK v2
2026-01-28 17:59:20 +00:00
openshift-merge-bot[bot]
c44c2dbd93 Merge pull request #10242 from tthvo/CORS-4055-s3
CORS-4055: migrate S3 API calls to AWS SDK v2
2026-01-28 14:23:48 +00:00
openshift-merge-bot[bot]
cbe2b67c22 Merge pull request #10081 from barbacbd/OCPBUGS-63305
OCPBUGS-63305: Make SimulatePrincipalPolicy optional
2026-01-28 14:23:40 +00:00
openshift-merge-bot[bot]
f77608818d Merge pull request #10224 from rna-afk/azure_client_version
OCPBUGS-67816: Revert storage account API version for client
2026-01-27 21:41:44 +00:00
Thuan Vo
d67b14e479 CORS-4055: migrate credential provider check to AWS SDK v2
This commit is an incremental step to migrate AWS API calls
to AWS SDK v2. This focuses on handlers that retrieve the source
or provider of credentials, for example, via shared credential file
and via environment variables.

Note: this logic determines whether the credential provider
is static, in which case the credentials are safe to transfer to the
cluster as-is in Mint and Passthrough credentialsMode.
2026-01-27 12:03:27 -08:00
openshift-merge-bot[bot]
16a52e0981 Merge pull request #10234 from jinyunma/fix-OCPBUGS-74078
OCPBUGS-74078: add support for NVIDIA H100 and H200 enabled machine series
2026-01-27 11:55:30 +00:00
Thuan Vo
552b61936e CORS-4058: Migrate AWS Destroy to SDK v2 (#9982)
* pkg/destroy/aws/ec2helpers.go

** The bulk of the changes are to the ec2helpers file. All of the SDK v1 imports
are removed except for session, as it is currently ingrained in too many files.

pkg/destroy/aws/aws.go

** Add clients for ELB, ELBv2, and IAM to the Cluster Removal Struct. Even though
these changes are mainly to ec2helpers, the other clients were required for
certain operations.

** The rest of the file updates alter the ARN import to come from AWS SDK v2.

* pkg/destroy/aws/iamhelpers.go

** Remove/Change all imports from AWS sdk v1 to v2.

pkg/destroy/aws/errors.go
pkg/destroy/aws/ec2helpers.go

** Remove the Error checking/formatting function from ec2helpers and put the function
in the errors.go file.

* pkg/destroy/aws/elbhelpers.go

** Remove all SDK v1 imports from elb helpers.

* Add reference to correct HandleErrorCode function.

* pkg/destroy/aws/aws.go

** Update Route53, s3, and efs services to sdk v2. This is slowly removing the
requirement for aws session.

* ** Vendor updates for S3 and EFS services.
** This caused updates to other packages such as aws/config, credentials, stscreds, and
a list of aws internal packages.

* Clean up references and use the exported config creator to create new clients in destroyer.

* ** Migrate the use of resource tagging api to the sdk V2.

pkg/destroy/aws:

** Alter the function name from HandleErrorCode to handleErrorCode. The initial thought was that
this function could be used in other areas of the code, but it will remain in destroy for now.

pkg/destroy/aws/shared.go:

** Remove the session import and uses in the file.

* Fix references to HandleErrorCode.

* pkg/destroy/aws/aws.go:

** Remove session from the imports. Added the agent handler to the configurations.

* Fix package updates for vendoring.

* Use the correct private and public zone clients.
Set a Destroy User Agent.
Cleanup pointer references to use the aws sdk.

* The ListUsers API call does not return tags for the IAM users in the
response. There is a separate call ListUserTags to fetch its tag for
checking in the installer code.

* rebase: fix other imports after rebase

* revert: use GetRole/GetUser to fetch tags

An older commit uses ListRoleTags/ListUserTags in order to save
bandwidth by fetching only tags. However, the minimal permission
required for the installer does not have permission iam:ListUserTags or
iam:ListRoleTags, thus causing the deprovisioning to skip users and
roles. This is one of the reasons for previous CI leaks.

This commit reverts the optimisation and simply uses GetRole/GetUser,
which the minimal permission policy should already allow.

---------

Co-authored-by: barbacbd <barbacbd@gmail.com>
2026-01-27 11:55:23 +00:00
openshift-merge-bot[bot]
5aa688f0a7 Merge pull request #10211 from barbacbd/installer-n4a-instances
CORS-4299,CORS-4300: Allow N4A Instance Types in the installer
2026-01-27 06:23:08 +00:00
openshift-merge-bot[bot]
0c2ec6ece6 Merge pull request #10190 from tthvo/claude-cmd
no-jira: add trace-config Claude command for installconfig field usage analysis
2026-01-27 06:23:01 +00:00
Thuan Vo
edb4e5af40 tests: add unit tests for NodeIPFamilies configurations 2026-01-26 18:34:15 -08:00
Thuan Vo
ab593238e2 CORS-4328: configure NodeIPFamilies for dual-stack support
Add NodeIPFamilies configuration to AWS cloud provider config
when dual-stack networking is enabled. The cloud provider now
sets the appropriate IP family ordering (ipv4/ipv6 or ipv6/ipv4)
based on the install config's IPFamily setting.

For dual-stack IPv4 primary clusters, NodeIPFamilies is set to:

NodeIPFamilies=ipv4
NodeIPFamilies=ipv6

For dual-stack IPv6 primary clusters, NodeIPFamilies is set to:

NodeIPFamilies=ipv6
NodeIPFamilies=ipv4

Single-stack IPv4 clusters continue to use the minimal config with an
empty Global section.
2026-01-26 18:31:08 -08:00
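The ordering rule from commit ab593238e2 can be sketched as a small renderer. The `ipFamily` type and its constant names are hypothetical, and the INI-style layout with a `[Global]` header is an assumption based on the commit's mention of a Global section; only the `NodeIPFamilies=` lines and their order come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// ipFamily is a hypothetical enum for the install config's IP family setting.
type ipFamily string

const (
	ipv4Only             ipFamily = "IPv4"
	dualStackIPv4Primary ipFamily = "DualStackIPv4Primary"
	dualStackIPv6Primary ipFamily = "DualStackIPv6Primary"
)

// cloudConfig renders a minimal cloud provider config. Single-stack IPv4
// yields an empty Global section; dual-stack lists the primary family first.
func cloudConfig(f ipFamily) string {
	var b strings.Builder
	b.WriteString("[Global]\n")
	switch f {
	case dualStackIPv4Primary:
		b.WriteString("NodeIPFamilies=ipv4\nNodeIPFamilies=ipv6\n")
	case dualStackIPv6Primary:
		b.WriteString("NodeIPFamilies=ipv6\nNodeIPFamilies=ipv4\n")
	}
	return b.String()
}

func main() {
	for _, f := range []ipFamily{ipv4Only, dualStackIPv4Primary, dualStackIPv6Primary} {
		fmt.Printf("--- %s ---\n%s", f, cloudConfig(f))
	}
}
```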
openshift-merge-bot[bot]
c573c82a4f Merge pull request #10254 from pawanpinjarkar/modify-hw-storage-requirements-for-ove
AGENT-1309: Increase disk size requirements for master and SNO
2026-01-26 23:52:58 +00:00
openshift-merge-bot[bot]
b68bfd6f5e Merge pull request #10246 from yunjiang29/aws-m7
OSDOCS-17769: Add AWS m7 instance types
2026-01-26 23:52:51 +00:00
openshift-merge-bot[bot]
56e3874a13 Merge pull request #10238 from tthvo/CORS-4073
CORS-4073: validate instance type support IPv6 in dual-stack
2026-01-26 20:06:58 +00:00
Pawan Pinjarkar
524811bbae AGENT-1309: Increase disk size requirements for master and SNO 2026-01-26 09:59:51 -05:00
openshift-merge-bot[bot]
960239fe51 Merge pull request #10249 from barbacbd/CORS-4318
OCPBUGS-74363: Remove region option for the GCP Private Service Connect Endpoint
2026-01-26 12:31:56 +00:00
barbacbd
8066014ea0 OCPBUGS-74363: Remove region option for the GCP Private Service Connect Endpoint
** While regional support is valid, we will not be using it in OpenShift. Regional support
requires that each API have its own endpoint. Only one API is associated with an endpoint, and managing
this access would be difficult and unnecessary at this time.
2026-01-23 09:19:39 -05:00
Yunfei Jiang
3b4c91caa4 OCPSTRAT-2506 Test and validate AWS m7 instance types for OpenShift Container Platform 2026-01-22 17:07:19 +08:00
Thuan Vo
352241d9f5 CORS-4055: migrate IAM API calls to AWS SDK v2
The commit is an incremental step to migrate AWS API calls to AWS SDK
v2. This focuses on IAM clients in the pkg/asset and dependent pkg(s).
2026-01-21 17:53:00 -08:00
Thuan Vo
deb94a3815 CORS-4055: migrate S3 API calls to AWS SDK v2
The commit is an incremental step to migrate AWS API calls to AWS SDK
v2. This focuses on S3 clients in the pkg/asset and dependent pkg(s).
2026-01-20 16:59:19 -08:00
openshift-merge-bot[bot]
d228bea76c Merge pull request #10240 from jianlinliu/golint
NO-JIRA: use v2 config for go-lint
2026-01-20 22:32:43 +00:00
Thuan Vo
adfe5e7b4a tests: add unit tests for IPv6 networking validations 2026-01-20 13:38:10 -08:00
Thuan Vo
3a2f742642 CORS-4073: validate instance type support IPv6 in dual-stack
In order to attach IPv6 addresses to the ENI of EC2 instances, the
instance type must support IPv6 networking. The installer must validate
it by inspecting the networking capabilities of instance type via EC2
API calls.
2026-01-20 13:38:10 -08:00
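The validation in commit 3a2f742642 could be approximated as below. This sketch replaces the EC2 API lookup with an in-memory map so that it runs without network access; `instanceTypeInfo`, `validateIPv6Support`, and the instance type names are all hypothetical.

```go
package main

import "fmt"

// instanceTypeInfo is a hypothetical record of the one capability this check
// needs. In the installer the data would come from EC2 API calls.
type instanceTypeInfo struct {
	IPv6Supported bool
}

// validateIPv6Support returns an error when dual-stack is requested and the
// instance type is unknown or cannot attach IPv6 addresses.
func validateIPv6Support(instanceType string, dualStack bool, known map[string]instanceTypeInfo) error {
	if !dualStack {
		return nil // single-stack IPv4 has no IPv6 requirement
	}
	info, ok := known[instanceType]
	if !ok {
		return fmt.Errorf("instance type %q not found", instanceType)
	}
	if !info.IPv6Supported {
		return fmt.Errorf("instance type %q does not support IPv6 networking, which dual-stack requires", instanceType)
	}
	return nil
}

func main() {
	// Made-up instance type names, used only to exercise both branches.
	known := map[string]instanceTypeInfo{
		"example.modern": {IPv6Supported: true},
		"example.legacy": {IPv6Supported: false},
	}
	fmt.Println(validateIPv6Support("example.modern", true, known))
	fmt.Println(validateIPv6Support("example.legacy", true, known))
}
```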
openshift-merge-bot[bot]
b6202667db Merge pull request #10237 from pawanpinjarkar/fix-assisted-install-ui-url
AGENT-1425: Fix stale dependency in agent-register-infraenv
2026-01-20 17:39:55 +00:00
Pawan Pinjarkar
2e027b13dd AGENT-1425: TUI does not show local web UI URL
The agent-ui service was previously updated to 'Type=notify' to improve startup ordering and reliability.
However, the container monitor flag '--sdnotify=conmon' was missing, so the UI URL was not displayed on the TUI.
Without this flag, systemd waits for a readiness signal from agent-ui that never comes, and the service remains in the 'activating' state.
This causes the TUI availability check to fail, so the user sees only "Waiting for services" instead of the UI URL
(even though the UI is already available via the usual URL).

This commit adds the missing flag, ensuring that the notification handshake between the container running the UI and the agent-ui systemd service completes successfully, which unblocks the TUI. This commit also fixes the stale dependency in agent-register-infraenv related to the agent-ui systemd service naming.
2026-01-20 00:10:47 -05:00
Jianlin Liu
66a1230669 use v2 config for go-lint 2026-01-20 10:55:15 +08:00
openshift-merge-bot[bot]
dfdec6e1da Merge pull request #10176 from pawanpinjarkar/modify-hw-storage-requirements-for-ove
AGENT-1309: Modify NoRegistryClusterInstall storage requirements
2026-01-19 20:19:23 +00:00
openshift-merge-bot[bot]
617269249e Merge pull request #10223 from gpei/fix-OCPBUGS-56770
OCPBUGS-56770: Honor user-specified bootDiagnostics on Azure Stack Hub
2026-01-19 16:35:56 +00:00
barbacbd
f7eb72b373 CORS-4300: Update installer to allow n4a instances
pkg/types/gcp/machinepools.go:

Include the n4a instance type in the map as well as the (current) supported disk types:
- hyperdisk-balanced

pkg/asset/installconfig/gcp/validation.go:

Include n4a in the types of arm instance families.
2026-01-19 11:28:45 -05:00
barbacbd
322e2929d1 CORS-4299: Update GCP MAPI Provider
Update the GCP provider reference so that N4A instances can be validated.

Note: govmomi was set to v0.51.0 because the MAPI updates were causing an automatic
update to v0.52.0 resulting in build issues that have no current solution.
2026-01-19 10:59:55 -05:00
Vincenzo Mauro
bb7d56e927 Added support for platform None in TNA clusters 2026-01-19 16:05:04 +01:00
openshift-merge-bot[bot]
19e15798a0 Merge pull request #10193 from abhay-nutanix/OCPBUGS-63028
OCPBUGS-63028: filtering only PEs from cluster list
2026-01-19 09:01:40 +00:00
Jinyun Ma
d5751b6598 Azure: add support for NVIDIA H100 and H200 enabled machine series 2026-01-19 16:15:15 +08:00
openshift-merge-bot[bot]
e04b9d5eab Merge pull request #10207 from sadasu/dual-stack-config
CORS-4075, CORS-4113: Install-config and Infra manifest updates for DualStack for AWS and Azure
2026-01-17 02:18:31 +00:00
openshift-merge-bot[bot]
71aea74175 Merge pull request #10202 from jinyunma/OCPBUGS-72525
OCPBUGS-72525: add newly detected instance types for Azure during 4.21 regression test
2026-01-16 04:38:09 +00:00
Gaoyun
e7bd4cae84 Check whether the user has explicitly configured bootDiagnostics in the mpool's bootDiagnostics field. If not configured, the Azure Stack Hub default is applied 2026-01-16 00:42:21 +00:00