mirror of https://github.com/openshift/installer.git synced 2026-02-05 15:47:14 +01:00

168 Commits

Author SHA1 Message Date
barbacbd
ffca92e42a no-jira: Fix linting issues for golangci-lint v2
pkg/agent/logging.go:
QF1006: could lift into loop condition
Skip lint check.

pkg/asset/manifests/azure/cluster.go:
QF1003: could use tagged switch on subnetType
Use a switch instead of if-else

pkg/infrastructure/azure/storage.go:
QF1007: could merge conditional assignment into variable declaration

pkg/infrastructure/baremetal/image.go:
QF1009: probably want to use time.Time.Equal instead
Use time.Time.Equal rather than ==.
2025-12-02 11:34:14 -05:00
Richard Su
38c05c786b OCPBUGS-18658: Unify agent install-complete with installer
Removed custom agent wait-for install-complete code.

Moved installer WaitForInstallComplete function from
cmd/openshift-install/main to cmd/openshift-install/command so
that the function can be made public.

Modified agent.newWaitForInstallCompleted() to use the common
WaitForInstallComplete function.

The benefit of moving agent over to the common
WaitForInstallComplete function is that the common function has a
step to wait for operators to be in a stable state before calling
the cluster installation complete.
2024-12-09 02:09:09 -05:00
Pawan Pinjarkar
16b3a7fd50 AGENT-1028: Day2 - Save and manage authentication tokens as cluster secrets
- Store the three authentication tokens (userAuth, agentAuth, watcherAuth)
  as a secret in the cluster when creating the node ISO.
- Automatically regenerate expired tokens and refresh the asset store
  to maintain valid authentication credentials in the cluster secret.
2024-11-19 11:09:37 -05:00
openshift-merge-bot[bot]
5e705f0f65 Merge pull request #9039 from pawanpinjarkar/create-seperate-tokens-4-each-user-authz
AGENT-950: Implement Separate JWT Tokens for Different User Personas
2024-11-15 05:12:45 +00:00
Bob Fournier
bea32e5da1 OCPBUGS-44580: Agent handle IPv6 correctly in net.Dial
IPv6 addresses were not handled correctly when building the address passed to net.Dial.
Use net.JoinHostPort.

From the docs:
JoinHostPort combines host and port into a network address of the form "host:port".
If host contains a colon, as found in literal IPv6 addresses, then JoinHostPort returns "[host]:port".
2024-11-14 17:53:09 -05:00
Pawan Pinjarkar
a189a7d3ec AGENT-950: Implement Separate JWT Tokens for Different User Personas
- Create 3 separate JWT tokens: AGENT_AUTH_TOKEN, USER_AUTH_TOKEN, WATCHER_AUTH_TOKEN
- Update the claim to set 'auth_scheme' to identify the user persona
- Assisted service checks the `auth_scheme` to determine which user persona is allowed to access an endpoint
- WATCHER_AUTH_TOKEN is used with header `Watcher-Authorization` and is used by the wait-for command (watcher persona)
- USER_AUTH_TOKEN is used with header `Authorization` and is used by curl API requests and systemd services (user persona)
- AGENT_AUTH_TOKEN is used with header `X-Secret-Key` and is used by the agent service (agent persona)
2024-11-12 22:16:01 -05:00
Richard Su
ff3b202730 AGENT-967: Improve monitoring output for multi-node
Monitoring output is now batched and displayed every 5 seconds
for each node. This makes the logs easier to read because the logs
for each node are more likely to be grouped together.
2024-11-05 17:36:00 -05:00
Zane Bitter
83de9258ba OCPBUGS-43768: Log correct hostname for validation status
The log prefix was getting set the first time through the host loop and
not getting modified for subsequent hosts. We need to calculate it anew
for each host.
2024-10-24 17:14:31 +13:00
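The fix amounts to deriving the prefix inside the loop body. A minimal illustration follows, with a hypothetical host record in place of the assisted-service types.

```go
package main

import "fmt"

// hostValidation is a hypothetical, simplified host record.
type hostValidation struct {
	Hostname string
	Message  string
}

func validationLines(hosts []hostValidation) []string {
	var lines []string
	for _, h := range hosts {
		// Compute the prefix anew on every iteration. Setting it only on the
		// first pass would label every later host with the first hostname.
		prefix := fmt.Sprintf("Host %s: ", h.Hostname)
		lines = append(lines, prefix+h.Message)
	}
	return lines
}

func main() {
	for _, l := range validationLines([]hostValidation{
		{"master-0", "validation passed"},
		{"master-1", "validation failed"},
	}) {
		fmt.Println(l)
	}
}
```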
Bob Fournier
82c7aa3bf9 OCPBUGS-36532: Agent installer wait-for, just test connectivity to host
In the agent wait-for, when the APIs are not available, an attempt is
made to check the host by ssh'ing to it. This results in confusing
debug messages if the keys aren't present. This changes the check to
just test connectivity to the host.
2024-10-08 13:51:46 -04:00
openshift-merge-bot[bot]
cb18b15669 Merge pull request #8783 from rwsu/AGENT-862-timeout
AGENT-862: Change day-2 monitor timeout back to 90 minutes
2024-07-30 19:03:12 +00:00
Richard Su
e2085e17d3 AGENT-862: Change day-2 monitor timeout back to 90 minutes
It was mistakenly changed to 1 minute.
2024-07-29 18:06:52 -04:00
Pawan Pinjarkar
c5048c102b AGENT-919: Authenticate day2 operations
Creating a Nodes ISO:
- secret does not exist
   1. Generate a new public key and JWT token with an expiration time of 48 hours
   2. The public key and token get saved into the asset store
   3. Create a secret named agent-auth-token in the openshift-config namespace with this token and public key from the asset store.

- secret already exists
   1. Retrieve the stored token and check whether the JWT token in the secret is older than 24 hours.

- secret already exists and the token is older than 24 hours
   1. Generate a new public key and JWT token with a new expiration time of 48 hours
   2. Update the secret with a new public key and JWT token

- secret already exists and the token is not older than 24 hours
   1. Retrieve the token and public key from the secret and update the asset store with the values from the secret.

Running monitor-add-nodes Command:
  1. Retrieve the token from the secret.
  2. Register the agent installer client with the retrieved token.
  3. Send the auth token to the assisted service API via HTTP headers
2024-07-26 12:19:22 -04:00
openshift-merge-bot[bot]
57154e1622 Merge pull request #8507 from rwsu/AGENT-862
AGENT-862: Extend monitor-add-nodes to support multiple nodes
2024-07-25 22:49:23 +00:00
Richard Su
d240a953fc Fix linting issue 2024-07-23 14:19:16 -04:00
Richard Su
67434ffb1c Update comment about node readiness before second CSR is approved 2024-07-23 11:32:57 -04:00
Richard Su
5aa318cf2d AGENT-922: Remove misleading inClusterConfig warning
Remove the "Neither --kubeconfig nor --master was specified.
Using the inClusterConfig.  This might not work." warning because
it does work and would confuse the user when the monitor-add-nodes
command is run in cluster.
2024-07-18 16:01:23 -04:00
Richard Su
91dfe6757c Use a global context for all nodes 2024-07-17 17:45:32 -04:00
Richard Su
f35fa62d5f AGENT-862: Extend monitor-add-nodes to support multiple nodes
Multiple IP addresses can now be specified for monitoring at the
same time.
2024-07-15 12:07:31 -04:00
Pawan Pinjarkar
e47dc98f3f Review fixes 2024-06-28 20:13:15 -04:00
Pawan Pinjarkar
47edd6afcb Move common code into NewCluster() method 2024-06-28 20:13:15 -04:00
Pawan Pinjarkar
8caccf729a AGENT-871: Authenticate wait-for commands
- Base64 encode ECDSA public, private keys
- Set Authorization header param in the client
2024-06-28 20:13:14 -04:00
Brent Barbachem
b414f7b1ec no-jira: Minor updates to fix linting issues.
The linter was upgraded in https://github.com/openshift/release/pull/52723. The linter caught
3 issues that are addressed here (a name change, a potentially exposed secret, and an unnecessary newline).
2024-06-05 10:42:18 -04:00
Richard Su
5ad9955503 Resolve IP address to hostname during startup
Previously, the code resolved the IP address each time the CSRs were checked.

Changed error logging to improve clarity.
2024-05-15 12:07:54 -04:00
Richard Su
1f2e11bedc AGENT-903: monitor-add-nodes should only show CSRs matching node
The first and second CSRs pending approval have the node name
(hostname) embedded in their specs. monitor-add-nodes should only
show CSRs pending approval for a specific node. Currently it shows
all CSRs pending approval for all nodes.

If the IP address of the node cannot be resolved to a hostname,
we will not be able to determine if there are any CSRs pending
approval for that node. The monitoring command will skip showing
CSRs pending approval. In this case, users can still approve the
CSRs, and the monitoring command will continue to check if the node
has joined the cluster and has become Ready.
2024-05-15 11:35:48 -04:00
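Resolving the node's hostname once and then filtering pending CSRs by it might look like the sketch below. The pendingCSR struct is a hypothetical simplification of a CertificateSigningRequest, and matching on "system:node:<hostname>" is the assumed rule; the commit states only that the node name is embedded in the CSR specs.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// pendingCSR is a hypothetical, simplified view of a pending CSR.
type pendingCSR struct {
	Name       string
	Username   string // requester recorded in the CSR spec
	CommonName string // subject CN decoded from the CSR request
}

// resolveHostname does one reverse lookup at startup. It returns "" when
// the IP cannot be resolved, in which case CSR filtering is skipped.
func resolveHostname(ip string) string {
	names, err := net.LookupAddr(ip)
	if err != nil || len(names) == 0 {
		return ""
	}
	return strings.TrimSuffix(names[0], ".") // names may carry a trailing dot
}

// csrsForNode keeps only the CSRs that name this node.
func csrsForNode(csrs []pendingCSR, hostname string) []pendingCSR {
	if hostname == "" {
		return nil
	}
	want := "system:node:" + hostname
	var out []pendingCSR
	for _, c := range csrs {
		if c.CommonName == want || c.Username == want {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	csrs := []pendingCSR{
		{Name: "csr-a", Username: "system:serviceaccount:example:bootstrapper", CommonName: "system:node:worker-0"},
		{Name: "csr-b", Username: "system:node:worker-1", CommonName: "system:node:worker-1"},
	}
	for _, c := range csrsForNode(csrs, "worker-0") {
		fmt.Println(c.Name) // csr-a
	}
}
```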
Richard Su
440a9bc7c9 Check status iteratively instead of using a map
Co-authored-by: Andrea Fasano <afasano@redhat.com>
2024-05-02 16:02:18 -04:00
Richard Su
6623d4287d Refactor NewCluster and LogAssistedServiceStatus functions
using feedback from Andrea Fasano.
2024-05-02 16:02:17 -04:00
Richard Su
82db16c08a Remove checks requiring ssh. Add kubelet check. Refactor CSR
checks.
2024-05-02 16:02:17 -04:00
Richard Su
92da35875d Fix linting issues 2024-05-02 16:02:17 -04:00
Richard Su
63c5738cb9 Add assetDir and kubeconfigPath parameters to NewCluster
NewCluster needs both assetDir for install workflow and
kubeconfigPath for addnodes workflow.

Cluster.assetDir should only be initialized for the install
workflow.
2024-05-02 16:02:16 -04:00
Richard Su
51a57560cf Add assisted-service validations and events to monitor-add-nodes 2024-05-02 16:02:16 -04:00
Richard Su
6c69310306 Refactor into addNodeMonitor 2024-05-02 16:02:16 -04:00
Richard Su
c4b0465e78 AGENT-861: day2 monitor-add-nodes single node
Adds the ability to monitor a node being added during day2.

The command is:

node-joiner monitor-add-nodes --kubeconfig <kubeconfig-file-path> <IP-address-of-node-to-monitor>

Both the kubeconfig file and IP address are required.

Multi node monitoring will be added in a future PR.
2024-05-02 16:02:15 -04:00
Richard Su
152458ee1b Refactor agent.NewCluster for reuse
The function now requires kubeconfig file path, rendezvousIP, and
sshKey as parameters. Previously it had a single parameter, assetStore,
and it searched the asset store to determine the three parameters
above.
2024-05-02 16:02:15 -04:00
Bob Fournier
81f048a5f4 reinstated agentconfig test; fixed nmstateconfig lint error 2023-10-20 12:41:50 -04:00
Richard Su
050f357551 Remove trace level logging by agent wait-for to .openshift_install.log

The trace level information can be confusing and too verbose.
2023-09-27 18:19:45 -04:00
Bob Fournier
385834c303 OCPBUGS-13108: Log additional host info at warning level
When the host fails to boot due to pending-user-action (this can
occur when the disk boot order is set incorrectly) log the
message at Warning level instead of Debug to make it obvious.

This also removes an additional spurious info message that was
being logged.
2023-05-26 09:40:10 -04:00
Bob Fournier
21b5a9fbb2 OCPBUGS-4998: Add additional info in wait-for when status is pending-user-action
When the cluster status is installing-pending-user-action the install
won't complete. Most likely this is due to an invalid boot disk. When
this status is detected also log the host's status_info for hosts that
have this status.
2023-04-06 12:33:37 -04:00
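Collecting status_info for the affected hosts is a simple filter. This is a sketch: hostState is a hypothetical simplification of the assisted-service host model, and the status_info text in main is illustrative.

```go
package main

import "fmt"

// hostState is a hypothetical, simplified view of a host.
type hostState struct {
	Hostname   string
	Status     string
	StatusInfo string
}

const pendingUserAction = "installing-pending-user-action"

// pendingUserActionDetails returns one message per host in
// installing-pending-user-action, only when the cluster reports that status.
func pendingUserActionDetails(clusterStatus string, hosts []hostState) []string {
	if clusterStatus != pendingUserAction {
		return nil
	}
	var msgs []string
	for _, h := range hosts {
		if h.Status == pendingUserAction {
			msgs = append(msgs, fmt.Sprintf("Host %s: %s", h.Hostname, h.StatusInfo))
		}
	}
	return msgs
}

func main() {
	hosts := []hostState{
		{Hostname: "master-0", Status: "installing-in-progress"},
		{Hostname: "master-1", Status: pendingUserAction, StatusInfo: "illustrative status_info: check the boot disk"},
	}
	for _, m := range pendingUserActionDetails(pendingUserAction, hosts) {
		fmt.Println(m)
	}
}
```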
Bob Fournier
65033f3313 OCPBUGS-8094: In agent 'wait-for bootstrap' command, test ssh to Node0
If the Rest API and Kube API are not reachable, it may be because
network connectivity checks are preventing the install from progressing
(which will also prevent the Node0 SSH server from starting). Attempt
to SSH to the node and provide instructions for further debug to the
user upon failure.
2023-03-06 09:51:08 -05:00
Zane Bitter
9b613cf816 OCPBUGS-3706: Don't report bootstrap errors as install errors
When running the 'agent wait-for install-complete' command, we first
check that bootstrapping is complete (by running the equivalent of
'agent wait-for bootstrap-complete'). However, if this failed because
bootstrapping timed out, we would report it as an install failure along
with the corresponding debug messages (stating that the problem is with
the cluster operators, and then inevitably failing to fetch data about
them).

If the failure occurs during bootstrapping, report it as a bootstrap
error the same as you would get from 'agent wait-for
bootstrap-complete'.
2022-12-22 12:02:09 -05:00
Zane Bitter
48058f9cb9 Refactor agent wait-for commands
Create the Cluster object outside of the WaitFor...() implementation and
pass it in, instead of creating it inside and returning it.
2022-12-22 11:34:58 -05:00
Zane Bitter
596f1622fe Remove dead code
err is always nil at this point: we check it further up, and the
anonymous function does not overwrite it (as was probably intended)
because it declares its own variable of the same name, shadowing it.
2022-12-22 10:02:55 -05:00
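The shadowing pitfall behind this dead code can be reproduced in a few lines. The functions below are illustrative, not the installer's code.

```go
package main

import (
	"errors"
	"fmt"
)

// shadowed: := inside the closure declares a new err, so the outer err is
// never written.
func shadowed() error {
	var err error
	func() {
		err := errors.New("failure inside closure")
		_ = err // the new variable goes out of scope here
	}()
	return err // always nil
}

// assigned: = writes the outer err.
func assigned() error {
	var err error
	func() {
		err = errors.New("failure inside closure")
	}()
	return err
}

func main() {
	fmt.Println(shadowed(), assigned()) // <nil> failure inside closure
}
```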
Zane Bitter
804af8c35e Remove unnecessary warning
All errors produced by this function are accompanied by Fatal-level
logs, so there is no need for an additional warning-level log.
2022-12-21 16:58:00 -05:00
OpenShift Merge Robot
5ef0adfaaa Merge pull request #6688 from pawanpinjarkar/OCPBUGS-3706-new
OCPBUGS-3706: Wait longer for baremetal
2022-12-15 17:21:05 -05:00
Pawan Pinjarkar
379d9f6f3d Just increase bootstrap timeout 2022-12-13 10:19:16 -05:00
Rafael Fonseca
80e02a974d chore: fix import order 2022-12-13 15:40:58 +01:00
Pawan Pinjarkar
0772d2cd55 gofmt
Signed-off-by: Pawan Pinjarkar <ppinjark@redhat.com>
2022-12-12 09:27:50 -05:00
Pawan Pinjarkar
dceae07887 Increase bootstrap timeout to 60 min 2022-12-09 16:10:04 -06:00
Pawan Pinjarkar
cc16c0d8ad OCPBUGS-3706: Wait longer for baremetal 2022-12-09 15:54:25 -06:00
Rafael Fonseca
428688c9cd Replace deprecated io/ioutil package
`io/ioutil` has been deprecated since Go 1.16 [1]. We should use `io`
and `os` instead.

[1] https://github.com/golang/go/issues/42026
2022-11-18 20:08:57 +01:00
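For reference, the usual one-to-one replacements look like this. The file name and contents are illustrative. Note that os.ReadDir returns []os.DirEntry rather than the []os.FileInfo returned by ioutil.ReadDir.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// roundTrip exercises the os replacements for common ioutil helpers.
func roundTrip(contents string) (string, int, error) {
	dir, err := os.MkdirTemp("", "ioutil-replacement") // was ioutil.TempDir
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "example.txt")
	if err := os.WriteFile(path, []byte(contents), 0o600); err != nil { // was ioutil.WriteFile
		return "", 0, err
	}
	data, err := os.ReadFile(path) // was ioutil.ReadFile
	if err != nil {
		return "", 0, err
	}
	entries, err := os.ReadDir(dir) // was ioutil.ReadDir
	if err != nil {
		return "", 0, err
	}
	return string(data), len(entries), nil
}

func main() {
	s, n, err := roundTrip("hello")
	fmt.Println(s, n, err) // hello 1 <nil>

	all, err := io.ReadAll(strings.NewReader("from a reader")) // was ioutil.ReadAll
	fmt.Println(string(all), err)                              // from a reader <nil>
}
```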
Bob Fournier
79656261b5 OCPBUGS-3280: Automatically retry install
In https://github.com/openshift/installer/pull/6470 we detected a failure when the cluster state moves
back to Ready after an installation has been initiated. This adds an automatic retry when that condition occurs.
It will help resolve issues like https://issues.redhat.com/browse/OCPBUGS-3280 and, in general, any problems
that cause a cluster prepare failure.
2022-11-09 12:25:27 -05:00
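The automatic retry can be sketched as a polling loop that restarts the install whenever the cluster falls back to "ready". This is hypothetical throughout: the function, the injected poll and startInstall callbacks, the retry cap, and the status strings are assumptions. Real code would also sleep between polls and enforce an overall timeout.

```go
package main

import (
	"errors"
	"fmt"
)

// installWithRetry starts the install, then polls. If the cluster returns to
// "ready" after the install was started, it starts the install again, at
// most maxRetries times.
func installWithRetry(poll func() string, startInstall func() error, maxRetries int) error {
	if err := startInstall(); err != nil {
		return err
	}
	for retries := 0; ; {
		switch poll() {
		case "installed":
			return nil
		case "error":
			return errors.New("installation failed")
		case "ready":
			if retries == maxRetries {
				return fmt.Errorf("cluster returned to ready after %d retries", retries)
			}
			retries++
			if err := startInstall(); err != nil {
				return err
			}
		}
	}
}

func main() {
	statuses := []string{"preparing-for-installation", "ready", "preparing-for-installation", "installing", "installed"}
	i, starts := 0, 0
	poll := func() string { s := statuses[i]; i++; return s }
	start := func() error { starts++; return nil }
	err := installWithRetry(poll, start, 3)
	fmt.Println(err, starts) // <nil> 2
}
```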