pkg/agent/logging.go:
QF1006: could lift into loop condition
Skip lint check.
pkg/asset/manifests/azure/cluster.go:
QF1003: could use tagged switch on subnetType
Use a switch statement instead of an if-else chain.
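For context, QF1003 suggests replacing an if-else chain that repeatedly compares the same value with a tagged switch. A minimal sketch of the pattern (the subnet type values here are illustrative, not the actual constants in cluster.go):

```go
package main

import "fmt"

// describeSubnet shows the QF1003 pattern: a tagged switch on a single value
// instead of an if-else chain. The subnet type values are made up for this
// sketch and are not the constants used in the installer.
func describeSubnet(subnetType string) string {
	switch subnetType {
	case "control-plane":
		return "subnet used by control-plane nodes"
	case "compute":
		return "subnet used by compute nodes"
	default:
		return fmt.Sprintf("unknown subnet type %q", subnetType)
	}
}

func main() {
	fmt.Println(describeSubnet("compute"))
}
```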
pkg/infrastructure/azure/storage.go:
QF1007: could merge conditional assignment into variable declaration
pkg/infrastructure/baremetal/image.go:
QF1009: probably want to use time.Time.Equal instead
Use the time.Time.Equal method rather than ==.
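QF1009 flags == comparisons between time.Time values. A short illustration of why Equal is the safer comparison:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	t1 := time.Now()
	t2 := t1.UTC() // same instant, different Location and no monotonic reading

	// == compares the struct fields, including the Location and the
	// monotonic clock reading, so identical instants can compare unequal.
	fmt.Println(t1 == t2) // false

	// Equal compares the time instant only.
	fmt.Println(t1.Equal(t2)) // true
}
```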
Removed custom agent wait-for install-complete code.
Moved installer WaitForInstallComplete function from
cmd/openshift-install/main to cmd/openshift-install/command so
that the function can be made public.
Modified agent.newWaitForInstallCompleted() to use the common
WaitForInstallComplete function.
The benefit of moving the agent over to the common
WaitForInstallComplete function is that the common function waits
for the cluster operators to reach a stable state before declaring
the cluster installation complete.
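The commit text only says that the common function waits for the operators to stabilize. A hedged sketch of what such a check can look like with openshift/client-go follows; it is an illustration of "operators in a stable state", not the installer's actual WaitForInstallComplete implementation.

```go
package main

import (
	"context"
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
	configclient "github.com/openshift/client-go/config/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
)

// operatorsStable reports whether every ClusterOperator is Available, not
// Progressing, and not Degraded.
func operatorsStable(ctx context.Context, kubeconfigPath string) (bool, error) {
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
	if err != nil {
		return false, err
	}
	client, err := configclient.NewForConfig(cfg)
	if err != nil {
		return false, err
	}
	operators, err := client.ConfigV1().ClusterOperators().List(ctx, metav1.ListOptions{})
	if err != nil {
		return false, err
	}
	for _, co := range operators.Items {
		available, progressing, degraded := false, false, false
		for _, cond := range co.Status.Conditions {
			switch cond.Type {
			case configv1.OperatorAvailable:
				available = cond.Status == configv1.ConditionTrue
			case configv1.OperatorProgressing:
				progressing = cond.Status == configv1.ConditionTrue
			case configv1.OperatorDegraded:
				degraded = cond.Status == configv1.ConditionTrue
			}
		}
		if !available || progressing || degraded {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	stable, err := operatorsStable(context.Background(), "/path/to/kubeconfig")
	fmt.Println(stable, err)
}
```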
- Store the three authentication tokens (userAuth, agentAuth, watcherAuth)
as a secret in the cluster when creating the node ISO.
- Automatically regenerate expired tokens and refresh the asset store
to maintain valid authentication credentials in the cluster secret.
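A hedged sketch of how such a secret can be written with client-go. The secret name and namespace (agent-auth-token in openshift-config) and the key names (userAuth, agentAuth, watcherAuth) come from this changelog, but the installer's actual code may differ.

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// storeAuthTokens persists the three node-ISO auth tokens in a cluster
// secret, refreshing the values if the secret already exists.
func storeAuthTokens(ctx context.Context, client kubernetes.Interface, userAuth, agentAuth, watcherAuth string) error {
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "agent-auth-token",
			Namespace: "openshift-config",
		},
		StringData: map[string]string{
			"userAuth":    userAuth,
			"agentAuth":   agentAuth,
			"watcherAuth": watcherAuth,
		},
	}
	_, err := client.CoreV1().Secrets(secret.Namespace).Create(ctx, secret, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		// Regenerated tokens replace the expired ones stored earlier.
		_, err = client.CoreV1().Secrets(secret.Namespace).Update(ctx, secret, metav1.UpdateOptions{})
	}
	return err
}
```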
IPv6 was not being handled correctly when calling this function.
Use net.JoinHostPort.
From the docs:
JoinHostPort combines host and port into a network address of the form "host:port".
If host contains a colon, as found in literal IPv6 addresses, then JoinHostPort returns "[host]:port".
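A short illustration of the difference (the addresses and port are made up):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Manual concatenation produces an ambiguous address for IPv6 hosts.
	fmt.Println("fd2e:6f44:5dd8::1" + ":" + "8090") // fd2e:6f44:5dd8::1:8090

	// net.JoinHostPort brackets IPv6 literals as required.
	fmt.Println(net.JoinHostPort("fd2e:6f44:5dd8::1", "8090")) // [fd2e:6f44:5dd8::1]:8090
	fmt.Println(net.JoinHostPort("192.168.111.80", "8090"))    // 192.168.111.80:8090
}
```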
- Create 3 separate JWT tokens: AGENT_AUTH_TOKEN, USER_AUTH_TOKEN, WATCHER_AUTH_TOKEN (see the sketch after this list).
- Update the claim to set `auth_scheme` to identify the user persona.
- Assisted service checks `auth_scheme` to determine which user persona is allowed to access an endpoint.
- WATCHER_AUTH_TOKEN is sent in the `Watcher-Authorization` header and is used by the wait-for command (watcher persona).
- USER_AUTH_TOKEN is sent in the `Authorization` header and is used by curl API requests and systemd services (user persona).
- AGENT_AUTH_TOKEN is sent in the `X-Secret-Key` header and is used by the agent service (agent persona).
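A hedged sketch of the token/header relationship described above, assuming the golang-jwt library and an HMAC signing key; the persona strings, signing method, and expiry are illustrative, not the installer's actual token generation.

```go
package sketch

import (
	"net/http"
	"time"

	"github.com/golang-jwt/jwt/v5"
)

// mintToken creates a JWT whose auth_scheme claim identifies the persona.
func mintToken(secret []byte, persona string) (string, error) {
	claims := jwt.MapClaims{
		"auth_scheme": persona,
		"exp":         time.Now().Add(48 * time.Hour).Unix(),
	}
	return jwt.NewWithClaims(jwt.SigningMethodHS256, claims).SignedString(secret)
}

// addAuthHeader attaches the token using the header named for each persona
// in the list above.
func addAuthHeader(req *http.Request, persona, token string) {
	switch persona {
	case "user":
		req.Header.Set("Authorization", token)
	case "agent":
		req.Header.Set("X-Secret-Key", token)
	case "watcher":
		req.Header.Set("Watcher-Authorization", token)
	}
}
```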
Monitoring output is now batched and displayed every 5 seconds
for each node. This makes the logs easier to read because the lines
for each node are more likely to be grouped together.
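A sketch of the batching idea only: collect pending lines per node and flush them together on a 5-second tick. The node names and messages are made up; this is not the monitor command's code.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Pending log lines, keyed by node name (names are illustrative).
	batch := map[string][]string{
		"extraworker-0": {"Node joined cluster", "Node is Ready"},
		"extraworker-1": {"Waiting for first CSR approval"},
	}

	// Flush the batch once per tick so each node's lines appear together.
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()

	<-ticker.C
	for node, msgs := range batch {
		for _, m := range msgs {
			fmt.Printf("%s: %s\n", node, m)
		}
	}
}
```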
The log prefix was getting set the first time through the host loop and
not getting modified for subsequent hosts. We need to calculate it anew
for each host.
In the agent wait-for, when the APIs are not available, an attempt is
made to check the host by ssh'ing to it. This results in confusing
debug messages if the keys aren't present. This changes the check to
just test connectivity to the host.
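A hedged sketch of the simpler check, assuming a plain TCP dial to the SSH port is enough to confirm reachability; the port and timeout are assumptions, not the installer's actual values.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// hostReachable confirms the host accepts TCP connections on the SSH port
// without attempting a full SSH login, which needs keys and emits confusing
// debug output when they are absent.
func hostReachable(host string) bool {
	conn, err := net.DialTimeout("tcp", net.JoinHostPort(host, "22"), 5*time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	fmt.Println(hostReachable("192.168.111.80"))
}
```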
Creating a Nodes ISO:
- The secret does not exist:
  1. Generate a new public key and JWT token with an expiration time of 48 hours.
  2. Save the public key and token into the asset store.
  3. Create a secret named agent-auth-token in the openshift-config namespace with the token and public key from the asset store.
- The secret already exists:
  1. Retrieve the stored token and check whether the JWT token in the secret is older than 24 hours (a sketch of this refresh rule follows the list).
- The secret already exists and the token is older than 24 hours:
  1. Generate a new public key and JWT token with a new expiration time of 48 hours.
  2. Update the secret with the new public key and JWT token.
- The secret already exists and the token is not older than 24 hours:
  1. Retrieve the token and public key from the secret and update the asset store with the values from the secret.
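A hedged sketch of the refresh decision: with tokens issued for 48 hours, "older than 24 hours" is read here as "less than 24 hours of validity remaining", derived from the unverified exp claim. How the installer actually measures the age, and the JWT library used, are assumptions.

```go
package sketch

import (
	"fmt"
	"time"

	"github.com/golang-jwt/jwt/v5"
)

// needsRefresh reports whether the stored token should be regenerated.
func needsRefresh(tokenString string) (bool, error) {
	token, _, err := jwt.NewParser().ParseUnverified(tokenString, jwt.MapClaims{})
	if err != nil {
		return false, err
	}
	exp, err := token.Claims.GetExpirationTime()
	if err != nil {
		return false, err
	}
	if exp == nil {
		return false, fmt.Errorf("token has no exp claim")
	}
	// Issued for 48h; older than 24h means under 24h of validity remains.
	return time.Until(exp.Time) < 24*time.Hour, nil
}
```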
Running the monitor-add-nodes command:
1. Retrieve the token from the secret.
2. Register the agent installer client with the retrieved token.
3. Send the auth token to the assisted service API via HTTP headers (sketched below).
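A hedged sketch of steps 1 and 3: reading the token back from the agent-auth-token secret and attaching it to an assisted service request. The key name and header follow the lists above but are assumptions about the exact code.

```go
package sketch

import (
	"context"
	"net/http"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// watcherToken reads the watcher persona token out of the cluster secret.
func watcherToken(ctx context.Context, client kubernetes.Interface) (string, error) {
	secret, err := client.CoreV1().Secrets("openshift-config").Get(ctx, "agent-auth-token", metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	return string(secret.Data["watcherAuth"]), nil
}

// newMonitorRequest attaches the token via the Watcher-Authorization header.
func newMonitorRequest(ctx context.Context, url, token string) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Watcher-Authorization", token)
	return req, nil
}
```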
Remove the "Neither --kubeconfig nor --master was specified.
Using the inClusterConfig. This might not work." warning, because
it does work and would confuse the user when the monitor-add-nodes
command is run in-cluster.
The linter was upgraded in https://github.com/openshift/release/pull/52723. It caught
three issues that are addressed here: a name change, a potentially exposed secret, and an unnecessary newline.
The first and second CSRs pending approval have the node name
(hostname) embedded in their specs. monitor-add-nodes should only
show CSRs pending approval for a specific node. Currently it shows
all CSRs pending approval for all nodes.
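A hedged sketch of how a pending CSR can be matched to one node: kubelet client CSRs carry the hostname in the subject (CN `system:node:<hostname>`) and serving CSRs carry it in the DNS SANs. This is an illustration, not the monitor command's actual filter.

```go
package sketch

import (
	"crypto/x509"
	"encoding/pem"
	"strings"

	certificatesv1 "k8s.io/api/certificates/v1"
)

// csrIsForNode reports whether a CSR embeds the given node's hostname.
func csrIsForNode(csr certificatesv1.CertificateSigningRequest, hostname string) bool {
	block, _ := pem.Decode(csr.Spec.Request)
	if block == nil {
		return false
	}
	parsed, err := x509.ParseCertificateRequest(block.Bytes)
	if err != nil {
		return false
	}
	if parsed.Subject.CommonName == "system:node:"+hostname {
		return true
	}
	for _, dns := range parsed.DNSNames {
		if strings.EqualFold(dns, hostname) {
			return true
		}
	}
	return false
}
```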
If the IP address of the node cannot be resolved to a hostname,
we will not be able to determine if there are any CSRs pending
approval for that node. The monitoring command will skip showing
CSRs pending approval. In this case, users can still approve the
CSRs, and the monitoring command will continue to check if the node
has joined the cluster and has become Ready.
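A hedged sketch of the hostname resolution step; whether the command really uses a reverse DNS lookup is an assumption, and the address is made up.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

func main() {
	names, err := net.LookupAddr("192.168.111.80")
	if err != nil || len(names) == 0 {
		// Without a hostname, pending CSRs cannot be matched to the node,
		// so the monitoring command skips displaying them.
		fmt.Println("cannot resolve hostname; skipping display of pending CSRs")
		return
	}
	hostname := strings.TrimSuffix(names[0], ".")
	fmt.Println("checking pending CSRs for", hostname)
}
```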
NewCluster needs both assetDir for the install workflow and
kubeconfigPath for the addnodes workflow.
Cluster.assetDir should only be initialized for the install
workflow.
Adds the ability to monitor a node being added during day 2.
The command is:
node-joiner monitor-add-nodes --kubeconfig <kubeconfig-file-path>
<IP-address-of-node-to-monitor>
Both the kubeconfig file and the IP address are required.
Multi-node monitoring will be added in a future PR.
The function now requires kubeconfig file path, rendezvousIP, and
sshKey as parameters. Previously it had a single parameter, assetStore,
and it searched the asset store to determine the three parameters
above.
When the host fails to boot due to pending-user-action (this can
occur when the disk boot order is set incorrectly), log the
message at Warning level instead of Debug to make it obvious.
This also removes an additional spurious info message that was
being logged.
When the cluster status is installing-pending-user-action, the install
won't complete; most likely this is due to an invalid boot disk. When
this status is detected, also log status_info for the hosts that
have this status.
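A hedged sketch of the logging change with logrus; the host struct here is illustrative, not the assisted-service model the installer actually reads.

```go
package sketch

import "github.com/sirupsen/logrus"

// host is an illustrative stand-in for the monitored host data.
type host struct {
	Name       string
	Status     string
	StatusInfo string
}

// logHostStatus logs at Warning level, including status_info, when the host
// is stuck in installing-pending-user-action; otherwise it stays at Debug.
func logHostStatus(log *logrus.Logger, h host) {
	if h.Status == "installing-pending-user-action" {
		log.Warnf("host %s requires user action: %s", h.Name, h.StatusInfo)
		return
	}
	log.Debugf("host %s status: %s", h.Name, h.Status)
}
```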
If the REST API and Kube API are not reachable, it may be because
network connectivity checks are preventing the install from progressing
(which will also prevent the Node0 SSH server from starting). Attempt
to SSH to the node and provide instructions for further debugging to the
user upon failure.
When running the 'agent wait-for install-complete' command, we first
check that bootstrapping is complete (by running the equivalent of
'agent wait-for bootstrap-complete'). However, if this failed because the
bootstrapping timed out, we would report it as an install failure along
with the corresponding debug messages (stating that the problem is with
the cluster operators, and inevitably failing to fetch data about them).
If the failure occurs during bootstrapping, report it as a bootstrap
error, the same as you would get from 'agent wait-for
bootstrap-complete'.
err is always nil at this point: we check it further up, and it is
never overwritten because the variable of the same name declared inside
the anonymous function shadows it (overwriting the outer err was
probably what was intended).
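An illustration of the shadowing pitfall described above; the error message is made up.

```go
package main

import (
	"errors"
	"fmt"
)

func main() {
	var err error // already checked to be nil further up in the real code

	func() {
		// ":=" declares a new err that shadows the outer one, so the
		// outer variable is never updated.
		err := errors.New("something failed inside the closure")
		_ = err
	}()

	fmt.Println(err == nil) // true: the outer err was never overwritten
}
```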