On RHEL (and IIRC Fedora as well), installing libvirt doesn't
automatically pull in a hypervisor to actually run VMs on. As a
result, you can encounter this error because qemu-kvm (or an
equivalent) is not present:
Could not find any guests for architecure type hvm/x86_64
To avoid this, explicitly install qemu-kvm (if qemu-kvm-rhev or
qemu-kvm-ev is available in the machine's yum/dnf configuration, it
will be pulled in instead). The other package needed is
libvirt-daemon-kvm.
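For example, on a dnf-based system:

$ sudo dnf install qemu-kvm libvirt-daemon-kvm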
The kube-addon operator was the last remaining component in that
namespace, and it was just controlling a metrics server. Metrics
aren't critical to cluster functions, and dropping kube-addon means we
don't need the old pull secret anymore (although we will shortly need
new pull secrets for pulling private release images [1]).
Also drop the admin and user roles [2], although I'm less clear on
their connection.
[1]: https://github.com/openshift/installer/pull/663
[2]: https://github.com/openshift/installer/pull/682#issuecomment-439145907
The account.coreos.com reference was stale, and pull-secrets aren't
libvirt-specific, so I've dropped them from the libvirt docs entirely.
From Clayton, the flow for getting a pull secret will be:
1. Log in to try.openshift.com.
2. Accept the terms.
3. Get a pull secret you can download or copy/paste back into a local
file.
Podman doesn't really come into it. Currently the secret you get
there looks like:
$ cat ~/.personal/pull-secret.json
{
  "auths": {
    "cloud.openshift.com": {"auth": "...", "email": "..."},
    "quay.io": {"auth": "...", "email": "..."}
  }
}
Besides pulling images, the secret may also be used to authenticate to
other services (e.g. telemetry) on hosts that do not contain image
registries, which is more reason to decouple this from Podman.
Or at least, it's in what looks like an unreliable location ;).
Here's my local kubeconfig:
$ sha1sum wking/auth/kubeconfig
dd7f1796fe5aed9b0f453498e60bfea9c6a56586 wking/auth/kubeconfig
And here's what's on a master:
[core@wking-master-0 ~]$ sudo find / -xdev -name 'kubeconfig*' -exec sha1sum {} \+ 2>/dev/null
aa7e5544c36f2b070c33cbbea12102d64bc52928 /sysroot/ostree/deploy/rhcos/var/lib/kubelet/kubeconfig
aa7e5544c36f2b070c33cbbea12102d64bc52928 /var/lib/kubelet/kubeconfig
227e8aa1c09c7b5f8602a5528077f3bd34b8544e /etc/kubernetes/kubeconfig
dd7f1796fe5aed9b0f453498e60bfea9c6a56586 /etc/kubernetes/checkpoint-secrets/kube-system/pod-checkpointer-5crhb/controller-manager-kubeconfig/kubeconfig
[core@wking-master-0 ~]$ grep 'user: ' /etc/kubernetes/kubeconfig
user: kubelet
Reaching into checkpoint-secrets is probably not what we want to
recommend, so instead I'm suggesting folks just copy their kubeconfig
over from their local host.
I'd originally left the bootstrap suggestion alone, but now I'm
recommending scp for that as well, because:
1. Having only one way is less to think about.
2. With [1], the bootstrap node is becoming a fairly short-lived
thing, so it's not worth spending much time talking about access to
it.
3. Abhinav asked for it [2] ;).
[1]: https://github.com/openshift/installer/pull/579
[2]: https://github.com/openshift/installer/pull/585#issuecomment-434864437
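For illustration, copying the local kubeconfig onto a master could
look something like this (paths and names taken from the examples
above; replace the host with an address you can actually reach):

$ scp wking/auth/kubeconfig core@wking-master-0: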
This is what I do. `dnf` no longer complains if invoked as `yum`;
there's no point in having two separate sets of instructions.
Also use `systemctl enable --now` for further brevity.
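For example, the daemon can be started and enabled at boot in one step:

$ sudo systemctl enable --now libvirtd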
Otherwise virsh may not be able to find the nodes:
$ virsh -c $OPENSHIFT_INSTALL_LIBVIRT_URI domifaddr master0
 Name       MAC address          Protocol     Address
----------------------------------------------------------------
 vnet1      0a:11:5f:07:f8:b5    ipv4         192.168.126.11/24
$ virsh domifaddr master0
error: failed to get domain 'master0'
error: Domain not found: no domain with matching name 'master0'
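As an aside, virsh also honors the LIBVIRT_DEFAULT_URI environment
variable, so one way to avoid passing -c on every call (assuming
OPENSHIFT_INSTALL_LIBVIRT_URI is exported as above) is:

$ export LIBVIRT_DEFAULT_URI=$OPENSHIFT_INSTALL_LIBVIRT_URI
$ virsh domifaddr master0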
Using Terraform to remove all resources created by the bootstrap
modules. For this to work, all platforms must define a bootstrap
module (and they all currently do).
This commit moves the previous destroy-cluster command into a new
'destroy cluster' subcommand, because grouping different destroy
flavors into sub-commands makes the base command easier to understand.
We expect
both destroy flavors to be long-running, because it's hard to write
generic logic for "is the cluster sufficiently live for us to remove
the bootstrap". We don't want to hang forever if the cluster dies
before coming up, but there are no solid rules for how long to wait
before deciding that it's never going to come up. When we start
destroying the bootstrap resources automatically in the future, we
will pick reasonable timeouts, but we will still want to provide
callers with the ability to manually remove the bootstrap resources
if we happen to fall out of that timeout on a cluster that does
eventually come up.
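As a rough sketch of the resulting interface (the --dir flag for
pointing at the asset directory is an assumption for the example, not
something spelled out here):

$ openshift-install destroy bootstrap --dir=wking
$ openshift-install destroy cluster --dir=wking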
I've also created a LoadMetadata helper to share the "retrieve the
metadata from the asset directory" logic between the destroy-cluster
and destroy-bootstrap logic. The new helper lives in the cluster
asset package, close to the code that determines that file's location.
I've pushed the Terraform module unpacking and 'terraform init' call
down into a helper used by the Apply and Destroy functions to make
life easier on the callers.
I've also fixed a path.Join -> filepath.Join typo in Apply, which
dates back to ff5a57b0 (pkg/terraform: Modify some helper functions
for the new binary layout, 2018-09-19, #289). These aren't network
paths ;).
The typo is from af6d904c (fixing cmd and typo, 2018-09-20, #293),
which was itself fixing typos from 21ef0d4f (adding details regarding
using of firewalld instead of iptables, 2018-09-19, #284).
This document is meant for operator authors who want to integrate a second-level operator into the installer. It is a guide to all the possible and acceptable methods.
Most libvirt installs will already have an interface on the
192.168.124.0/24 network. This commit updates the default cluster
CIDR to 192.168.126.0/24.
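One way to check which ranges are already in use on a host (the
'default' network name is an assumption; yours may differ):

$ ip -4 -brief addr show
$ sudo virsh net-dumpxml default | grep '<ip '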
This restores the console docs which we'd removed from the README in
feb41e9d (docs: rework documentation, 2018-09-24, #328). And it moves
the kubeconfig location information over from the libvirt-specific
docs. Launching the cluster is nice, but these other operations are
important too ;). Putting them in the README increases their
visibility. It also lets us drop them from the libvirt-specific docs,
now that the libvirt docs link to the README quick-start for these
generic operations.
Docs for Go's build constraints are in [1]. This commit allows folks
with local libvirt C libraries to compile our libvirt deletion logic
(and get a dynamically-linked executable), while release binaries and
folks without libvirt C libraries can continue to get
statically-linked executables that lack libvirt deletion.
I've also simplified the public names (e.g. NewDestroyer -> New),
dropping information which is already encoded in the import path.
Pulling the init() registration out into separate files is at
Abhinav's request [2].
[1]: https://golang.org/pkg/go/build/#hdr-Build_Constraints
[2]: https://github.com/openshift/installer/pull/387#discussion_r221763315
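For illustration, a tag-constrained build might be invoked like this
(the libvirt_destroy tag name is an assumption for the sketch, not
necessarily the tag this repository uses):

# with the local libvirt C libraries present:
$ go build -tags libvirt_destroy ./...
# without the tag, the statically-linkable stub is compiled instead:
$ go build ./...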
These changes catch us up with the recent shift from 'tectonic' to 'openshift-install'.
I also:
* Dropped the section numbers, since these are tedious to maintain.
The ordering should be clear enough from whether a section is above
or below in the file ;).
* Dropped/adjusted references for settings which are no longer
configurable, although we might restore the ability to configure IP
ranges, RHCOS image, etc., in the future.
* Dropped the 30-min caveat. The cluster comes up faster now, but I
don't have a more accurate time to plug in, so I've just dropped
that line.
Add a few things to the libvirt howto after my first pass running it:
- Add dependency installation
- Start libvirtd
- Show how to create the default libvirt storage pool
- Renumber sections after inserting new sections
- Fix rhcos image name
- Clarify that the firewalld --permanent commands are run in addition to
running the same commands without the flag (see the example after this list)
- Change the reference from ../libvirt.yaml to libvirt.yaml to match where
the file will be, based on the earlier instructions
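A minimal sketch of that firewalld pattern, using the libvirt TCP port
16509 as an example (the zone is left at the default here; the howto
may use a specific zone):

$ sudo firewall-cmd --add-port=16509/tcp              # takes effect now, lost on reload
$ sudo firewall-cmd --add-port=16509/tcp --permanent  # saved to disk, applied on reload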
Since we're instructing folks to use 192.168.122.1 for the libvirt
URI, which is apparently what the clusterapi-controller uses to talk
to libvirt, the firewall has to match; otherwise it looks like this
in the logs:
```
E0924 21:26:08.925983 1 controller.go:115] Error checking
existance of machine instance for machine object worker-fdtdg; Failed to
build libvirt client: virError(Code=38, Domain=7, Message='unable to
connect to server at '192.168.122.1:16509': Connection timed out')
```