Previously, destroy support was behind TAGS=libvirt_destroy and create
support was always built in. But since 3fb4400c (terraform/plugins:
add `libvirt`, `aws`, `ignition`, `openstack` to KnownPlugins,
2018-12-14, #919), the bundled libvirt Terraform provider has also
been behind libvirt_destroy. That leads to cluster creation failing
with:
$ openshift-install create cluster
...
ERROR Missing required providers.
ERROR
ERROR The following provider constraints are not met by the currently-installed
ERROR provider plugins:
ERROR
ERROR * libvirt (any version)
ERROR
ERROR Terraform can automatically download and install plugins to meet the given
ERROR constraints, but this step was skipped due to the use of -get-plugins=false
ERROR and/or -plugin-dir on the command line.
...
With this commit, folks trying to 'create cluster' without libvirt
compiled in will get:
FATAL failed to fetch Common Manifests: failed to load asset "Install Config": invalid "install-config.yaml" file: platform: Invalid value: types.Platform{AWS:(*aws.Platform)(nil), Libvirt:(*libvirt.Platform)(0xc4209511f0), OpenStack:(*openstack.Platform)(nil)}: platform must be one of: aws, openstack
before we get to Terraform.
Now that the build tag guards both creation and deletion, I've renamed
it from 'libvirt_destroy' to the unqualified 'libvirt'.
I've also adjusted the install-config validation testing to use
regular expressions so we can distinguish between failures because
libvirt was not compiled in as a valid platform and failures because
some portion of the libvirt configuration was broken. In order to get
stable error messages for comparison, I've added some sort.Strings
calls for the various allowed-value string-slice computations.
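A toy illustration of the trick, not the installer's actual validation
code — just the shape of the sort call and the stable message it yields
(the platform names are the ones from the error above):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

func main() {
	// Sorting the allowed platform names gives a deterministic error
	// message that a regexp in the tests can match reliably.
	platforms := []string{"libvirt", "aws", "openstack"}
	sort.Strings(platforms)
	fmt.Printf("platform must be one of: %s\n", strings.Join(platforms, ", "))
}
```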
The long forms are less likely to exist in the user's environment
since 6be4c253 (*: remove support for environment variables,
2018-12-10, #861), and we no longer need that context to distinguish
them from all the other environment variables on a user's system.
The environment variables were originally added to make CI testing a
little easier, since the installer didn't support consumption of
provided assets (e.g. the install config). Now that the installer
supports consumption, there is no need for most of the environment
variables anymore. The variables have actually been confusing to users,
so their removal should simplify the mental model.
That approach should be documented in the CVO itself, since it's not
installer-specific and moving it gets the docs and implementation for
that approach into the same repository. I've filed [1] to land
dynamic-object docs in the CVO repo (based on some of the content I'm
removing here). Naming files, etc., is already covered by the
existing CVO documentation.
[1]: https://github.com/openshift/cluster-version-operator/pull/59
On RHEL (and, IIRC, Fedora as well), installing libvirt doesn't
automatically pull in a hypervisor to actually run VMs on. As a
result, you can hit this error when qemu-kvm or an equivalent is not
present:
Could not find any guests for architecure type hvm/x86_64
To avoid this, explicitly install qemu-kvm (if qemu-kvm-rhev or
qemu-kvm-ev are available in the machine's yum/dnf configuration they
will automatically get pulled in instead). The other package needed is
libvirt-daemon-kvm.
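On a dnf-based system that boils down to something like this (a
sketch; adjust for your distribution and repositories):
$ sudo dnf install qemu-kvm libvirt-daemon-kvm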
The kube-addon operator was the last remaining component in that
namespace, and it was just controlling a metrics server. Metrics
aren't critical to cluster functions, and dropping kube-addon means we
don't need the old pull secret anymore (although we will shortly need
new pull secrets for pulling private release images [1]).
Also drop the admin and user roles [2], although I'm less clear on
their connection.
[1]: https://github.com/openshift/installer/pull/663
[2]: https://github.com/openshift/installer/pull/682#issuecomment-439145907
The account.coreos.com reference was stale, and pull-secrets aren't
libvirt-specific, so I've dropped them from the libvirt docs entirely.
From Clayton, the flow for getting a pull secret will be:
1. Log in to try.openshift.com.
2. Accept the terms.
3. Get a pull secret you can download or copy/paste back into a local
file.
Podman doesn't really come into it. Currently the secret you get
there looks like:
$ cat ~/.personal/pull-secret.json
{
  "auths": {
    "cloud.openshift.com": {"auth": "...", "email": "..."},
    "quay.io": {"auth": "...", "email": "..."}
  }
}
Besides pulling images, the secret may also be used to authenticate to
other services (e.g. telemetry) on hosts that do not contain image
registries, which is more reason to decouple this from Podman.
Or at least, it's in what looks like an unreliable location ;).
Here's my local kubeconfig:
$ sha1sum wking/auth/kubeconfig
dd7f1796fe5aed9b0f453498e60bfea9c6a56586 wking/auth/kubeconfig
And here's looking on master:
[core@wking-master-0 ~]$ sudo find / -xdev -name 'kubeconfig*' -exec sha1sum {} \+ 2>/dev/null
aa7e5544c36f2b070c33cbbea12102d64bc52928 /sysroot/ostree/deploy/rhcos/var/lib/kubelet/kubeconfig
aa7e5544c36f2b070c33cbbea12102d64bc52928 /var/lib/kubelet/kubeconfig
227e8aa1c09c7b5f8602a5528077f3bd34b8544e /etc/kubernetes/kubeconfig
dd7f1796fe5aed9b0f453498e60bfea9c6a56586 /etc/kubernetes/checkpoint-secrets/kube-system/pod-checkpointer-5crhb/controller-manager-kubeconfig/kubeconfig
[core@wking-master-0 ~]$ grep 'user: ' /etc/kubernetes/kubeconfig
user: kubelet
Reaching into checkpoint-secrets is probably not what we want to
recommend, so instead I'm suggesting folks just copy their kubeconfig
over from their local host.
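For example, something like this (a sketch; 'core' is the standard
RHCOS user, and the master address is whatever `virsh domifaddr`
reports, e.g. the 192.168.126.11 seen elsewhere in these notes):
$ scp wking/auth/kubeconfig core@192.168.126.11:
and then point KUBECONFIG at the copied file once you're on the node.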
I'd originally left the bootstrap suggestion alone, but now I'm
recommending scp for that as well, because:
1. Having only one way is less to think about.
2. With [1], the bootstrap node is becoming a fairly short-lived
thing, so it's not worth spending much time talking about access to
it.
3. Abhinav asked for it [2] ;).
[1]: https://github.com/openshift/installer/pull/579
[2]: https://github.com/openshift/installer/pull/585#issuecomment-434864437
This is what I do. `dnf` no longer complains if invoked as `yum`;
there's no point to having two separate sets of instructions.
Also use `systemctl enable --now` for further brevity.
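Roughly, the consolidated instructions end up looking like this (a
sketch; the exact package list beyond the qemu-kvm/libvirt-daemon-kvm
packages discussed above is an assumption):
$ sudo dnf install libvirt libvirt-daemon-kvm qemu-kvm
$ sudo systemctl enable --now libvirtd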
Otherwise virsh may not be able to find the nodes:
$ virsh -c $OPENSHIFT_INSTALL_LIBVIRT_URI domifaddr master0
 Name       MAC address          Protocol     Address
----------------------------------------------------------------
 vnet1      0a:11:5f:07:f8:b5    ipv4         192.168.126.11/24
$ virsh domifaddr master0
error: failed to get domain 'master0'
error: Domain not found: no domain with matching name 'master0'
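If you'd rather not pass -c every time, the libvirt client tools also
honor LIBVIRT_DEFAULT_URI; for example (the qemu+tcp URI here is an
assumption, use whatever you gave the installer):
$ export LIBVIRT_DEFAULT_URI=qemu+tcp://192.168.122.1/system
$ virsh domifaddr master0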
This uses Terraform to remove all resources created by the bootstrap
modules. For this to work, all platforms must define a bootstrap
module (and they all currently do).
This commit moves the previous destroy-cluster into a new 'destroy
cluster' subcommand, because grouping the different destroy flavors
into subcommands makes the base command easier to understand. We
expect both destroy flavors to be long-lived, because it's hard to
write generic logic for "is the cluster sufficiently live for us to
remove the bootstrap". We don't want to hang forever if the cluster
dies before coming up, but there are no solid rules for how long to
wait before deciding that it's never going to come up. When we start
destroying the bootstrap resources automatically in the future, we
will pick reasonable timeouts, but we will still want to provide
callers with the ability to manually remove the bootstrap resources if
we happen to exceed that timeout on a cluster that does eventually
come up.
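The resulting CLI surface looks roughly like this (the --dir flag and
the 'wking' asset directory are assumptions based on the transcripts
elsewhere in these notes):
$ openshift-install destroy bootstrap --dir wking
$ openshift-install destroy cluster --dir wking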
I've also created a LoadMetadata helper to share the "retrieve the
metadata from the asset directory" logic between the destroy-cluster
and destroy-bootstrap logic. The new helper lives in the cluster
asset package, close to the code that determines that file's location.
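A minimal sketch of the helper's shape, assuming the metadata lives in
metadata.json under the asset directory and simplifying the
installer's real types:

```go
package cluster

import (
	"encoding/json"
	"io/ioutil"
	"path/filepath"
)

// metadataFileName is an assumption here; the real constant lives next
// to the code that writes the file.
const metadataFileName = "metadata.json"

// ClusterMetadata is a stand-in for the installer's metadata type; the
// platform-specific fields are elided.
type ClusterMetadata struct {
	ClusterName string `json:"clusterName"`
}

// LoadMetadata reads and unmarshals the metadata file from the asset
// directory, so both destroy flavors share the same loading logic.
func LoadMetadata(dir string) (*ClusterMetadata, error) {
	raw, err := ioutil.ReadFile(filepath.Join(dir, metadataFileName))
	if err != nil {
		return nil, err
	}
	metadata := &ClusterMetadata{}
	if err := json.Unmarshal(raw, metadata); err != nil {
		return nil, err
	}
	return metadata, nil
}
```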
I've pushed the Terraform module unpacking and 'terraform init' call
down into a helper used by the Apply and Destroy functions to make
life easier on the callers.
I've also fixed a path.Join -> filepath.Join typo in Apply, which
dates back to ff5a57b0 (pkg/terraform: Modify some helper functions
for the new binary layout, 2018-09-19, #289). These aren't network
paths ;).
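For reference, a tiny standalone program showing why the distinction
matters:

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
)

func main() {
	// path.Join always produces slash-separated paths (good for URLs),
	// while filepath.Join uses the host OS separator (good for files
	// on disk).
	fmt.Println(path.Join("assets", "terraform.tfstate"))     // assets/terraform.tfstate everywhere
	fmt.Println(filepath.Join("assets", "terraform.tfstate")) // assets\terraform.tfstate on Windows
}
```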
The typo is from af6d904c (fixing cmd and typo, 2018-09-20, #293),
which was itself fixing typos from 21ef0d4f (adding details regarding
using of firewalld instead of iptables, 2018-09-19, #284).
This document is meant for operator authors who want to integrate a second-level operator with the installer. It is a guide to all the possible and acceptable methods.
Most libvirt installs will already have an interface on the
192.168.124.0/24 network. This commit updates the default cluster
CIDR to 192.168.126.0/24 to avoid colliding with it.
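To check whether your host already has an interface on one of those
ranges, something like this works:
$ ip -4 addr show | grep 192.168.12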
This restores the console docs which we'd removed from the README in
feb41e9d (docs: rework documentation, 2018-09-24, #328). And it moves
the kubeconfig location information over from the libvirt-specific
docs. Launching the cluster is nice, but these other operations are
important too ;). Putting them in the README increases their
visibility. It also lets us drop them from the libvirt-specific docs,
now that the libvirt docs link to the README quick-start for these
generic operations.
Docs for Go's build constraints are in [1]. This commit allows folks
with local libvirt C libraries to compile our libvirt deletion logic
(and get a dynamically-linked executable), while release binaries and
folks without libvirt C libraries can continue to get
statically-linked executables that lack libvirt deletion.
I've also simplified the public names (e.g. NewDestroyer -> New),
dropping information which is already encoded in the import path.
Pulling the init() registration out into separate files is at
Abhinav's request [2].
[1]: https://golang.org/pkg/go/build/#hdr-Build_Constraints
[2]: https://github.com/openshift/installer/pull/387#discussion_r221763315
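A sketch of what such a tag-guarded registration file might look like
(the package layout and the registry call are assumptions, not the
installer's actual code):

```go
// +build libvirt

// This file only compiles when the 'libvirt' build tag is set (e.g.
// `go build -tags libvirt`), which is what lets the deletion code link
// against the local libvirt C libraries.
package libvirt

// init would register this platform's destroyer with the shared
// registry; the registry call is an assumption, so it's left as a
// comment.
func init() {
	// e.g. destroyers.Register("libvirt", New)
}
```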
These changes catch us up with the recent shift from 'tectonic' to 'openshift-install'.
I also:
* Dropped the section numbers, since these are tedious to maintain.
The ordering should be clear enough from whether a section is above
or below in the file ;).
* Dropped/adjusted references for settings which are no longer
configurable, although we might restore the ability to configure IP
ranges, RHCOS image, etc., in the future.
* Dropped the 30-min caveat. The cluster comes up faster now, but I
don't have a more accurate time to plug in, so I've just dropped
that line.
Add a few things to the libvirt howto after my first pass running it:
- Add dependency installation
- Start libvirtd
- Show how to create the default libvirt storage pool (see the sketch
  after this list)
- Renumber sections after inserting the new sections
- Fix the RHCOS image name
- Clarify that the firewalld commands run with --permanent are in
  addition to running the same commands without the flag
- Change the ../libvirt.yaml reference to libvirt.yaml to match where
  the file ends up based on the earlier instructions
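For the storage-pool step, the usual virsh incantation looks roughly
like this (a directory-backed pool at /var/lib/libvirt/images is the
conventional default, not necessarily what the howto prescribes):
$ sudo virsh pool-define-as default dir --target /var/lib/libvirt/images
$ sudo virsh pool-start default
$ sudo virsh pool-autostart default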
Since we're instructing folks to use 192.168.122.1 for the libvirt
URI, which is apparently what the clusterapi-controller uses to talk
to libvirt, the firewall has to match; otherwise it looks like this in
the logs:
```
E0924 21:26:08.925983 1 controller.go:115] Error checking
existance of machine instance for machine object worker-fdtdg; Failed to
build libvirt client: virError(Code=38, Domain=7, Message='unable to
connect to server at '192.168.122.1:16509': Connection timed out')
```
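A hedged example of the matching firewalld rules (the dmz zone and the
192.168.126.0/24 cluster network are assumptions; the libvirt howto
has the authoritative commands):
$ sudo firewall-cmd --zone=dmz --add-source=192.168.126.0/24
$ sudo firewall-cmd --zone=dmz --add-port=16509/tcp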