Libvirt HOWTO
Tectonic has limited support for installing a Libvirt cluster. This is especially useful for operator development.
1. One-time setup
It's expected that you will create and destroy clusters often in the course of development. These steps only need to be run once (or once per RHCOS update).
1.1 Pick a name and IP range
In this example, we'll set the baseDomain to tt.testing, the name to test1, and the ipRange to 192.168.124.0/24.
1.2 Clone the repo
git clone https://github.com/openshift/installer.git
cd installer
1.3 (Optional) Download and prepare the operating system image
By default, the installer will download the latest RHCOS image every time it is invoked. This may be problematic for users who create a large number of clusters or who have limited network bandwidth. The installer allows a local image to be used instead.
Download the latest RHCOS image (you will need access to the Red Hat internal build systems):
curl http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz | gunzip > rhcos-qemu.qcow2
1.4 Get a pull secret
Go to https://account.coreos.com/ and obtain a Tectonic pull secret.
1.5 Make sure you have permissions for qemu:///system
You may want to grant yourself permissions to use libvirt as a non-root user. You could allow all users in the wheel group by doing the following:
sudo tee /etc/polkit-1/rules.d/80-libvirt.rules <<EOF
polkit.addRule(function(action, subject) {
  if (action.id == "org.libvirt.unix.manage" && subject.local && subject.active && subject.isInGroup("wheel")) {
    return polkit.Result.YES;
  }
});
EOF
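Note that the rule above only applies to members of the wheel group. If your user isn't in that group yet (check with the groups command), one way to add it is shown below; you'll need to log out and back in for the change to take effect.
sudo usermod -a -G wheel $USER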
1.6 Configure libvirt to accept TCP connections
The Kubernetes cluster-api components drive deployment of worker machines. The libvirt cluster-api provider will run inside the local cluster, and will need to connect back to the libvirt instance on the host machine to deploy workers.
In order for this to work, you'll need to enable TCP connections for libvirt.
Configure libvirtd.conf
To do this, first modify your /etc/libvirt/libvirtd.conf and set the
following:
listen_tls = 0
listen_tcp = 1
auth_tcp = "none"
tcp_port = "16509"
Note that authentication is not currently supported, but should be soon.
Configure the service runner to pass --listen to libvirtd
In addition to the config, you'll have to pass an additional command-line
argument to libvirtd. On Fedora, modify /etc/sysconfig/libvirtd and set:
LIBVIRTD_ARGS="--listen"
On Debian based distros, modify /etc/default/libvirtd and set:
libvirtd_opts="--listen"
Next, restart libvirt: systemctl restart libvirtd
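As a quick sanity check (assuming the default listen address), you should now be able to connect over TCP from the host itself:
virsh -c qemu+tcp://127.0.0.1/system list --all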
Firewall
Finally, if you have a firewall, you may have to allow connections from the IP
range used by your cluster nodes. If you're using the default subnet of
192.168.124.0/24, something along these lines should work:
iptables -I INPUT -p tcp -s 192.168.124.0/24 -d 192.168.124.1 --dport 16509 \
-j ACCEPT -m comment --comment "Allow insecure libvirt clients"
If you're using firewalld, first find the name of the currently active zone, then add the appropriate source and port to it so that connections from the IP range used by your cluster nodes are allowed. An example is shown below.
$ sudo firewall-cmd --get-active-zones
FedoraWorkstation
interfaces: enp0s25 tun0
Using the name of the active zone, add the source range and port that allow connections from your cluster nodes. The default subnet is 192.168.124.0/24 unless otherwise specified.
sudo firewall-cmd --zone=FedoraWorkstation --add-source=192.168.124.0/24
sudo firewall-cmd --zone=FedoraWorkstation --add-port=16509/tcp
You can verify that the source and port were added by listing the zone:
sudo firewall-cmd --zone=FedoraWorkstation --list-ports
sudo firewall-cmd --zone=FedoraWorkstation --list-sources
NOTE: When the firewall rules are no longer needed, sudo firewall-cmd --reload will remove these changes, since they were not added permanently. For persistence, include --permanent in the add-source and add-port commands, as shown below.
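For example, to keep the rules above across reloads (same zone, default subnet):
sudo firewall-cmd --permanent --zone=FedoraWorkstation --add-source=192.168.124.0/24
sudo firewall-cmd --permanent --zone=FedoraWorkstation --add-port=16509/tcp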
1.7 Prepare the installer configuration file
cp examples/libvirt.yaml ./
Edit the configuration file:
- Set an email and password in the `admin` section.
- Set a `baseDomain` (to `tt.testing`).
- Set the `sshKey` in the `admin` section to the contents of an SSH public key (e.g. `ssh-rsa AAAA...`); a quick way to generate one is shown after this list.
- Set the `name` (e.g. `test1`).
- Look at the `podCIDR` and `serviceCIDR` fields in the `networking` section. Make sure they don't conflict with anything important.
- Set the `pullSecret` to your JSON pull secret.
- (Optional) Change the `image` to the file URL of the operating system image you downloaded (e.g. `file:///home/user/Downloads/rhcos.qcow`). This will allow the installer to re-use that image instead of having to download it every time.
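If you don't already have an SSH key to paste into the sshKey field, a minimal way to create one and print its public half (default file locations assumed) is:
ssh-keygen -t rsa    # accept the default location when prompted
cat ~/.ssh/id_rsa.pub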
1.8 Set up NetworkManager DNS overlay
This step is optional, but useful for being able to resolve cluster-internal hostnames from your host.
- Edit `/etc/NetworkManager/NetworkManager.conf` and set `dns=dnsmasq` in section `[main]`.
- Tell dnsmasq to use your cluster. The syntax is `server=/<baseDomain>/<firstIP>`. For this example:
  echo server=/tt.testing/192.168.124.1 | sudo tee /etc/NetworkManager/dnsmasq.d/tectonic.conf
- Restart NetworkManager: systemctl restart NetworkManager
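Once a cluster is running, names under the base domain should resolve from the host. With the example values used above, a quick check (if you have dig installed) could be:
dig +short test1-api.tt.testing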
1.9 Install the terraform provider
- Make sure you have the `virsh` binary installed: sudo dnf install libvirt-client libvirt-devel
- Install the libvirt terraform provider (a quick check of the result is shown after this list):
  GOBIN=~/.terraform.d/plugins go get -u github.com/dmacvicar/terraform-provider-libvirt
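If the go get above succeeded, the provider binary should now be in the GOBIN directory used above:
ls ~/.terraform.d/plugins/terraform-provider-libvirt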
1.10 Cache terraform plugins (optional, but makes subsequent runs a bit faster)
cat <<EOF > $HOME/.terraformrc
plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"
EOF
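Terraform expects the plugin cache directory to already exist, so create it if it isn't there yet:
mkdir -p $HOME/.terraform.d/plugin-cache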
2. Build the installer
Following the instructions in the root README:
bazel build tarball
3. Create a cluster
tar -zxf bazel-bin/tectonic-dev.tar.gz
alias tectonic="${PWD}/tectonic-dev/installer/tectonic"
Initialize (the environment variables are a convenience):
tectonic init --config=../libvirt.yaml
export CLUSTER_NAME=<the cluster name>
export BASE_DOMAIN=<the base domain>
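For the example values used throughout this document, that would be:
export CLUSTER_NAME=test1
export BASE_DOMAIN=tt.testing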
Install ($CLUSTER_NAME is test1):
tectonic install --dir=$CLUSTER_NAME
When you're done, destroy:
tectonic destroy --dir=$CLUSTER_NAME
Be sure to destroy, or else you will need to manually use virsh to clean up the leaked resources. The virsh-cleanup script may help with this, but note it will currently destroy all libvirt resources.
With the cluster removed, you no longer need to allow libvirt nodes to reach your libvirtd. Reload firewalld to remove your temporary changes:
sudo firewall-cmd --reload
4. Exploring your cluster
Some things you can do:
Watch the bootstrap process
The bootstrap node, e.g. test1-bootstrap.tt.testing, runs the tectonic bootstrap process. You can watch it:
ssh core@$CLUSTER_NAME-bootstrap.$BASE_DOMAIN
sudo journalctl -f -u bootkube -u tectonic
You'll have to wait for etcd to reach quorum before this makes any progress.
Inspect the cluster with kubectl
You'll need a kubectl binary on your path.
export KUBECONFIG="${PWD}/${CLUSTER_NAME}/generated/auth/kubeconfig"
kubectl get -n tectonic-system pods
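Any standard kubectl command works at this point, for example:
kubectl get nodes
kubectl get pods --all-namespaces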
Alternatively, if you didn't set up DNS on the host (or you want to do things from the node for other reasons), you can work from a master node directly:
host# virsh domifaddr master0 # to get the master IP
host# ssh core@<ip of master>
master0# export KUBECONFIG=/var/opt/tectonic/auth/kubeconfig
master0# kubectl get -n tectonic-system pods
Connect to the cluster console
This will take ~30 minutes to be available. Simply go to https://${CLUSTER_NAME}-api.${BASE_DOMAIN}:6443/console/ (e.g. https://test1-api.tt.testing:6443/console/) and log in using the credentials above.
FAQ
Libvirt vs. AWS
- There isn't a load balancer on libvirt. This means:
  - We need to manually remap the ports that the load balancer would otherwise handle.
Troubleshooting
If following the above steps hasn't quite worked, please review this section for known issues.
SELinux might prevent access to image files
Configuring the storage pool to store images in a path incompatible with the SELinux policies (e.g. your home directory) might lead to the following errors:
Error: Error applying plan:
1 error(s) occurred:
* libvirt_domain.etcd: 1 error(s) occurred:
* libvirt_domain.etcd: Error creating libvirt domain: virError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2018-07-30T22:52:54.865806Z qemu-kvm: -fw_cfg name=opt/com.coreos/config,file=/home/user/VirtualMachines/etcd.ign: can't load /home/user/VirtualMachines/etcd.ign')
As described here, you can work around this by disabling SELinux, or by storing the images in a location known to work, e.g. by using the default pool.
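To see where a storage pool keeps its volumes (assuming a pool named default exists), you can inspect its definition:
virsh pool-dumpxml default | grep path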
Random domain creation errors due to libvirt race condition
Depending on your libvirt version you might encounter a race condition leading to an error similar to:
* libvirt_domain.master.0: Error creating libvirt domain: virError(Code=43, Domain=19, Message='Network not found: no network with matching name 'tectonic'')
This is also being tracked on the libvirt-terraform-provider, but it is likely not fixable on the client side; upgrade libvirt to >= 4.5 or a patched version, depending on your environment.
MacOS support currently broken
- Support for libvirt on Mac OS is currently broken and being worked on.
Error with firewall initialization on Arch Linux
If you're on Arch Linux and get an error similar to
libvirt: “Failed to initialize a valid firewall backend”
or
error: Failed to start network default
error: internal error: Failed to initialize a valid firewall backend
please check out this thread on superuser.
GitHub Issue Tracker
You might find other reports of your problem in the Issues tab for this repository, where we ask you to provide any additional information. If your issue is not already reported, please open a new one.