We haven't needed these since we dropped the parsers in f7a4e68b
(pkg/types/config: Drop ParseConfig and other Parse* methods,
2018-10-02, #403).
Generated with:
$ sed -i 's/ yaml:.*/`/' $(git grep -l yaml pkg/tfvars)
The intention here is to help people debug. If you like this
I may also try to do something similar for masters.
V2: Applied code changes mostly written by @wking
The kube-addon operator was the last remaining component in that
namespace, and it was just controlling a metrics server. Metrics
aren't critical to cluster functions, and dropping kube-addon means we
don't need the old pull secret anymore (although we will shortly need
new pull secrets for pulling private release images [1]).
Also drop the admin and user roles [2], although I'm less clear on
their connection.
[1]: https://github.com/openshift/installer/pull/663
[2]: https://github.com/openshift/installer/pull/682#issuecomment-439145907
To keep the platform-specific code separate.
The only thing that wasn't perfectly clean about this separation is
that now the AWS and OpenStack packages have their own defaultVPCCIDR.
There's no reason that they need to agree on that CIDR, though, so I
haven't bothered setting up a common-default package for them to share
a single constant.
This allows the user to supply and use an on-disk asset (such as
install-config.yml) without the need to also supply the state file that was
created. This is helpful when re-using an on-disk asset for multiple
installations. In particular, hive would like to run openshift-install
with a supplied install-config.yml and no state file.
To effect this behavior, the asset store loads all of the on-disk assets that a
fetched asset depends upon prior to fetching the dependencies for the fetched
asset. From this, the asset store can determine whether the fetched asset is
dirty or not. If the fetched asset is not dirty and is on-disk or in the state
file, then the asset is used as is without generating any of the dependent
assets--as they would be ignored in resolving the fetched asset anyway.
Conflicts can occur when the asset store resolves the fetches of two assets
that share a dependency in different ways. For example, let us say that there
are two assets, A and B, that both depend upon asset C. Asset A is present on
disk, and asset B is not. When the asset store fetches asset A, asset C will
not be generated. However, when the asset store fetches asset B, asset C will
be generated in order to generate asset B. Asset A could potentially have data
that conflicts with the data that would have been taken from the generated
asset C.
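To make the resolution order concrete, here is a minimal sketch under
simplified assumptions (a stripped-down Asset interface and hypothetical
store fields; the real store is richer than this):

```go
package asset

// Asset is a simplified stand-in for the installer's real asset interface.
type Asset interface {
	Name() string
	Dependencies() []Asset
	Generate(parents []Asset) error
}

// store tracks what was found on disk and in the state file.
type store struct {
	onDisk    map[string]Asset // assets the user supplied on disk
	stateFile map[string]Asset // assets recorded by a previous run
}

// dirty reports whether any transitive dependency of a was supplied on
// disk, which invalidates a's entry in the state file.
func (s *store) dirty(a Asset) bool {
	for _, dep := range a.Dependencies() {
		if _, ok := s.onDisk[dep.Name()]; ok {
			return true
		}
		if s.dirty(dep) {
			return true
		}
	}
	return false
}

// fetch resolves a. A clean asset found on disk or in the state file is
// used as is, without generating any of its dependencies.
func (s *store) fetch(a Asset) (Asset, error) {
	if !s.dirty(a) {
		if found, ok := s.onDisk[a.Name()]; ok {
			return found, nil
		}
		if cached, ok := s.stateFile[a.Name()]; ok {
			return cached, nil
		}
	}
	parents := make([]Asset, 0, len(a.Dependencies()))
	for _, dep := range a.Dependencies() {
		resolved, err := s.fetch(dep)
		if err != nil {
			return nil, err
		}
		parents = append(parents, resolved)
	}
	return a, a.Generate(parents)
}
```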
The new load function creates new asset instances to store the asset state loaded
from on-disk and from the state file. The store tests were relying on the same
asset instance being used throughout the test. Unfortunately, the tests now
need to use a lot of global variables, making the tests more fragile.
The assetToPurge field was removed since the same information can be obtained by
iterating over the assets map. Also, the parameter passed to the purge function
was changed to a single Asset instead of an Asset slice since the function is
only ever called with a single Asset.
Fixes https://github.com/openshift/installer/issues/545
To keep the platform-specific code separate. These converters are a
bit tiny for their own packages, but they seemed too
application-specific to belong to pkg/types.
To avoid wiping out the caller's whole libvirt environment, regardless
of whether it was associated with our cluster or not. Using
cluster-name prefixes still makes me a bit jumpy, so I've added
warnings to both the environment-variable and asset-prompt docs,
advising libvirt users to pick something sufficiently unique.
Also:
* Use {cluster-name}-master-{count} naming. We used to use
master{count}, which diverged from other usage (e.g. AWS, which has
used master-{count} since way back in ca443c5e (openstack/nova:
replace cloud-init with ignition, 2017-02-27,
coreos/tectonic-installer#7)).
* Rename module.libvirt_base_volume -> module.volume. There's no
reason to diverge from the module source for that name.
And I've rerolled deletion to use a single call to each deleter,
failing fast if they error. That should address cases where we cannot
destroy a shut-off domain [1]:
$ virsh -c $OPENSHIFT_INSTALL_LIBVIRT_URI list --all
Id Name State
----------------------------------------------------
- master0 shut off
- test1-worker-0-zd7hd shut off
$ bin/openshift-install destroy cluster --dir test --log-level debug
DEBUG Deleting libvirt volumes
DEBUG Deleting libvirt domains
DEBUG Deleting libvirt network
DEBUG Exiting deleting libvirt network
DEBUG goroutine deleteNetwork complete
ERROR Error destroying domain test1-worker-0-zd7hd: virError(Code=55, Domain=10, Message='Requested operation is not valid: domain is not running')
DEBUG Exiting deleting libvirt domains
DEBUG Exiting deleting libvirt volumes
DEBUG goroutine deleteVolumes complete
DEBUG Deleting libvirt domains
ERROR Error destroying domain test1-worker-0-zd7hd: virError(Code=55, Domain=10, Message='Requested operation is not valid: domain is not running')
[...]
Now we'll fail fast in those cases, allowing the caller to clear the
stuck domains, after which they can restart deletion.
The previous goroutine approach was borrowed from the AWS destroyer.
But AWS has a large, complicated resource dependency graph which
includes cycles. Libvirt is much simpler, with volumes and a network
that are all independent, followed by domains which depend on the
network and some of the volumes. With this commit we now take a
single pass at destroying those resources starting at the leaf domains
and working our way rootwards.
I've retained some looping (although no longer in a separate
goroutine) for domain deletion. This guards against racing domain
creation, as discussed in the new godocs for deleteDomains.
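A rough sketch of the resulting shape (the deleter names here are
hypothetical stand-ins for the real libvirt-backed functions):

```go
package libvirt

// deleter is one destroy step against the libvirt connection.
type deleter func() error

// deleteCluster takes a single pass from the leaf domains rootwards,
// failing fast so the caller can clear a stuck resource and rerun.
func deleteCluster(deleteDomains, deleteVolumes, deleteNetwork deleter) error {
	// Domains are the leaves (they depend on the network and some of
	// the volumes); volumes and the network are independent of each other.
	for _, del := range []deleter{deleteDomains, deleteVolumes, deleteNetwork} {
		if err := del(); err != nil {
			return err
		}
	}
	return nil
}
```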
Also:
* Rename from libvirt_prefix_deprovision.go to libvirt.go. The name
is from 998ba306 (cmd,pkg/destroy: add non-terraform destroy,
2018-09-25, #324), but the implementation doesn't need to be
represented in the filename. This commit renames to libvirt.go to
match the package name, since this file is the guts of this package.
* Simplify the AlwaysTrueFilter implementation. No semantic changes,
but this saves us a few lines of code.
* Add trailing periods for godocs to comply with [2].
[1]: https://github.com/openshift/installer/issues/656#issue-379634884
[2]: https://github.com/golang/go/wiki/CodeReviewComments#comment-sentences
We'd been defaulting it to ClusterName in InstallConfig.Generate, and
I see no reason for the user to want to create a separate name for the
network alone. The variable dates back to 4a08942c (steps: bootstrap
/ etcd / topology support for libvirt, 2018-04-24,
coreos/tectonic-installer#3213), where it is not explicitly motivated.
The old *PlatformType are from cccbb37a (Generate installation assets
via a dependency graph, 2018-08-10, #120), but since 476be073
(pkg/asset: use vendored cluster-api instead of go templates,
2018-10-30, #573), we've had variables for the name strings in the
more central pkg/types. With this commit, we drop the more peripheral
forms. I've also pushed the types.PlatformName{Platform} variables
down into types.{platform}.Name at Abhinav's suggestion [1].
I've added a unit test to enforce sorting in PlatformNames, because
the order is required by sort.SearchStrings in queryUserForPlatform.
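A minimal sketch of that test, assuming PlatformNames is a []string in
pkg/types:

```go
package types

import (
	"sort"
	"testing"
)

// sort.SearchStrings assumes sorted input, so guard the slice with a test.
func TestPlatformNamesSorted(t *testing.T) {
	if !sort.StringsAreSorted(PlatformNames) {
		t.Errorf("PlatformNames is not sorted: %v", PlatformNames)
	}
}
```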
[1]: https://github.com/openshift/installer/pull/659#discussion_r232849156
OpenStack credentials could be in three different paths (/etc, the home
config dir, and the current dir). Instead of re-implementing the logic to
find and read the clouds.yaml file, we should use gophercloud, which is
the standard Go library for OpenStack.
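For example, something like this sketch (I'm assuming gophercloud/utils'
clientconfig package, which searches the current directory,
~/.config/openstack, and /etc/openstack for clouds.yaml; the cloud name
is a placeholder):

```go
package main

import (
	"fmt"
	"log"

	"github.com/gophercloud/utils/openstack/clientconfig"
)

func main() {
	// Look up the named cloud in whichever clouds.yaml clientconfig finds.
	cloud, err := clientconfig.GetCloudFromYAML(&clientconfig.ClientOpts{Cloud: "openstack"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(cloud.AuthInfo.AuthURL)
}
```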
Note that deployments on OpenStack are currently broken unless there's
a clouds.yaml under /etc/openstack.
Fixes #550
The asset store tests that call Fetch create residual state files (and could also use
any state files left over from previous tests). This change uses a temporary directory
for each test run so that the environment is clean before and after the tests.
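A minimal sketch of the pattern (the test and directory names are
hypothetical):

```go
package asset

import (
	"io/ioutil"
	"os"
	"testing"
)

func TestStoreFetch(t *testing.T) {
	// Give each run its own scratch directory so no state file survives
	// the test or leaks in from a previous run.
	dir, err := ioutil.TempDir("", "openshift-install-store-test")
	if err != nil {
		t.Fatal(err)
	}
	defer os.RemoveAll(dir)
	// ... point the store at dir and call Fetch ...
}
```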
This decouples our platforms a bit and makes it easier to distinguish
between platform-specific and platform-agnostic code. It also gives
us much more compact struct names, since now we don't need to
distinguish between many flavors of machine pool, etc. in a single
package.
I've also updated pkg/types/doc.go; pkg/types includes more than
user-specified configuration since 78c31183 (pkg: add ClusterMetadata
asset,type that can be used for destroy, 2018-09-25, #324).
I've also added OWNERS files for some OpenStack-specific directories
that were missing them before.
There's still more work to go in this direction (e.g. pushing default
logic into subdirs), but this seems like a reasonable chunk.
1. Move files from manifests/content to the templates directory
2. Create a new asset, templates, that the manifest-templates target can call directly
3. Every template file is a separate asset in its own right, and the 'templates' asset depends on all of the leaf template assets (see the sketch below)
4. Manifest/tectonic assets now use templates as parent assets that they depend upon
Other templates (e.g. ignition/machines) are not moved into assets in this commit.
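A sketch of that dependency shape, with a simplified interface and
hypothetical leaf names:

```go
package templates

// Asset is a simplified stand-in for the installer's asset interface.
type Asset interface {
	Name() string
	Dependencies() []Asset
}

// leafTemplate is one template file promoted to its own asset.
type leafTemplate struct{ name string }

func (l *leafTemplate) Name() string          { return l.name }
func (l *leafTemplate) Dependencies() []Asset { return nil }

// Templates aggregates every leaf template so the manifest-templates
// target can fetch a single asset.
type Templates struct{}

func (t *Templates) Name() string { return "Common Templates" }

func (t *Templates) Dependencies() []Asset {
	return []Asset{
		&leafTemplate{name: "kube-apiserver-operator"}, // hypothetical
		&leafTemplate{name: "kube-controller-manager-operator"},
	}
}
```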
data/data/manifests: move all yaml content to its own files
So that a yaml lint check can catch the inappropriate ones.
No functional change at runtime.
It's easier for humans and linters to find this content if it's not
hidden in Go variables.
Since we're effectively pulling these files from Git now (either at
build time or at run-time depending on release vs. dev mode in
hack/build.sh), I'm being a bit more relaxed about file modes than the
previous implementation. Files are now either 0555 (if they are in a
'bin' directory) or 0600 (if they aren't). This is a change for files
like manifests.Manifests, which had previously been 0644.
I've flattened the manifest overrides into a single directory, because
the filenames are sufficient for sorting them by operator. And all of
the override manifests now have their own comment explaining their
target and eventual location.
Keep the environment variable (with a warning about using it), but
drop the interactive prompt. The default is solid, and users
manipulating it are more likely to break something (e.g. by continuing
to use the old v1 pipeline), while the installer can update its
default to track the RHCOS folks (e.g. 11178211, rhcos: implement
image discovery for new pipeline, 2018-10-26, #554).
Brian says [1]:
The third time I had to abort was when prompted for base domain
followed by cluster name (I included my cluster name in my base
domain because I'm using a well-structured name server delegation
structure).
And I've seen a number of other cases where folks suggest including
the cluster name again in the base domain. Sometimes you might need
to do that (e.g. if you cannot create subdomains without the
additional namespacing). But in most cases, including the cluster
name in the base domain is redundant.
[1]: https://github.com/openshift/installer/issues/627#issue-378069672
The new pipeline handles Content-Encoding gzip:
$ curl -LI --compressed https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/47.73/redhat-coreos-maipo-47.73-qemu.qcow2
HTTP/1.1 302 Moved Temporarily
Server: nginx/1.12.1
Date: Tue, 06 Nov 2018 20:55:06 GMT
Content-Type: text/html
Content-Length: 161
Location: https://d26v6vn1y7q7fv.cloudfront.net/releases/maipo/47.73/redhat-coreos-maipo-47.73-qemu.qcow2
Set-Cookie: ...; path=/; HttpOnly; Secure
HTTP/1.1 200 OK
Content-Type: binary/octet-stream
Content-Length: 726878219
Connection: keep-alive
Date: Tue, 06 Nov 2018 20:23:37 GMT
Last-Modified: Tue, 06 Nov 2018 17:56:36 GMT
ETag: "021080ef3b515d2443be3749ebbb0b08-87"
Content-Encoding: gzip
Accept-Ranges: bytes
Server: AmazonS3
Age: 1890
X-Cache: Hit from cloudfront
Via: 1.1 d85d7507ed6501757cfe600c02a26c7d.cloudfront.net (CloudFront)
X-Amz-Cf-Id: ...
so we can rely on Go's default compression handling. See
DisableCompression, which defaults to false, in [1]. To confirm the
default handling, try to retrieve from a local server:
$ export OPENSHIFT_INSTALL_LIBVIRT_IMAGE=http://localhost:8080/example.qcow
$ openshift-install --dir=wking create cluster
INFO Fetching OS image...
FATAL Error executing openshift-install: failed to fetch Terraform Variables: failed to generate asset "Terraform Variables": failed to get Tfvars: failed to use cached libvirt image: Get http://localhost:8080/example.qcow: EOF
In another terminal, I was using Ncat as a dummy server to capture
request headers:
$ nc -l -p 8080 </dev/null
GET /example.qcow HTTP/1.1
Host: localhost:8080
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip
This commit drops the now-unnecessary suffix handling from 7abe94d4
(config/libvirt/cache: Decompress when URI has ".gz" suffix,
2018-09-21, #280).
[1]: https://golang.org/pkg/net/http/#Transport
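For reference, a small sketch of the client-side behavior we rely on
(the URL is just the placeholder from the example above):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// The default Transport leaves DisableCompression false, so Go sends
	// Accept-Encoding: gzip and transparently decompresses the body.
	resp, err := http.Get("http://localhost:8080/example.qcow")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	// Uncompressed reports whether the http package decompressed for us.
	fmt.Println("transparently decompressed:", resp.Uncompressed)
}
```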
Without this, round-robin clients will fail when they hit the
bootstrap DNS entry (after the bootstrap node stops serving its
control plane).
The implementation is a bit awkward; I'd have preferred the AWS
approach, with:
resource "aws_lb_target_group_attachment" "bootstrap" {
count = "${var.target_group_arns_length}"
target_group_arn = "${var.target_group_arns[count.index]}"
target_id = "${aws_instance.bootstrap.private_ip}"
}
in the bootstrap module. But the libvirt host entries are only
available as a subsection of a libvirt_network resource (because the
whole network is defined in a single XML object, including the DNS
entries [1]). So instead I've added an additional variable which we
can tweak to disable the bootstrap entry. The default value for the
new variable includes the bootstrap entry for the initial cluster
'apply' call; on destroy I override it via an *.auto.tfvars file (which
Terraform loads automatically [2]) to remove the bootstrap entry.
[1]: https://libvirt.org/formatnetwork.html#elementsAddress
[2]: https://www.terraform.io/docs/configuration/variables.html
The manifest to create these resources belongs in the installer. The
renderer in cluster-kube-apiserver-operator will also need to be changed
to stop creating the same manifest file.
Also, I've updated the dependency graph at docs/design/resource_dep.svg.
The Save and Purge functions don't really seem to belong in the Store
interface. Fetch fetches the asset, checkpoints the state file to disk, and
consumes all the assets that were loaded from disk to create the asset.
This also fixes the error in save(), which would only save the assets in
memory and dropped all the other assets that were present in the state file
but were never fetched into the store's asset map. For example:
```console
# Before this fix:
./openshift-install ign configs    # state has everything
./openshift-install install-config # state now only has assets up to install-config; all the ign-config assets are lost from the state file

# After this fix:
./openshift-install ign configs    # state has everything
./openshift-install install-config # state still includes all the state from the 'ign configs' run
```
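A sketch of the save fix (the field names are hypothetical): overlay the
in-memory assets onto the state loaded from disk instead of replacing it
wholesale.

```go
package asset

import (
	"encoding/json"
	"io/ioutil"
)

// storeImpl is a hypothetical stand-in for the real asset store.
type storeImpl struct {
	stateFilePath   string
	assets          map[string]json.RawMessage // fetched into memory this run
	stateFileAssets map[string]json.RawMessage // loaded from the state file
}

// save overlays the in-memory assets onto the previously loaded state,
// so assets that were in the state file but never fetched this run are
// no longer dropped.
func (s *storeImpl) save() error {
	merged := make(map[string]json.RawMessage, len(s.stateFileAssets)+len(s.assets))
	for name, a := range s.stateFileAssets {
		merged[name] = a
	}
	for name, a := range s.assets {
		merged[name] = a
	}
	data, err := json.MarshalIndent(merged, "", "  ")
	if err != nil {
		return err
	}
	return ioutil.WriteFile(s.stateFilePath, data, 0644)
}
```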
Destroy deletes the asset from all possible sources: the state file and disk.
Back in 54242f8, RemainAfterExit was added to prevent bootkube.service
and tectonic.service from running multiple times when progress.service
restarted. Due to a systemd bug [1], this causes the restart to be
ignored on those services.
This new approach uses conditions instead. This allows systemd to
activate the services but not actually start them if they have already
completed.
[1]: https://github.com/systemd/systemd/issues/3396
An empty machine pool is valid for libvirt, because we don't use it.
This fixes the error
```console
FATAL Error executing openshift-install: non-Libvirt machine-pool: ""
```
introduced in https://github.com/openshift/installer/pull/573