We'd been defaulting it to ClusterName in InstallConfig.Generate, and
I see no reason for the user to want to create a separate name for the
network alone. The variable dates back to 4a08942c (steps: bootstrap
/ etcd / topology support for libvirt, 2018-04-24,
coreos/tectonic-installer#3213), where it is not explicitly motivated.
The old *PlatformType are from cccbb37a (Generate installation assets
via a dependency graph, 2018-08-10, #120), but since 476be073
(pkg/asset: use vendored cluster-api instead of go templates,
2018-10-30, #573), we've had variables for the name strings in the
more central pkg/types. With this commit, we drop the more peripheral
forms. I've also pushed the types.PlatformName{Platform} variables
down into types.{platform}.Name at Abhinav's suggestion [1].
I've added a unit test to enforce sorting in PlatformNames, because
the order is required by sort.SearchStrings in queryUserForPlatform.
[1]: https://github.com/openshift/installer/pull/659#discussion_r232849156
This decouples our platforms a bit and makes it easier to distinguish
between platform-specific and platform-agnostic code. It also gives
us much more compact struct names, since now we don't need to
distinguish between many flavors of machine pool, etc. in a single
package.
I've also updated pkg/types/doc.go; pkg/types includes more than
user-specified configuration since 78c31183 (pkg: add ClusterMetadata
asset,type that can be used for destroy, 2018-09-25, #324).
I've also added OWNERS files for some OpenStack-specific directories
that were missing them before.
There's still more work to go in this direction (e.g. pushing default
logic into subdirs), but this seems like a reasonable chunk.
Currently the Machine and MachineSet objects are go templates. Moving them
to vendored code allows other consumers like openshift/hive to use these public
helpers effectively.
The OpenStack platform still uses Go templates because of vendoring problems.
This code was recently reworked in:
https://github.com/openshift/installer/pull/493
Adding OpenStack here means the destroy bootstrap option will
work for OpenStack, which is a first step towards full destroy
support for the OpenStack platform.
While analyzing the generated cluster-config.yaml for an OpenStack
deployment, I noticed vpcID under the OpenStack platform config. This
is not used anywhere and should just be removed.
This allows combining values for machine configuration from various
sources, such as:
1. Defaults decided by the installer.
2. Platform-level defaults in InstallConfig.
3. Per-pool options.
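The layering might be sketched like this (a minimal illustration with a single hypothetical field, not the real MachinePool type):

```go
package main

import "fmt"

// machinePool holds one illustrative field; empty means unset.
type machinePool struct {
	InstanceType string
}

// merged layers installer defaults, then platform-level defaults from
// the InstallConfig, then per-pool options, with later layers
// overriding earlier ones when set.
func merged(layers ...machinePool) machinePool {
	var out machinePool
	for _, l := range layers {
		if l.InstanceType != "" {
			out.InstanceType = l.InstanceType
		}
	}
	return out
}

func main() {
	installerDefault := machinePool{InstanceType: "t2.medium"}
	platformDefault := machinePool{InstanceType: "m4.large"}
	perPool := machinePool{} // unset, falls back to the platform default
	fmt.Println(merged(installerDefault, platformDefault, perPool).InstanceType)
}
```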
This removes a few Cluster properties which are no longer used by the
new installer. It also drops YAML loading for Cluster now that it's a
one-way map from InstallConfig to Terraform JSON. I've shifted
validation into its own package, since we should be validating these
at the InstallConfig level.
I've also shifted the Terraform variable generation out of pkg/types,
because it's not about public types. I've reduced the variable
generation to a single function.
I've dropped the old Cluster (and subtype) specific validation. We'll
want to add InstallConfig validation once we support loading it from
disk, but for now we can rely on the UserProvided validation (and
this commit was already large enough ;).
With openshift-install, the config type is a one-way map from
InstallConfig to Terraform, so we can drop these methods. The last
consumers were removed in b6c0d8c1 (installer: remove package,
2018-09-26, #342).
Libvirt can't use the HTTP cache directly, because the cached content
contains HTTP headers and other metadata. Rather than writing our own
HTTP cache [1] to perform this unpacking, I've just added a separate
cache layer for the unpacked images.
I'm currently trusting upstream ETags as cache keys, but if we want to
be paranoid at the expense of hashing a gigabyte or two of data, we
could use SHA1 sums of the payload or similar.
I'm using Flock [2] to protect against parallel calls racing unpacks.
My intention is to have:
1. Alice gets the lock, starts unpacking into the image cache.
2. Bob blocks on the lock.
3. Alice finishes unpacking and renames to the final filename.
4. Charlie hits the same ETag and grabs the unpacked image without
even checking the lock.
5. Alice releases the lock.
6. Bob acquires the lock, sees that the final image is already in
place, and uses it without unpacking again.
The quote-splitting extracts the opaque-tag component from the
entity-tag value [3]. Slashes and other filesystem-sensitive
characters are also legal in opaque-tag values, so I'm washing the
opaque-tag through MD5 to guard against that. The MD5 wash also
ensures that the .tmp and .lock suffixes are safe, since the '.'
separator is not a valid hex character.
We'll still leak cache into imageCacheDir (like we used to leak into
/tmp), but:
* Now it's off of /tmp, which can cause memory issues for folks who
use a tmpfs for /tmp and don't have gigs of extra memory ;).
* Now we only leak one unpacked image per ETag, while before we leaked
one unpacked image per non-file:// UseCachedImage call.
[1]: https://godoc.org/github.com/gregjones/httpcache#Cache
[2]: https://godoc.org/golang.org/x/sys/unix#Flock
[3]: https://tools.ietf.org/html/rfc7232#section-2.3
When the user passes the installer a domain name ending in dot `.`,
the S3 bucket name generated by the installer is invalid.
This commit automatically drops the trailing dot to create valid
S3 bucket names.
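A minimal sketch of the fix (the bucket-name format here is illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// bucketName composes an S3 bucket name from a cluster name and base
// domain. A fully-qualified domain may carry a root dot, which is
// invalid in bucket names, so trim it first.
func bucketName(clusterName, baseDomain string) string {
	return clusterName + "." + strings.TrimSuffix(baseDomain, ".")
}

func main() {
	fmt.Println(bucketName("mycluster", "example.com."))
}
```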
This slightly improves some of the logging messages and hides the
Terraform output by default (it can still be enabled by setting the log
level to debug). Installing a cluster with libvirt looks like the
following:
INFO Fetching OS image...
INFO Using Terraform to create cluster...
Docs for Go's build constraints are in [1]. This commit allows folks
with local libvirt C libraries to compile our libvirt deletion logic
(and get a dynamically-linked executable), while release binaries and
folks without libvirt C libraries can continue to get
statically-linked executables that lack libvirt deletion.
I've also simplified the public names (e.g. NewDestroyer -> New),
dropping information which is already encoded in the import path.
Pulling the init() registration out into separate files is at
Abhinav's request [2].
[1]: https://golang.org/pkg/go/build/#hdr-Build_Constraints
[2]: https://github.com/openshift/installer/pull/387#discussion_r221763315
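The init() registration pattern can be sketched like this (names are illustrative; in the real tree the registering file sits behind a libvirt build constraint, so builds without the C libraries simply omit it):

```go
package main

import "fmt"

// registry collects destroyer constructors; each platform file adds
// itself via init(), and files guarded by build constraints drop out
// of builds that lack the platform's C libraries.
var registry = map[string]func() string{}

func register(name string, f func() string) { registry[name] = f }

func init() {
	// In the real tree this init() lives in a per-platform file,
	// compiled only under the matching build constraint.
	register("libvirt", func() string { return "libvirt destroyer" })
}

func main() {
	fmt.Println(registry["libvirt"]())
}
```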
This commit includes support for OpenStack as a target deployment
platform. There are still some things to implement, such as DNS and
destroy support, that will come in future PRs.
Contributors (in alphabetical order) include:
Co-authored-by: Flavio Percoco <flavio@redhat.com>
Co-authored-by: Jeremiah Stuever <jstuever@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Steven Hardy <shardy@redhat.com>
Co-authored-by: Tomas Sedovic <tsedovic@redhat.com>
Co-authored-by: W. Trevor King <wking@tremily.us>
To match a lot of other error handling in this package and give more
useful error messages (if we somehow get past the input validation in
pkg/asset/installconfig).
We haven't set it in the cluster struct yet, that's a few lines below.
Fixes [1]:
# openshift-install cluster
? Email Address dgoodwin@redhat.com
? Password [? for help] ********
? SSH Public Key <none>
? Base Domain REDACTED
? Cluster Name dgoodwin1
? Pull Secret REDACTED
? Platform aws
? Region us-east-1 (N. Virginia)
FATA[0046] failed to generate asset: failed to determine default AMI: MissingRegion: could not find region configuration
and a copy/paste bug from c6f02bae (pkg/cluster/config: add missing
ami fetch, 2018-09-25, #324).
[1]: https://github.com/openshift/installer/issues/335#issue-363995958
Add default VPCCIDR, network type, IfName, IPRange, and imageURL.
Also set different default node numbers for different platforms;
on libvirt, there's a potential race in the network setup.
Also ask users for the libvirt image URL.
Previously, the installer binary generated the Ignition configs for
the master and worker nodes and passed the filenames to Terraform.
In the new installer binary, we instead pass the content of the
Ignition files to Terraform.
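The change can be sketched like this (the tfvars field names are illustrative):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tfvars embeds the Ignition content itself in the generated Terraform
// variables instead of a path on disk.
func tfvars(masterIgn, workerIgn string) ([]byte, error) {
	return json.Marshal(map[string]string{
		"ignition_master": masterIgn,
		"ignition_worker": workerIgn,
	})
}

func main() {
	out, err := tfvars(`{"ignition":{"version":"2.2.0"}}`, `{"ignition":{"version":"2.2.0"}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```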
Passing the compressed images to libvirt was giving me "no bootable
device" errors with my:
$ rpm -q libvirt qemu-kvm-rhev
libvirt-3.9.0-14.el7_5.7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64
Ideally, the HTTP headers would tell us that the image was compressed
(via Content-Type [1] or Content-Encoding [2]), but
aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com isn't configured to do that
at the moment:
$ curl -I http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz
HTTP/1.1 200 OK
Server: nginx/1.8.0
Date: Sat, 22 Sep 2018 04:59:53 GMT
Content-Type: application/octet-stream
Content-Length: 684751099
Last-Modified: Fri, 21 Sep 2018 13:56:07 GMT
Connection: keep-alive
ETag: "5ba4f877-28d078fb"
Accept-Ranges: bytes
$ curl -I http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2
HTTP/1.1 404 Not Found
Server: nginx/1.8.0
Date: Sat, 22 Sep 2018 04:59:56 GMT
Content-Type: text/html
Content-Length: 168
Connection: keep-alive
I've opened [3] about getting Content-Encoding support via gzip_static
[4], but in the meantime, just assume that anything with a ".gz"
suffix is gzipped.
Unzipping is also somewhat expensive (although not as expensive as
network requests). It would be nice for callers to be able to
configure whether we cache the compressed images (less disk
consumption) or the uncompressed images (less CPU load when launching
clusters). For now, we're just caching the network response.
[1]: https://tools.ietf.org/html/rfc7231#section-3.1.1.5
[2]: https://tools.ietf.org/html/rfc7231#section-3.1.2.2
[3]: https://projects.engineering.redhat.com/browse/COREOS-593
[4]: http://nginx.org/en/docs/http/ngx_http_gzip_static_module.html
Assume the caller has HOME set and XDG_CACHE_HOME not set, because
that's easy. The ~/.cache default is from [1]. Once we bump to Go
1.11, we can revert this commit and get the more-robust stdlib
implementation, which is why I haven't bothered with a robust
implementation here.
[1]: https://standards.freedesktop.org/basedir-spec/basedir-spec-0.7.html
Checking our RHCOS source against ETag [1] / If-None-Match [2] and
Last-Modified [3] / If-Modified-Since [4], it seems to support
If-None-Match well, but only supports If-Modified-Since for exact
matches:
$ URL=http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz
$ curl -I "${URL}"
HTTP/1.1 200 OK
Server: nginx/1.8.0
Date: Wed, 19 Sep 2018 04:32:19 GMT
Content-Type: application/octet-stream
Content-Length: 684934062
Last-Modified: Tue, 18 Sep 2018 20:05:24 GMT
Connection: keep-alive
ETag: "5ba15a84-28d343ae"
Accept-Ranges: bytes
$ curl -sIH 'If-None-Match: "5ba15a84-28d343ae"' "${URL}" | head -n1
HTTP/1.1 304 Not Modified
$ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2018 20:05:24 GMT' "${URL}" | head -n1
HTTP/1.1 304 Not Modified
$ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2015 20:05:24 GMT' "${URL}" | head -n1
HTTP/1.1 200 OK
$ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2018 20:05:25 GMT' "${URL}" | grep 'HTTP\|Last-Modified'
HTTP/1.1 200 OK
Last-Modified: Tue, 18 Sep 2018 20:05:24 GMT
That last entry should have 304ed, although the spec has [4]:
When used for cache updates, a cache will typically use the value of
the cached message's Last-Modified field to generate the field value
of If-Modified-Since. This behavior is most interoperable for cases
where clocks are poorly synchronized or when the server has chosen
to only honor exact timestamp matches (due to a problem with
Last-Modified dates that appear to go "back in time" when the origin
server's clock is corrected or a representation is restored from an
archived backup). However, caches occasionally generate the field
value based on other data, such as the Date header field of the
cached message or the local clock time that the message was
received, particularly when the cached message does not contain a
Last-Modified field.
So the server is violating the SHOULD by not 304ing dates greater than
Last-Modified, but it's not violating a MUST-level requirement. The
server requirements around If-None-Match are MUST-level [2], so using
it should be more portable. The RFC also seems to prefer clients use
If(-None)-Match [4,5].
I'm using gregjones/httpcache for the caching, since that
implementation seems reasonably popular and the repo's been around for
a few years. That library uses the belt-and-suspenders approach of
setting both If-None-Match (to the cached ETag) and If-Modified-Since
(to the cached Last-Modified) [6], so we should be fine.
UserCacheDir requires Go 1.11 [7,8,9].
[1]: https://tools.ietf.org/html/rfc7232#section-2.3
[2]: https://tools.ietf.org/html/rfc7232#section-3.2
[3]: https://tools.ietf.org/html/rfc7232#section-2.2
[4]: https://tools.ietf.org/html/rfc7232#section-3.3
[5]: https://tools.ietf.org/html/rfc7232#section-2.4
[6]: 9cad4c3443/httpcache.go (L169-L181)
[7]: https://github.com/golang/go/issues/22536
[8]: 816154b065
[9]: 50bd1c4d4e
A recent commit, d01ac5d70b, introduced
a regression for libvirt.
$ tectonic init --config=../libvirt.yaml
FATA[0020] failed to get configuration from file "../libvirt.yaml": ../libvirt.yaml is not a valid config file: failed to determine default AMI: ...
Fix this by only bothering to determine the default AMI when the platform
is aws.
We're pushing public AMIs since openshift/os@6dd20dc6 (jenkins: Make
RHCOS AMI Public, 2018-09-18, openshift/os#304). There's still no
public analog to [1], so I'm just scraping this from metadata on
images available via the AWS API. The analogous AWS command line
invocation is:
$ AWS_DEFAULT_REGION=us-east-1 aws ec2 describe-images --filter 'Name=name,Values=rhcos*' --query 'sort_by(Images, &CreationDate)[-1].ImageId' --output text
with a few extra filters thrown in. The full set of metadata on the
most recent current image is:
$ AWS_DEFAULT_REGION=us-east-1 aws ec2 describe-images --filter 'Name=name,Values=rhcos*' --query 'sort_by(Images, &CreationDate)[-1]' --output json
{
"VirtualizationType": "hvm",
"Description": "Red Hat CoreOS 4.0.5846 (c9a6bb48b837b5bcfeb9bd427be9a18b5bd75b6c57cb289245f211ff98b2a740)",
"Hypervisor": "xen",
"EnaSupport": true,
"SriovNetSupport": "simple",
"ImageId": "ami-08a5792a684330602",
"State": "available",
"BlockDeviceMappings": [
{
"DeviceName": "/dev/xvda",
"Ebs": {
"Encrypted": false,
"DeleteOnTermination": true,
"VolumeType": "gp2",
"VolumeSize": 8,
"SnapshotId": "snap-00a45db4ad6173805"
}
},
{
"DeviceName": "/dev/xvdb",
"VirtualName": "ephemeral0"
}
],
"Architecture": "x86_64",
"ImageLocation": "531415883065/rhcos_dev_c9a6bb4-hvm",
"RootDeviceType": "ebs",
"OwnerId": "531415883065",
"RootDeviceName": "/dev/xvda",
"CreationDate": "2018-09-19T23:40:54.000Z",
"Public": true,
"ImageType": "machine",
"Name": "rhcos_dev_c9a6bb4-hvm"
}
That doesn't include the "tested" information, so there's still no
support for changing channels. We'll need to wait for a public analog
of [1], which is blocked on getting stable, production hosting for the
release metadata.
I'd prefer to use JMESPath and server-side filtering in Go as well, to
only return the latest matching AMI. But the AWS Go library doesn't
seem to support server-side filtering at the moment [2]. Docs for the
AWS Go APIs I'm using are in [3,4,5,6,7,8].
The filters I'm adding here are similar to those we used for Container
Linux before they were dropped in 702ee7bb (*: Remove stale Container
Linux references, 2018-09-11, #233). I added a few more just to be
conservative (e.g. we don't want to match a pending or failed image,
so I require state to be available).
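The client-side selection, mirroring `sort_by(Images, &CreationDate)[-1]` plus the state filter, might be sketched as (not the real installer code):

```go
package main

import (
	"fmt"
	"sort"
)

// image mimics the fields we use from the DescribeImages results.
type image struct {
	ImageID      string
	CreationDate string // RFC 3339 strings sort chronologically
	State        string
}

// latestAvailable filters to available images and picks the one with
// the newest CreationDate.
func latestAvailable(images []image) string {
	var candidates []image
	for _, img := range images {
		if img.State == "available" {
			candidates = append(candidates, img)
		}
	}
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].CreationDate < candidates[j].CreationDate
	})
	return candidates[len(candidates)-1].ImageID
}

func main() {
	fmt.Println(latestAvailable([]image{
		{"ami-old", "2018-08-01T00:00:00.000Z", "available"},
		{"ami-new", "2018-09-19T23:40:54.000Z", "available"},
		{"ami-bad", "2018-09-20T00:00:00.000Z", "pending"},
	}))
}
```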
I haven't pushed the Context variables all the way up the stack yet,
so there are some context.TODO() entries. The 30-second timeout keeps
us from hanging excessively when the caller lacks AWS credentials; the
error messages look like:
failed to init cluster: failed to parse test config: failed to determine default AMI: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
You can test this error condition by removing the explicit AMI values
I've added to our fixtures in this commit and running:
$ AWS_PROFILE=does-not-exist go test ./installer/pkg/...
[1]: http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/aws-us-east-1-tested.json
[2]: https://github.com/aws/aws-sdk-go/issues/2156
[3]: https://docs.aws.amazon.com/sdk-for-go/api/aws/session/#NewSessionWithOptions
[4]: https://docs.aws.amazon.com/sdk-for-go/api/aws/session/#Options
[5]: https://docs.aws.amazon.com/sdk-for-go/api/aws/session/#Must
[6]: https://docs.aws.amazon.com/sdk-for-go/api/service/ec2/#New
[7]: https://docs.aws.amazon.com/sdk-for-go/api/service/ec2/#EC2.DescribeImagesWithContext
[8]: https://docs.aws.amazon.com/sdk-for-go/api/service/ec2/#DescribeImagesInput
Providing a fallback here allows you to avoid repeating the same
config across several machine pools. For example, you may want to use
the same AWS instance type for both masters and workers.
Also improve the LibvirtPlatform.URI comment to make it easier to
distinguish from installer/pkg/config/libvirt's Libvirt.Image. That
image location currently has no analog in the InstallConfig structure.
It was removed in 9ee45fb6 (*: provide a default OS image for libvirt,
2018-09-17, #271), but we may restore it soon.
Instead of requiring the user to download RHCOS and unpack it, this
allows them to be lazy and just let the installer download the image
from a default location. Note that there is no caching, so it is
recommended that users still manually download and unpack RHCOS.
I'd missed this while rerolling 89f05dac (pkg/types/installconfig: Add
AWSPlatform.UserTags, 2018-09-12, #239).
I've also updated the UserTags comment to use "for the cluster"
instead of "by the cluster". Resources can be created by the
installer (not part of the cluster) or by operators living inside the
cluster, but regardless of the creator, these cluster resources should
get UserTags.