Insert the instructions for installing a cluster that extends nodes into Local Zones (new VPC)
into the existing documentation for installing in an existing VPC.
A Day-2 section is also added as a reference for the existing Local
Zone automation. The Day-2 content is not part of the official documentation
delivered in 4.14, but it is tracked as an open question in
enhancement proposal [1232](https://github.com/openshift/enhancements/pull/1232).
The steps described in the KCS were validated with the QE and SDN teams.
The `MachinesSubnet` field has been reshaped into `controlPlanePort`;
this commit updates the docs to ensure `controlPlanePort` is used.
This commit also adds dual-stack documentation.
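For reference, a rough sketch of the kind of dual-stack install-config fragment the updated docs describe; the network and subnet names are placeholders, and the exact schema should be checked against the 4.14 installer documentation:

```yaml
platform:
  openstack:
    controlPlanePort:
      network:
        name: dualstack-network        # placeholder
      fixedIPs:
      - subnet:
          name: dualstack-subnet-v4    # placeholder IPv4 subnet
      - subnet:
          name: dualstack-subnet-v6    # placeholder IPv6 subnet
```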
When a machine is created with a compute availability zone (defined via `mpool.zones`) and a storage root volume (defined via `mpool.rootVolume`), and that `rootVolume` has no `zones` specified, CAPO uses the compute AZ as the volume AZ.
This can be problematic if that AZ doesn't exist in Cinder.
Source:
9d183bd479/pkg/cloud/services/compute/instance.go (L439-L442)
```golang
func (s *Service) getOrCreateRootVolume(eventObject runtime.Object, instanceSpec *InstanceSpec, imageID string) (*volumes.Volume, error) {
	// ...
	// The volume AZ defaults to the machine's failure domain (the compute AZ)
	// and is only overridden when the root volume specifies its own AZ.
	availabilityZone := instanceSpec.FailureDomain
	if rootVolume.AvailabilityZone != "" {
		availabilityZone = rootVolume.AvailabilityZone
	}
	// ...
```
If a compute AZ is provided alongside a root volume, we now require
the root volume to have an AZ, so the user is forced to choose which
AZ the root volume is deployed in.
We are also enforcing this via CEL validation in the OpenShift API.
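For illustration, a minimal install-config sketch under the new rule, with the compute AZs and the root volume AZs both set explicitly; the zone names, flavor, and volume type are placeholders, and the exact field names should be checked against the installer docs:

```yaml
controlPlane:
  name: master
  replicas: 3
  platform:
    openstack:
      type: m1.xlarge          # placeholder flavor
      zones:                   # compute (Nova) AZs
      - az0
      - az1
      rootVolume:
        size: 100
        type: fast             # placeholder volume type
        zones:                 # storage (Cinder) AZs, now required alongside compute zones
        - cinder-az0
        - cinder-az1
```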
Alternatives considered:
* Do nothing - at the risk of hitting this situation: a failure domain with a compute AZ and a root volume with no AZ, where CAPO uses the compute AZ to create the volume but that AZ doesn't exist in Cinder, leading to Machine creation errors.
* Only validate in the CPMS - which would require manual CPMS edits
  from the user.
* Change the logic in CAPO regarding how the root volume AZ is picked - unlikely to happen.
When attaching a Manila network by editing a MachineSet, you probably
want to disable allowed address pairs. Document this.
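A rough, unverified sketch of the kind of MachineSet provider-spec edit this refers to; the `noAllowedAddressPairs` field name, its placement, and the subnet name are assumptions to be checked against the machine-api OpenStack provider spec:

```yaml
spec:
  template:
    spec:
      providerSpec:
        value:
          networks:
          # ... existing cluster network entries ...
          - filter: {}
            noAllowedAddressPairs: true   # assumption: disables allowed address pairs on the Manila port
            subnets:
            - filter:
                name: manila-storage-subnet   # placeholder
```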
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2.5 years ago, we made it possible to configure `serverGroupPolicy` in
install-config so a user could choose which Nova scheduling policy
to adopt for the machines.
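For context, the policy is selected per machine pool in install-config; a minimal sketch (the policy value shown is just one of the accepted options):

```yaml
controlPlane:
  name: master
  replicas: 3
  platform:
    openstack:
      serverGroupPolicy: soft-anti-affinity   # Nova scheduling policy for the control plane server group
```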
However, if the masters were configured with AZs, Terraform would
create one ServerGroup in OpenStack (the one from master-0) but
configure the Machine providerSpec with different ServerGroups, one
per AZ. This was unwanted; we now want to use a single ServerGroup
for the masters.
With compute AZ support, users already have a way to
ensure that masters aren't in the same failure domain as one another.
Also, even if there are fewer than 3 AZs (e.g. 2), the default
`soft-anti-affinity` server group policy makes Nova schedule the
machines on different hosts within the same AZ on a best-effort
basis.
Therefore, there is no need to configure the master Machines with a
`serverGroup` per availability zone.
Also, note that in OCP 4.14, CPMS will be enabled by default.
If a user has set multiple AZs for the controlPlane and upgrades from
4.13 to 4.14, CPMS will adopt the control plane and create a CPMS in
Inactive mode with a single `serverGroup`. The `serverGroup` will
likely be the one from master-0, and it will be shared across all
control plane machines.
It'll be up to the user to set the CPMS to Active,
after which the masters will be redeployed into the single group shared
by all masters. They will never have a ServerGroup named "clusterID + role",
because in previous releases we added the AZ name to it.
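As a sketch, assuming the usual resource naming (`cluster` in `openshift-machine-api`), activating the CPMS boils down to flipping a single field:

```yaml
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  name: cluster
  namespace: openshift-machine-api
spec:
  state: Active   # generated as Inactive on upgrade; Active lets CPMS reconcile the masters
```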
* Updated the link to rhcos.json to point to its new location in the
  Installer repo (data/data/coreos/rhcos.json).
* Updated the JSON path to include the architecture and the images
  content.
* Changed the instruction to use the existing boot image in the
  rhcos-cloud project instead of copying it as a new image.
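For orientation, a YAML-style sketch of the relevant portion of data/data/coreos/rhcos.json as assumed here (CoreOS stream-style layout); the image name is a placeholder:

```yaml
architectures:
  x86_64:
    images:
      gcp:
        project: rhcos-cloud
        name: rhcos-414-example-image   # placeholder boot image name
```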
Fix a syntax error in the validation script. Before this change, the
script would not detect, and thus would not error on, an endpoint with
a scheme that is neither HTTP nor HTTPS.
Before this change, the validation steps and the script assumed that
the `internal` and `admin` OpenStack endpoints were always reachable. With
this change, the manual steps and the script are amended to only check
the validity of the HTTPS certificates on the `public` endpoints of the
OpenStack catalog.
Provide manual instructions to check the HTTPS certificates of the
OpenStack endpoints for systems where the required tools for the
provided script aren't available.
Our indicated supported version was incorrect. Rather than having to
remember to update it for each new OSP version, simply remove this
snippet.
The LB FIP is now called the API FIP.
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Add documentation for Phase 1 [1] of installing an OCP cluster in an existing VPC
with Local Zone subnets. The documentation includes CloudFormation templates
to create a Local Zone public subnet and the route table association.
[1] Enhancement Proposal: https://github.com/openshift/enhancements/pull/1232
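A condensed sketch of what such a template contains; the parameter names are illustrative, not the ones shipped with the documentation:

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Public subnet in an AWS Local Zone, associated with an existing route table
Parameters:
  VpcId:
    Type: String
  PublicRouteTableId:
    Type: String
  LocalZoneName:
    Type: String          # e.g. us-east-1-nyc-1a
  PublicSubnetCidr:
    Type: String
Resources:
  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VpcId
      CidrBlock: !Ref PublicSubnetCidr
      AvailabilityZone: !Ref LocalZoneName
  PublicSubnetRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet
      RouteTableId: !Ref PublicRouteTableId
```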
Before this documentation patch, the known issue about
soft-anti-affinity had several problems:
* it was in the UPI section, even though it is not a UPI-specific issue
* it mentioned control plane scale-out, even though OCP only supports exactly 3 masters
* it is now possible to set strict anti-affinity from the
install-config.yaml, and that should be the recommended solution when
VM distribution across hosts is required.