
Remote worker node at the Edge

This commit is contained in:
Michael Burke
2020-09-16 20:14:04 -04:00
committed by openshift-cherrypick-robot
parent fe5cf7d35a
commit 524ef6175e
5 changed files with 275 additions and 0 deletions


@@ -1282,6 +1282,12 @@ Topics:
- Name: Enabling features using FeatureGates
File: nodes-cluster-enabling-features
Distros: openshift-enterprise,openshift-webscale,openshift-origin
- Name: Remote worker nodes on the network edge
Dir: edge
Distros: openshift-enterprise,openshift-webscale,openshift-origin
Topics:
- Name: Using remote worker node at the network edge
File: nodes-edge-remote-workers
---
Name: Logging
Dir: logging


@@ -0,0 +1,24 @@
// Module included in the following assemblies:
//
// * logging/nodes-edge-remote-workers.adoc
[id="nodes-edge-remote-workers-network_{context}"]
= Network separation with remote worker nodes
All nodes send heartbeats to the Kubernetes Controller Manager Operator (kube controller) in the {product-title} cluster every 10 seconds. If the controller manager cannot reach a remote node because of network issues, {product-title} responds using several default mechanisms.
{product-title} is designed to be resilient to network partitions and other disruptions. You can mitigate some of the more common disruptions, such as interruptions from software upgrades, network splits, and routing issues. Mitigation strategies include ensuring that pods on remote worker nodes request the correct amount of CPU and memory resources, configuring an appropriate replication policy, using redundancy across zones, and using Pod Disruption Budgets on workloads.
If the kube controller cannot reach a node after a configured period, the node controller on the control plane updates the node health to `Unhealthy` and marks the node `Ready` condition as `Unknown`. In response, the scheduler stops scheduling pods to that node. The on-premise node controller adds a `node.kubernetes.io/unreachable` taint with a `NoExecute` effect to the node, and pods on the node are scheduled for eviction after five minutes, by default.
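For illustration, an unreachable node carries a taint similar to the following in its specification, as shown by `oc get node <node_name> -o yaml`. This is a representative sketch; the `timeAdded` timestamp is hypothetical:

.Example taint on an unreachable node
[source,yaml]
----
spec:
  taints:
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    timeAdded: "2020-09-16T20:14:04Z"
----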
If a workload controller, such as a Deployment or StatefulSet, is directing traffic to pods on the unhealthy node and other nodes can reach the cluster, {product-title} routes the traffic away from the pods on the node. Nodes that cannot reach the cluster do not get updated with the new traffic routing. As a result, the workloads on those nodes might continue to attempt to reach the unhealthy node.
You can mitigate the effects of connection loss by:
* using DaemonSets to create pods that tolerate the taints
* using static pods that automatically restart if a node goes down
* using Kubernetes zones to control pod eviction
* configuring pod tolerations to delay or avoid pod eviction
* configuring the kubelet to control the timing of when the node controller marks nodes as unhealthy.


@@ -0,0 +1,27 @@
// Module included in the following assemblies:
//
// * logging/nodes-edge-remote-workers.adoc
[id="nodes-edge-remote-workers-power_{context}"]
= Power loss on remote worker nodes
If a remote worker node loses power or restarts ungracefully, {product-title} responds using several default mechanisms.
If the Kubernetes Controller Manager Operator (kube controller) cannot reach a node after a configured period, the control plane updates the node health to `Unhealthy` and marks the node `Ready` condition as `Unknown`. In response, the scheduler stops scheduling pods to that node. The on-premise node controller adds a `node.kubernetes.io/unreachable` taint with a `NoExecute` effect to the node and schedules pods on the node for eviction after five minutes, by default.
On the node, the pods must be restarted when the node recovers power and reconnects with the control plane.
[NOTE]
====
If you want pods to restart immediately when the node restarts, use static pods.
====
After the node restarts, the kubelet also restarts and attempts to restart the pods that were scheduled on the node. If reconnecting to the control plane takes longer than the default five minutes, the control plane cannot update the node health or remove the `node.kubernetes.io/unreachable` taint. On the node, the kubelet terminates any running pods. When these conditions are cleared, the scheduler can start scheduling pods to that node again.
You can mitigate the effects of power loss by:
* using DaemonSets to create pods that tolerate the taints
* using static pods that automatically restart with a node
* configuring pod tolerations to delay or avoid pod eviction
* configuring the kubelet to control the timing of when the node controller marks nodes as unhealthy.


@@ -0,0 +1,161 @@
// Module included in the following assemblies:
//
// * logging/nodes-edge-remote-workers.adoc
[id="nodes-edge-remote-workers-strategies_{context}"]
= Remote worker node strategies
If you use remote worker nodes, consider which objects to use to run your applications.
Using DaemonSets or static pods is recommended, based on the behavior you want in the event of network issues or power loss. In addition, you can use Kubernetes zones and tolerations to control or avoid pod evictions if the control plane cannot reach remote worker nodes.
[id="nodes-edge-remote-workers-strategies-daemonsets_{context}"]
DaemonSets::
DaemonSets are the best approach to managing pods on remote worker nodes for the following reasons:
--
* DaemonSets do not typically need rescheduling behavior. If a node disconnects from the cluster, pods on the node can continue to run. {product-title} does not change the state of DaemonSet pods, and leaves the pods in the state they last reported. For example, if a DaemonSet pod is in the `Running` state, when the node stops communicating, the pod keeps running and {product-title} assumes that it is running.
* DaemonSet pods, by default, are created with `NoExecute` tolerations for the `node.kubernetes.io/unreachable` and `node.kubernetes.io/not-ready` taints with no `tolerationSeconds` value. These default values ensure that DaemonSet pods are never evicted if the control plane cannot reach a node. For example:
+
.Tolerations added to DaemonSet pods by default
[source,yaml]
----
tolerations:
- key: node.kubernetes.io/not-ready
operator: Exists
effect: NoExecute
- key: node.kubernetes.io/unreachable
operator: Exists
effect: NoExecute
- key: node.kubernetes.io/disk-pressure
operator: Exists
effect: NoSchedule
- key: node.kubernetes.io/memory-pressure
operator: Exists
effect: NoSchedule
- key: node.kubernetes.io/pid-pressure
operator: Exists
effect: NoSchedule
- key: node.kubernetes.io/unschedulable
operator: Exists
effect: NoSchedule
----
* DaemonSets can use labels to ensure that a workload runs on a matching worker node, as shown in the sketch after this section.
* You can use an {product-title} service endpoint to load balance DaemonSet pods.
[NOTE]
====
DaemonSets do not schedule pods after a reboot of the node if {product-title} cannot reach the node.
====
--
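The following is a minimal sketch of a DaemonSet that uses a node selector to run only on labeled worker nodes; the `node-role.kubernetes.io/edge` label and the image are assumptions for illustration:

.Example DaemonSet that targets labeled nodes
[source,yaml]
----
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: edge-agent
spec:
  selector:
    matchLabels:
      app: edge-agent
  template:
    metadata:
      labels:
        app: edge-agent
    spec:
      nodeSelector: <1>
        node-role.kubernetes.io/edge: ""
      containers:
      - name: agent
        image: registry.example.com/edge-agent:latest
----
<1> The DaemonSet creates pods only on nodes that carry this label.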
[id="nodes-edge-remote-workers-strategies-static_{context}"]
Static pods::
If you want pods to restart if a node reboots, after a power loss for example, consider link:https://kubernetes.io/docs/tasks/configure-pod-container/static-pod/[static pods]. The kubelet on a node automatically restarts static pods when the node restarts.
[NOTE]
====
Static pods cannot use secrets or ConfigMaps.
====
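For illustration, the following is a minimal static pod manifest. The kubelet runs any manifest placed in its configured static pod path on the node, typically `/etc/kubernetes/manifests`; the pod name and image are illustrative:

.Example static pod manifest
[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
  - name: web
    image: registry.example.com/web:latest
    ports:
    - containerPort: 8080
----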
[id="nodes-edge-remote-workers-strategies-zones_{context}"]
Kubernetes zones::
link:https://kubernetes.io/docs/setup/best-practices/multiple-zones/[Kubernetes zones] can slow the rate of pod eviction or, in some cases, stop it completely.
When the control plane cannot reach a node, the node controller, by default, applies `node.kubernetes.io/unreachable` taints and evicts pods at a rate of 0.1 nodes per second. However, in a cluster that uses Kubernetes zones, pod eviction behavior is altered.
If a zone is fully disrupted, where all nodes in the zone have a `Ready` condition that is `False` or `Unknown`, the control plane does not apply the taint to the nodes in that zone.
For partially disrupted zones, where more than 55% of the nodes have a `False` or `Unknown` condition, the pod eviction rate is reduced to 0.01 nodes per second. Nodes in smaller clusters, with fewer than 50 nodes, are not tainted. Your cluster must have more than three zones for these behaviors to take effect.
You assign a node to a specific zone by applying the `topology.kubernetes.io/region` label in the node specification.
.Sample node labels for Kubernetes zones
[source,yaml]
----
kind: Node
apiVersion: v1
metadata:
  labels:
    topology.kubernetes.io/region: east
----
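For example, you can apply the label to an existing node with a command similar to the following; the node name is a placeholder:

[source,terminal]
----
$ oc label node <node_name> topology.kubernetes.io/region=east
----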
[id="nodes-edge-remote-workers-strategies-kubeconfig_{context}"]
KubeletConfig objects::
--
You can adjust the interval at which the Kubernetes Controller Manager Operator (controller manager) checks the state of each node.
To set the interval that affects the timing of when the on-premise node controller marks nodes with the `Unhealthy` or `Unreachable` condition, create a KubeletConfig object that contains the `node-status-update-frequency` parameter:
.Example KubeletConfig
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: disable-cpu-units
spec:
  machineConfigPoolSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: worker
  kubeletConfig:
    node-status-update-frequency: <1>
      - "5s"
----
<1> Specify the interval at which the controller manager checks the state of each node that is associated with this `MachineConfig`. The default is `10s`, matching the 10-second heartbeat interval.
This parameter works with the `node-monitor-grace-period` and the `pod-eviction-timeout` parameters, which are not configurable.
* The `node-monitor-grace-period` parameter specifies how long {product-title} waits, after the controller manager stops receiving the heartbeat from a node associated with this `MachineConfig`, before marking the node `Unhealthy`. Workloads on the node continue to run after this time. If the remote worker node rejoins the cluster after the `node-monitor-grace-period` expires, pods continue to run. New pods can be scheduled to that node. The `node-monitor-grace-period` interval is `40s`.
* The `pod-eviction-timeout` parameter specifies the amount of time {product-title} waits, after marking a node associated with this `MachineConfig` as `Unreachable`, before it starts marking pods on that node for eviction. Evicted pods are rescheduled on other nodes. If the remote worker node rejoins the cluster after the `pod-eviction-timeout` expires, the pods that were running on the remote worker node are terminated, because the node controller has evicted them. Pods can then be rescheduled to that node. The `pod-eviction-timeout` period is `5m0s`.
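After you define the KubeletConfig object, you can create it as you would any other manifest. A minimal sketch; the file name is a placeholder:

[source,terminal]
----
$ oc apply -f <file_name>.yaml
----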
--
[id="nodes-edge-remote-workers-strategies-tolerations_{context}"]
Tolerations::
You can use pod tolerations to mitigate the effects if the on-premise node controller adds a `node.kubernetes.io/unreachable` taint with a `NoExecute` effect to a node it cannot reach.
A taint with the `NoExecute` effect affects pods that are running on the node in the following ways:
* Pods that do not tolerate the taint are queued for eviction.
* Pods that tolerate the taint without specifying a `tolerationSeconds` value in their toleration specification remain bound forever.
* Pods that tolerate the taint with a specified `tolerationSeconds` value remain bound for the specified amount of time. After the time elapses, the pods are queued for eviction.
You can delay or avoid pod eviction by configuring pod tolerations with the `NoExecute` effect for the `node.kubernetes.io/unreachable` and `node.kubernetes.io/not-ready` taints.
.Example toleration in a pod spec
[source,yaml]
----
...
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute" <1>
  tolerationSeconds: 0
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute" <2>
  tolerationSeconds: 0
...
----
<1> The `NoExecute` effect with `tolerationSeconds: 0` allows pods to remain if the control plane cannot reach the node.
<2> The `NoExecute` effect with `tolerationSeconds: 0` allows pods to remain if the control plane marks the node as `Unhealthy`.
{product-title} uses the `tolerationSeconds` value after the `pod-eviction-timeout` value elapses.
Other types of {product-title} objects::
You can use ReplicaSets, Deployments, and ReplicationControllers. The scheduler can reschedule these pods onto other nodes after the node is disconnected for five minutes. Rescheduling onto other nodes can be beneficial for some workloads, such as REST APIs, where an administrator can guarantee a specific number of pods are running and accessible.
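For example, the following is a minimal sketch of such a Deployment; the name, labels, and image are illustrative assumptions:

.Example Deployment
[source,yaml]
----
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rest-api
spec:
  replicas: 3 <1>
  selector:
    matchLabels:
      app: rest-api
  template:
    metadata:
      labels:
        app: rest-api
    spec:
      containers:
      - name: api
        image: registry.example.com/rest-api:latest
----
<1> The scheduler works to maintain this number of running pods, rescheduling onto reachable nodes if a node is disconnected for more than five minutes.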
[NOTE]
====
When working with remote worker nodes, rescheduling pods on different nodes might not be acceptable if remote worker nodes are intended to be reserved for specific functions.
====
[id="nodes-edge-remote-workers-strategies-statefulset_{context}"]
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/[StatefulSets] do not get restarted when there is an outage. The pods remain in the `terminating` state until the control plane can acknowledge that the pods are terminated.
To avoid scheduling a to a node that does not have access to the same type of persistent storage, {product-title} cannot migrate pods that require persistent volumes to other zones in the case of network separation.


@@ -0,0 +1,57 @@
:context: nodes-edge-remote-workers
[id="nodes-edge-remote-workers"]
= Using remote worker nodes at the network edge
include::modules/common-attributes.adoc[]
toc::[]
You can configure {product-title} clusters with nodes located at your network edge. In this topic, they are called _remote worker nodes_. A typical cluster with remote worker nodes combines on-premise master and worker nodes with worker nodes in other locations that connect to the cluster. This topic is intended to provide guidance on best practices for using remote worker nodes and does not contain specific configuration details.
There are multiple use cases across different industries, such as telecommunications, retail, manufacturing, and government, for using a deployment pattern with remote worker nodes. For example, you can separate and isolate your projects and workloads by combining the remote worker nodes into xref:../../nodes/edge/nodes-edge-remote-workers.adoc#nodes-edge-remote-workers-strategies-zones_nodes-edge-remote-workers[Kubernetes zones].
However, having remote worker nodes can introduce higher latency, intermittent loss of network connectivity, and other issues. Among the challenges in a cluster with remote worker nodes are:
* *Network separation*: The {product-title} control plane and the remote worker nodes must be able to communicate with each other. Because of the distance between the control plane and the remote worker nodes, network issues could prevent this communication. See xref:../../nodes/edge/nodes-edge-remote-workers.adoc#nodes-edge-remote-workers-network_nodes-edge-remote-workers[Network separation with remote worker nodes] for information on how {product-title} responds to network separation and for methods to diminish the impact to your cluster.
* *Power outage*: Because the control plane and remote worker nodes are in separate locations, a power outage at the remote location or at any point between the two can negatively impact your cluster. See xref:../../nodes/edge/nodes-edge-remote-workers.adoc#nodes-edge-remote-workers-power_nodes-edge-remote-workers[Power loss on remote worker nodes] for information on how {product-title} responds to a node losing power and for methods to diminish the impact to your cluster.
* *Latency spikes or temporary reduction in throughput*: As with any network, any changes in network conditions between your cluster and the remote worker nodes can negatively impact your cluster. These types of situations are beyond the scope of this documentation.
Note the following limitations when planning a cluster with remote worker nodes:
* Remote worker nodes are supported only on bare metal clusters with user-provisioned infrastructure.
* {product-title} does not support remote worker nodes that use a different cloud provider than the on-premise cluster uses.
* Moving workloads from one Kubernetes zone to a different Kubernetes zone can be problematic due to system and environment issues, such as a specific type of memory not being available in a different zone.
* Proxies and firewalls can present additional limitations that are beyond the scope of this document. Refer to the relevant {product-title} documentation for how to address such limitations, such as xref:../../installing/install_config/configuring-firewall.adoc#configuring-firewall[Configuring your firewall].
* You are responsible for configuring and maintaining L2/L3-level network connectivity between the control plane and the network-edge nodes.
include::modules/nodes-edge-remote-workers-network.adoc[leveloffset=+1]
For more information on using these objects in a cluster with remote worker nodes, see xref:../../nodes/edge/nodes-edge-remote-workers.adoc#nodes-edge-remote-workers-strategies_nodes-edge-remote-workers[About remote worker node strategies].
include::modules/nodes-edge-remote-workers-power.adoc[leveloffset=+1]
For more information on using these objects in a cluster with remote worker nodes, see xref:../../nodes/edge/nodes-edge-remote-workers.adoc#nodes-edge-remote-workers-strategies_nodes-edge-remote-workers[About remote worker node strategies].
include::modules/nodes-edge-remote-workers-strategies.adoc[leveloffset=+1]
.Additional resources
* For more information on DaemonSets, see xref:../../nodes/jobs/nodes-pods-daemonsets.adoc#nodes-pods-daemonsets[DaemonSets].
* For more information on taints and tolerations, see xref:../../nodes/scheduling/nodes-scheduler-taints-tolerations.adoc#nodes-scheduler-taints-tolerations-about_nodes-scheduler-taints-tolerations[Controlling pod placement using node taints].
* For more information on configuring KubeletConfig objects, see xref:../../post_installation_configuration/node-tasks.adoc#create-a-kubeletconfig-crd-to-edit-kubelet-parameters_post-install-node-tasks[Creating a KubeletConfig CRD].
* For more information on ReplicaSets, see xref:../../applications/deployments/what-deployments-are.adoc#deployments-replicasets_what-deployments-are[ReplicaSets].
* For more information on Deployments, see xref:../../applications/deployments/what-deployments-are.adoc#deployments-kube-deployments_what-deployments-are[Deployments].
* For more information on ReplicationControllers, see xref:../../applications/deployments/what-deployments-are.adoc#deployments-replicationcontrollers_what-deployments-are[ReplicationControllers].
* For more information on the controller manager, see xref:../../operators/operator-reference.adoc#kube-controller-manager-operator_red-hat-operators[Kubernetes Controller Manager Operator].