the initialDelaySeconds behavior does not match the documentation

2026-02-05 12:46:18 +01:00 · 2020-11-04 22:41:44 -05:00
parent 43ec374e34
commit 7d7d1f0ef3
4 changed files with 266 additions and 216 deletions
--- a/applications/application-health.adoc
+++ b/applications/application-health.adoc
@@ -1,6 +1,6 @@
 :context: application-health
 [id="application-health"]
-= Monitoring application health
+= Monitoring application health by using health checks
 include::modules/common-attributes.adoc[]

 toc::[]
--- a/modules/application-health-about.adoc
+++ b/modules/application-health-about.adoc
@@ -5,125 +5,214 @@
 [id="application-health-about_{context}"]
 = Understanding health checks

-A probe is a Kubernetes action that periodically performs diagnostics on a
-running container. Currently, two types of probes exist, each serving a
-different purpose.
+A health check periodically performs diagnostics on a
+running container using any combination of the readiness, liveness, and startup health checks.

-Readiness Probe::
-A Readiness check determines if the container in which it is scheduled is ready to service requests. If
-the readiness probe fails a container, the endpoints controller ensures the
-container has its IP address removed from the endpoints of all services. A
-readiness probe can be used to signal to the endpoints controller that even
-though a container is running, it should not receive any traffic from a proxy.
+You can include one or more probes in the specification for the pod that contains the container on which you want to perform the health checks.

-For example, a Readiness check can control which pods are used. When a pod is not ready,
-it is removed.
+[NOTE]
+====
+If you want to add or edit health checks in an existing pod, you must edit the pod deployment configuration or use the *Developer* perspective in the web console. You cannot use the CLI to add or edit health checks for an existing pod.
+====

-Liveness Probe::
-A Liveness checks determines if the container in which it is scheduled is still
-running. If the liveness probe fails due to a condition such as a deadlock, the kubelet kills the container The container then
+Readiness probe::
+A _readiness probe_ determines if a container is ready to accept service requests. If
+the readiness probe fails for a container, the kubelet removes the pod from the list of available service endpoints.
+
+After a failure, the probe continues to examine the pod. If the pod becomes available, the kubelet adds the pod to the list of available service endpoints.
+
+Liveness probe::
+A _liveness probe_ determines if a container is still
+running. If the liveness probe fails due to a condition such as a deadlock, the kubelet kills the container. The pod then
 responds based on its restart policy.
+
+For example, a liveness probe on a pod with a `restartPolicy` of `Always` or `OnFailure`
+kills and restarts the container.

-For example, a liveness probe on a node with a `restartPolicy` of `Always` or `OnFailure`
-kills and restarts the Container on the node.
+Startup probe::
+A _startup probe_ indicates whether the application within a container has started. All other probes are disabled until the startup probe succeeds. If the startup probe does not succeed within a specified time period, the kubelet kills the container, and the container is subject to the pod `restartPolicy`.
+
+Some applications can require additional start-up time on their first initialization. You can use a startup probe with a liveness or readiness probe to delay that probe long enough to handle lengthy start-up time using the `failureThreshold` and `periodSeconds` parameters.
+
+For example, you can add a startup probe with a `failureThreshold` of 30 and a `periodSeconds` of 10 to a liveness probe. The application then has a maximum of 5 minutes (30 * 10s = 300s) to start. After the startup probe succeeds the first time, the liveness probe takes over.

-.Sample Liveness Check
+You can configure liveness, readiness, and startup probes with any of the following types of tests:
+
+* HTTP `GET`: When using an HTTP `GET` test, the test determines the healthiness of the container by using a web hook. The test is successful if the HTTP response code is between `200` and `399`.
+
+You can use an HTTP `GET` test with applications that return HTTP status codes when completely initialized.
+
+* Container command: When using a container command test, the probe executes a command inside the container. The probe is successful if the test exits with a `0` status.
+
+* TCP socket: When using a TCP socket test, the probe attempts to open a socket to the container. The container is only
+considered healthy if the probe can establish a connection. You can use a TCP socket test with applications that do not start listening until
+initialization is complete.
+
+You can configure several fields to control the behavior of a probe:
+
+* `initialDelaySeconds`: The time, in seconds, after the container starts before the probe can be scheduled. The default is `0`.
+* `periodSeconds`: The delay, in seconds, between performing probes. The default is `10`.
+* `timeoutSeconds`: The number of seconds of inactivity after which the probe times out and the container is assumed to have failed. The default is `1`.
+* `successThreshold`: The number of times that the probe must report success after a failure in order to reset the container status to successful. The value must be `1` for a liveness probe. The default is `1`.
+* `failureThreshold`: The number of times that the probe is allowed to fail. The default is `3`. After the specified attempts:
+** for a liveness probe, the container is restarted
+** for a readiness probe, the pod is marked `Unready`
+** for a startup probe, the container is killed and is subject to the pod's `restartPolicy`
+
+[NOTE]
+====
+The `timeoutSeconds` parameter has no effect on readiness and liveness
+probes that use container command tests, because {product-title} cannot time out on an exec call into
+the container. One way to implement a timeout in a container command probe is by using the `timeout` command to run your
+liveness or readiness probes, as shown in the examples.
+====
+
+[discrete]
+[id="application-health-examples"]
+== Example probes
+
+The following are samples of different probes as they would appear in an object specification.  
+
+.Sample readiness probe with a container command test in a pod spec
 [source,yaml]
 ----
 apiVersion: v1
 kind: Pod
 metadata:
  labels:
-    test: liveness
-  name: liveness-http
+    test: health-check
+  name: my-application
+...
 spec:
  containers:
-  - name: liveness-http
-    image: k8s.gcr.io/liveness <1>
+  - name: goproxy-app <1>
    args:
-    - /server
-    livenessProbe: <2>
-      httpGet:   <3>
-        # host: my-host
-        # scheme: HTTPS
+    image: k8s.gcr.io/goproxy:0.1 <2>
+    readinessProbe: <3>
+      exec: <4>
+        command: <5>
+        - cat
+        - /tmp/healthy
+...
+----
+
+<1> The container name.
+<2> The container image to deploy.
+<3> A readiness probe.
+<4> A container command test.
+<5> The commands to execute on the container.
+
+.Sample startup probe and liveness probe with HTTP `GET` tests in a pod spec
+[source,yaml]
+----
+apiVersion: v1
+kind: Pod
+metadata:
+  labels:
+    test: health-check
+  name: my-application
+...
+spec:
+  containers:
+  - name: goproxy-app <1>
+    args:
+    image: k8s.gcr.io/goproxy:0.1 <2>
+    livenessProbe: <3>
+      httpGet: <4>
+        scheme: HTTPS <5>
        path: /healthz
-        port: 8080
+        port: 8080 <6>
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
-      initialDelaySeconds: 15  <4>
-      timeoutSeconds: 1   <5>
-    name: liveness   <6>
----
-<1> Specifies the image to use for the liveness probe.
-<2> Specifies the type of heath check.
-<3> Specifies the type of Liveness check:
-* HTTP Checks. Specify `httpGet`.
-* Container Execution Checks. Specify `exec`.
-* TCP Socket Check. Specify `tcpSocket`.
-<4> Specifies the number of seconds before performing the first probe after the container starts.
-<5> Specifies the number of seconds between probes.
-
-
-.Sample Liveness check output wth unhealthy container
-[source,terminal]
----
-$ oc describe pod pod1
+    startupProbe: <7>
+      httpGet: <8>
+        path: /healthz
+        port: 8080 <9>
+      failureThreshold: 30 <10>
+      periodSeconds: 10 <11>
+...
 ----

-.Example output
-[source,terminal]
----
-....
+<1> The container name.
+<2> The container image to deploy.
+<3> A liveness probe.
+<4> An HTTP `GET` test.
+<5> The internet scheme: `HTTP` or `HTTPS`. The default value is `HTTP`.
+<6> The port on which the container is listening.
+<7> A startup probe.
+<8> An HTTP `GET` test.
+<9> The port on which the container is listening.
+<10> The number of times to try the probe after a failure.
+<11> How often, in seconds, to perform the probe.

-FirstSeen LastSeen    Count   From            SubobjectPath           Type        Reason      Message
--------- --------    -----   ----            -------------           --------    ------      -------
-37s       37s     1   {default-scheduler }                            Normal      Scheduled   Successfully assigned liveness-exec to worker0
-36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Pulling     pulling image "k8s.gcr.io/busybox"
-36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Pulled      Successfully pulled image "k8s.gcr.io/busybox"
-36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Created     Created container with docker id 86849c15382e; Security:[seccomp=unconfined]
-36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Started     Started container with docker id 86849c15382e
-2s        2s      1   {kubelet worker0}   spec.containers{liveness}   Warning     Unhealthy   Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
+.Sample liveness probe with a container command test that uses a timeout in a pod spec
+[source,yaml]
+----
+apiVersion: v1
+kind: Pod
+metadata:
+  labels:
+    test: health-check
+  name: my-application
+...
+spec:
+  containers:
+  - name: goproxy-app <1>
+    args:
+    image: k8s.gcr.io/goproxy:0.1 <2>
+    livenessProbe: <3>
+      exec: <4>
+        command: <5>
+        - /bin/bash
+        - '-c'
+        - timeout 60 /opt/eap/bin/livenessProbe.sh 
+      periodSeconds: 10 <6>
+      successThreshold: 1 <7>
+      failureThreshold: 3 <8>
+...
 ----

-Startup Probe::
-Legacy applications can require additional startup time on their first initialization. This situation can make it difficult to set up liveness probe parameters without compromising the fast response to deadlocks that a liveness probe provides. To prevent containers from starting slowly, configure a startup probe using the same command, HTTP or TCP check, with a `failureThreshold * periodSeconds` value long enough to handle the worst case startup time.
+<1> The container name.
+<2> The container image to deploy.
+<3> The liveness probe.
+<4> The type of probe, here a container command probe.
+<5> The command line to execute inside the container.
+<6> How often, in seconds, to perform the probe.
+<7> The number of consecutive successes needed to show success after a failure.
+<8> The number of times to try the probe after a failure.

-For example, add the startup probe to the to the previous liveness check sample using a maximum of 5 minutes (30 * 10 = 300s). After the startup probe succeeds the first time, the liveness probe takes over to provide a fast response to container deadlocks. If the startup probe never succeeds within the specified time period, the container is killed and is subject to the pod's `restartPolicy`.
-
-
-.Sample Startup Check
-[source, yaml]
+.Sample readiness probe and liveness probe with a TCP socket test in a deployment
+[source,yaml]
 ----
- startupProbe:
-   httpGet:
-     path: /healthz
-     port: liveness-port <1>
-   failureThreshold: 30 <2>
-   periodSeconds: 10 <3>
+kind: Deployment
+apiVersion: apps/v1
+...
+spec:
+...
+  template:
+    spec:
+      containers:
+        - resources: {}
+          readinessProbe: <1>
+            tcpSocket:
+              port: 8080
+            timeoutSeconds: 1
+            periodSeconds: 10
+            successThreshold: 1
+            failureThreshold: 3
+          terminationMessagePath: /dev/termination-log
+          name: ruby-ex
+          livenessProbe: <2>
+            tcpSocket:
+              port: 8080
+            initialDelaySeconds: 15
+            timeoutSeconds: 1
+            periodSeconds: 10
+            successThreshold: 1
+            failureThreshold: 3
+...
 ----
-<1> Specifies the liveness port number.
-<2> Specifies the maximum number of startup attempts before the container is killed.
-<3> Specifies the number of seconds the application has to attempt starting.
-
-
-[id="application-health-about_types_{context}"]
-== Understanding the types of health checks
-
-Liveness checks and Readiness checks can be configured in three ways:
-
-HTTP Checks::
-The kubelet uses a web hook to determine the healthiness of the container. The
-check is deemed successful if the HTTP response code is between 200 and 399.
-
-A HTTP check is ideal for applications that return HTTP status codes
-when completely initialized.
-
-Container Execution Checks::
-The kubelet executes a command inside the container. Exiting the check with
-status 0 is considered a success.
-
-TCP Socket Checks::
-The kubelet attempts to open a socket to the container. The container is only
-considered healthy if the check can establish a connection. A TCP socket check is ideal for applications that do not start listening until
-initialization is complete.
+<1> The readiness probe.
+<2> The liveness probe.
+  
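As a rough illustration of two rules stated in the `application-health-about` module above, the following Python sketch computes the maximum startup window (`failureThreshold` * `periodSeconds`) and applies the HTTP `GET` success range (`200` to `399`). The function names are hypothetical and are not part of any {product-title} or Kubernetes API. The range is assumed to be inclusive at both ends, and the window is only an approximation that ignores `initialDelaySeconds` and per-probe timeouts.

```python
def startup_window_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Approximate upper bound on how long a startup probe keeps retrying.

    Ignores initialDelaySeconds and probe timeouts (assumption of this sketch).
    """
    return failure_threshold * period_seconds


def http_probe_succeeded(status_code: int) -> bool:
    """An HTTP GET test passes for response codes 200 through 399 (assumed inclusive)."""
    return 200 <= status_code <= 399


if __name__ == "__main__":
    # Values from the startup probe example: failureThreshold 30, periodSeconds 10.
    print(startup_window_seconds(30, 10))
    # Status code taken from the failed liveness probe output in this change.
    print(http_probe_succeeded(500))
```

With the values from the startup probe example, the window works out to the 5 minutes mentioned in the module text.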
--- a/modules/application-health-configuring.adoc
+++ b/modules/application-health-configuring.adoc
@@ -5,15 +5,18 @@
 [id="application-health-configuring_{context}"]
 = Configuring health checks using the CLI

-To configure health checks, create a pod for each type of check you want.
+To configure readiness, liveness, and startup probes, add one or more probes to the specification for the pod that contains the container on which you want to perform the health checks.
+
+[NOTE]
+====
+If you want to add or edit health checks in an existing pod, you must edit the pod deployment configuration or use the *Developer* perspective in the web console. You cannot use the CLI to add or edit health checks for an existing pod.
+====

 .Procedure

-To create health checks:
+To add probes for a container:

-. Create a Liveness Container Execution Check:
-
-.. Create a YAML file similar to the following:
+. Create a `Pod` object to add one or more probes:
 +
 [source,yaml]
 ----
@@ -21,29 +24,67 @@ apiVersion: v1
 kind: Pod
 metadata:
  labels:
-    test: liveness
-  name: liveness-exec
+    test: health-check
+  name: my-application
 spec:
  containers:
-  - args:
-    image: k8s.gcr.io/liveness
-    livenessProbe:
-      exec:  <1>
-        command: <2>
+  - name: my-container <1>
+    args:
+    image: k8s.gcr.io/goproxy:0.1 <2>
+    livenessProbe: <3>
+      tcpSocket:  <4>
+        port: 8080 <5>
+      initialDelaySeconds: 15 <6>
+      timeoutSeconds: 1 <7>
+    readinessProbe: <8>
+      httpGet: <9>
+        host: my-host <10>
+        scheme: HTTPS <11>
+        path: /healthz
+        port: 8080 <12>
+    startupProbe: <13>
+      exec: <14>
+        command: <15>
        - cat
-        - /tmp/health
-      initialDelaySeconds: 15 <3>
-...
+        - /tmp/healthy
+      failureThreshold: 30 <16>
+      periodSeconds: 10 <17>
 ----
-<1> Specify a Liveness check and the type of Liveness check.
-<2> Specify the commands to use in the container.
-<3> Specify the number of seconds before performing the first probe after the container starts.
+<1> Specify the container name.
+<2> Specify the container image to deploy.
+<3> Optional: Create a liveness probe.
+<4> Specify a test to perform, here a TCP socket test.
+<5> Specify the port on which the container is listening.
+<6> Specify the number of seconds before performing the first probe after the container starts.
+<7> Specify the number of seconds of inactivity after which the probe times out.
+<8> Optional: Create a readiness probe.
+<9> Specify the type of test to perform, here an HTTP test.
+<10> Specify a host IP address. When `host` is not defined, the `PodIP` is used.
+<11> Specify `HTTP` or `HTTPS`. When `scheme` is not defined, the `HTTP` scheme is used.
+<12> Specify the port on which the container is listening.
+<13> Optional: Create a startup probe.
+<14> Specify the type of test to perform, here a container command test.
+<15> Specify the commands to execute on the container.
+<16> Specify the number of times to try the probe after a failure.
+<17> Specify how often, in seconds, to perform the probe.
+
+[NOTE]
+====
+If the `initialDelaySeconds` value is lower than the `periodSeconds` value, the first readiness probe occurs at some point between the two periods due to an issue with timers.
+====

-.. Verify the state of the health check pod:
+. Create the `Pod` object:
 +
 [source,terminal]
 ----
-$ oc describe pod liveness-exec
+$ oc create -f <file-name>.yaml
+----
+
+. Verify the state of the health check pod:
++
+[source,terminal]
+----
+$ oc describe pod my-application
 ----
 +
 .Example output
@@ -59,111 +100,31 @@ Events:
  Normal  Started    1s    kubelet, ip-10-0-143-40.ec2.internal  Started container
 ----
 +
-[NOTE]
-====
-The `timeoutSeconds` parameter has no effect on the Readiness and Liveness
-probes for Container Execution Checks. You can implement a timeout
-inside the probe itself, as {product-title} cannot time out on an exec call into
-the container. One way to implement a timeout in a probe is by using the `timeout` parameter to run your
-liveness or readiness probe:
-
-[source,yaml]
----
-spec:
-  containers:
-    livenessProbe:
-      exec:
-        command:
-          - /bin/bash
-          - '-c'
-          - timeout 60 /opt/eap/bin/livenessProbe.sh <1>
-      timeoutSeconds: 1
-      periodSeconds: 10
-      successThreshold: 1
-      failureThreshold: 3
----
-
-<1> Timeout value and path to the probe script.
-====
-
-.. Create the check:
+The following is the output of a failed probe that restarted a container:
 +
+.Sample liveness check output with unhealthy container
 [source,terminal]
 ----
-$ oc create -f <file-name>.yaml
+$ oc describe pod pod1
 ----
-
-. Create a Liveness TCP Socket Check:
-
-.. Create a YAML file similar to the following:
-+
-[source,yaml]
----
-apiVersion: v1
-kind: Pod
-metadata:
-  labels:
-    test: liveness
-  name: liveness-tcp
-spec:
-  containers:
-  - name: contaier1 <1>
-    image: k8s.gcr.io/liveness
-    ports:
-    - containerPort: 8080 <1>
-    livenessProbe:  <2>
-      tcpSocket:
-        port: 8080
-      initialDelaySeconds: 15 <3>
-      timeoutSeconds: 1  <4>
----
-<1> Specify the container name and port for the check to connect to.
-<2> Specify the Liveness heath check and the type of Liveness check.
-<3> Specify the number of seconds before performing the first probe after the container starts.
-<4> Specify the number of seconds between probes.
-
-.. Create the check:
 +
+.Example output
 [source,terminal]
 ----
-$ oc create -f <file-name>.yaml
+....
+
+Events:
+  Type     Reason          Age                From                                               Message
+  ----     ------          ----               ----                                               -------
+  Normal   Scheduled       <unknown>                                                             Successfully assigned aaa/liveness-http to ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj
+  Normal   AddedInterface  47s                multus                                             Add eth0 [10.129.2.11/23]
+  Normal   Pulled          46s                kubelet, ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj  Successfully pulled image "k8s.gcr.io/liveness" in 773.406244ms
+  Normal   Pulled          28s                kubelet, ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj  Successfully pulled image "k8s.gcr.io/liveness" in 233.328564ms
+  Normal   Created         10s (x3 over 46s)  kubelet, ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj  Created container liveness
+  Normal   Started         10s (x3 over 46s)  kubelet, ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj  Started container liveness
+  Warning  Unhealthy       10s (x6 over 34s)  kubelet, ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj  Liveness probe failed: HTTP probe failed with statuscode: 500
+  Normal   Killing         10s (x2 over 28s)  kubelet, ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj  Container liveness failed liveness probe, will be restarted
+  Normal   Pulling         10s (x3 over 47s)  kubelet, ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj  Pulling image "k8s.gcr.io/liveness"
+  Normal   Pulled          10s                kubelet, ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj  Successfully pulled image "k8s.gcr.io/liveness" in 244.116568ms
 ----

-. Create an Readiness HTTP Check:
-
-.. Create a YAML file similar to the following:
-+
-[source,yaml]
----
-apiVersion: v1
-kind: Pod
-metadata:
-  labels:
-    test: readiness
-  name: readiness-http
-spec:
-  containers:
-  - args:
-    image: k8s.gcr.io/readiness <1>
-    readinessProbe: <2>
-    httpGet:
-    # host: my-host <3>
-    # scheme: HTTPS <4>
-      path: /healthz
-      port: 8080
-    initialDelaySeconds: 15  <5>
-    timeoutSeconds: 1  <6>
----
-<1> Specify the image to use for the liveness probe.
-<2> Specify the Readiness heath check and the type of Readiness check.
-<3> Specify a host IP address. When `host` is not defined, the `PodIP` is used.
-<4> Specify `HTTP` or `HTTPS`. When `scheme` is not defined, the `HTTP` scheme is used.
-<5> Specify the number of seconds before performing the first probe after the container starts.
-<6> Specify the number of seconds between probes.
-
-.. Create the check:
-+
-[source,terminal]
----
-$ oc create -f <file-name>.yaml
----
--- a/modules/cluster-logging-collector-log-forward-project.adoc
+++ b/modules/cluster-logging-collector-log-forward-project.adoc
@@ -63,7 +63,7 @@ spec:
 <7> Configuration for an input to filter application logs from the specified projects.
 <8> Configuration for a pipeline to use the input to send project application logs to an external Fluentd instance.
 <9> The `my-app-logs` input.
-<10 The name of the output to use.
+<10> The name of the output to use.
 <11> Optional: A label to add to the logs.
 <12> Configuration for a pipeline to send logs to other log aggregators.
 ** Optional: Specify a name for the pipeline.