mirror of
https://github.com/openshift/openshift-docs.git
synced 2026-02-05 21:46:22 +01:00
CFE-254: Adds Node observability new section
This commit is contained in:
committed by
openshift-cherrypick-robot
parent
b299c4d479
commit
53413032a3
@@ -1230,7 +1230,7 @@ Topics:
|
||||
File: about-advertising-ipaddresspool
|
||||
- Name: Configuring MetalLB BGP peers
|
||||
File: metallb-configure-bgp-peers
|
||||
- Name: Advertising an IP address pool using the community alias
|
||||
- Name: Advertising an IP address pool using the community alias
|
||||
File: metallb-configure-community-alias
|
||||
- Name: Configuring MetalLB BFD profiles
|
||||
File: metallb-configure-bfd-profiles
|
||||
@@ -2047,7 +2047,7 @@ Topics:
|
||||
- Name: Enabling features using FeatureGates
|
||||
File: nodes-cluster-enabling-features
|
||||
Distros: openshift-enterprise,openshift-origin
|
||||
- Name: Improving cluster stability in high latency environments using worker latency profiles
|
||||
- Name: Improving cluster stability in high latency environments using worker latency profiles
|
||||
File: nodes-cluster-worker-latency-profiles
|
||||
Distros: openshift-enterprise,openshift-origin
|
||||
- Name: Remote worker nodes on the network edge
|
||||
@@ -2271,6 +2271,9 @@ Topics:
|
||||
- Name: Deploying distributed units at scale in a disconnected environment
|
||||
File: ztp-deploying-disconnected
|
||||
Distros: openshift-origin,openshift-enterprise
|
||||
- Name: Requesting CRI-O and Kubelet profiling data using the Node Observability Operator
|
||||
File: node-observability-operator
|
||||
Distros: openshift-origin,openshift-enterprise
|
||||
---
|
||||
Name: Specialized hardware and driver enablement
|
||||
Dir: hardware_enablement
|
||||
|
||||
97
modules/node-observability-create-custom-resource.adoc
Normal file
97
modules/node-observability-create-custom-resource.adoc
Normal file
@@ -0,0 +1,97 @@
|
||||
// Module included in the following assemblies:
|
||||
//
|
||||
// * scalability_and_performance/understanding-node-observability-operator.adoc
|
||||
|
||||
:_content-type: PROCEDURE
|
||||
[id="creating-node-observability-custom-resource_{context}"]
|
||||
= Creating the Node Observability custom resource
|
||||
|
||||
Before you run profiling queries, you must create a `NodeObservability` custom resource (CR).
|
||||
|
||||
[IMPORTANT]
|
||||
====
|
||||
Creating a `NodeObservability` CR reboots all the worker nodes. It might take 10 or more minutes to complete.
|
||||
====
|
||||
|
||||
When you apply the `NodeObservability` CR, it creates the necessary machine config and machine config pool CRs to enable the CRI-O profiling on the worker nodes.
|
||||
|
||||
[NOTE]
|
||||
====
|
||||
Kubelet profiling is enabled by default.
|
||||
====
|
||||
|
||||
The CRI-O unix socket of the node is mounted on the agent pod, which allows the agent to communicate with CRIO to run the pprof request. Similiarly, the `kubelet-serving-ca` certificate chain is mounted on the agent pod, which allows secure communication between the agent and node's kubelet endpoint.
|
||||
|
||||
.Prerequisites
|
||||
* You have installed the Node Observability Operator.
|
||||
* You have installed the OpenShift CLI (oc).
|
||||
* You have access to the cluster with `cluster-admin` privileges.
|
||||
|
||||
.Procedure
|
||||
|
||||
. Log in to the {product-title} CLI as a user with the `cluster-admin` role by running the following command:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc login -u kubeadmin https://<HOSTNAME>:6443
|
||||
----
|
||||
|
||||
. Switch back to the `node-observability-operator` namespace by running the following command:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc project node-observability-operator
|
||||
----
|
||||
|
||||
. Create a CR file named `nodeobservability.yaml` that contains the following text:
|
||||
+
|
||||
[source,yaml]
|
||||
----
|
||||
apiVersion: nodeobservability.olm.openshift.io/v1alpha1
|
||||
kind: NodeObservability
|
||||
metadata:
|
||||
name: cluster <1>
|
||||
spec:
|
||||
labels:
|
||||
node-role.kubernetes.io/worker: ""
|
||||
type: crio-kubelet
|
||||
----
|
||||
<1> You must specify the name as `cluster` because there should be only one `NodeObservability` CR per cluster.
|
||||
|
||||
. Run the `NodeObservability` CR:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
oc apply -f nodeobservability.yaml
|
||||
----
|
||||
|
||||
+
|
||||
.Example output
|
||||
[source,terminal]
|
||||
----
|
||||
nodeobservability.olm.openshift.io/cluster created
|
||||
----
|
||||
|
||||
. Review the status of the `NodeObservability` CR by running the following command:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc get nob/cluster -o yaml | yq '.status.conditions'
|
||||
----
|
||||
|
||||
+
|
||||
.Example output
|
||||
[source,terminal]
|
||||
----
|
||||
conditions:
|
||||
conditions:
|
||||
- lastTransitionTime: "2022-07-05T07:33:54Z"
|
||||
message: 'DaemonSet node-observability-ds ready: true NodeObservabilityMachineConfig
|
||||
ready: true'
|
||||
reason: Ready
|
||||
status: "True"
|
||||
type: Ready
|
||||
----
|
||||
|
||||
+
|
||||
`NodeObservability` CR run is completed when the reason is `Ready` and the status is `True`.
|
||||
11
modules/node-observability-high-level-workflow.adoc
Normal file
11
modules/node-observability-high-level-workflow.adoc
Normal file
@@ -0,0 +1,11 @@
|
||||
// Module included in the following assemblies:
|
||||
//
|
||||
// * scalability_and_performance/understanding-node-observability-operator.adoc
|
||||
|
||||
:_content-type: CONCEPT
|
||||
[id="workflow-node-observability-operator_{context}"]
|
||||
= High level workflow of the Node Observability Operator
|
||||
|
||||
After you install the Node Observability Operator in the {product-title} cluster, you have to create a `NodeObservability` custom resource, which creates a DaemonSet to deploy a Node Observability agent on each worker node.
|
||||
|
||||
To request a profiling query, you have to create a `NodeObservabilityRun` resource that requests the deployed Node Observability agent to trigger the CRI-O and Kubelet profiling. After the profiling is completed, the Node Observability agent stores the profiling data inside the container file system `/run/node-observability` directory, which is available for query.
|
||||
119
modules/node-observability-install-cli.adoc
Normal file
119
modules/node-observability-install-cli.adoc
Normal file
@@ -0,0 +1,119 @@
|
||||
// Module included in the following assemblies:
|
||||
//
|
||||
// * scalability_and_performance/understanding-node-observability-operator.adoc
|
||||
|
||||
:_content-type: PROCEDURE
|
||||
[id="install-node-observability-using-cli_{context}"]
|
||||
= Installing the Node Observability Operator using the CLI
|
||||
|
||||
You can install the Node Observability Operator by using the OpenShift CLI (oc).
|
||||
|
||||
.Prerequisites
|
||||
|
||||
* You have installed the OpenShift CLI (oc).
|
||||
* You have access to the cluster with `cluster-admin` privileges.
|
||||
|
||||
.Procedure
|
||||
|
||||
. Confirm that the Node Observability Operator is available by running the following command:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc get packagemanifests -n openshift-marketplace node-observability-operator
|
||||
----
|
||||
|
||||
+
|
||||
.Example output
|
||||
[source,terminal]
|
||||
----
|
||||
NAME CATALOG AGE
|
||||
node-observability-operator Red Hat Operators 9h
|
||||
----
|
||||
|
||||
. Create the `node-observability-operator` namespace by running the following command::
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc new-project node-observability-operator
|
||||
----
|
||||
|
||||
. Create an `OperatorGroup` object YAML file:
|
||||
+
|
||||
[source,yaml]
|
||||
----
|
||||
cat <<EOF | oc apply -f -
|
||||
apiVersion: operators.coreos.com/v1
|
||||
kind: OperatorGroup
|
||||
metadata:
|
||||
name: node-observability-operator
|
||||
namespace: node-observability-operator
|
||||
spec:
|
||||
targetNamespaces:
|
||||
- node-observability-operator
|
||||
EOF
|
||||
----
|
||||
|
||||
. Create a `Subscription` object YAML file to subscribe a namespace to an Operator:
|
||||
+
|
||||
[source,yaml]
|
||||
----
|
||||
cat <<EOF | oc apply -f -
|
||||
apiVersion: operators.coreos.com/v1alpha1
|
||||
kind: Subscription
|
||||
metadata:
|
||||
name: node-observability-operator
|
||||
namespace: node-observability-operator
|
||||
spec:
|
||||
channel: alpha
|
||||
name: node-observability-operator
|
||||
source: redhat-operators
|
||||
sourceNamespace: openshift-marketplace
|
||||
EOF
|
||||
----
|
||||
|
||||
.Verification
|
||||
|
||||
. View the install plan name by running the following command:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc -n node-observability-operator get sub node-observability-operator -o yaml | yq '.status.installplan.name'
|
||||
----
|
||||
|
||||
+
|
||||
.Example output
|
||||
[source,terminal]
|
||||
----
|
||||
install-dt54w
|
||||
----
|
||||
|
||||
. Verify the install plan status by running the following command:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc -n node-observability-operator get ip <install_plan_name> -o yaml | yq '.status.phase'
|
||||
----
|
||||
+
|
||||
`<install_plan_name>` is the install plan name that you obtained from the output of the previous command.
|
||||
|
||||
+
|
||||
.Example output
|
||||
[source,terminal]
|
||||
----
|
||||
COMPLETE
|
||||
----
|
||||
|
||||
. Verify that the Node Observability Operator is up and running:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc get deploy -n node-observability-operator
|
||||
----
|
||||
|
||||
+
|
||||
.Example output
|
||||
[source,terminal]
|
||||
----
|
||||
NAME READY UP-TO-DATE AVAILABLE AGE
|
||||
node-observability-operator-controller-manager 1/1 1 1 40h
|
||||
----
|
||||
31
modules/node-observability-install-web-console.adoc
Normal file
31
modules/node-observability-install-web-console.adoc
Normal file
@@ -0,0 +1,31 @@
|
||||
// Module included in the following assemblies:
|
||||
//
|
||||
// * scalability_and_performance/understanding-node-observability-operator.adoc
|
||||
|
||||
:_content-type: PROCEDURE
|
||||
[id="install-node-observability-using-web-console_{context}"]
|
||||
= Installing the Node Observability Operator using the web console
|
||||
|
||||
You can install the Node Observability Operator from the {product-title} web console.
|
||||
|
||||
.Prerequisites
|
||||
|
||||
* You have access to the cluster with `cluster-admin` privileges.
|
||||
* You have access to the {product-title} web console.
|
||||
|
||||
.Procedure
|
||||
|
||||
. Log in to the {product-title} web console.
|
||||
. In the Administrator's navigation panel, expand *Operators* → *OperatorHub*.
|
||||
. In the *All items* field, enter *Node Observability Operator* and select the *Node Observability Operator* tile.
|
||||
. Click *Install*.
|
||||
. On the *Install Operator* page, configure the following settings:
|
||||
.. In the *Update channel* area, click *alpha*.
|
||||
.. In the *Installation mode* area, click *A specific namespace on the cluster*.
|
||||
.. From the *Installed Namespace* list, select *node-observability-operator* from the list.
|
||||
.. In the *Update approval* area, select *Automatic*.
|
||||
.. Click *Install*.
|
||||
|
||||
.Verification
|
||||
. In the Administrator's navigation panel, expand *Operators* → *Installed Operators*.
|
||||
. Verify that the Node Observability Operator is listed in the Operators list.
|
||||
8
modules/node-observability-installation.adoc
Normal file
8
modules/node-observability-installation.adoc
Normal file
@@ -0,0 +1,8 @@
|
||||
// Module included in the following assemblies:
|
||||
//
|
||||
// * scalability_and_performance/understanding-node-observability-operator.adoc
|
||||
|
||||
:_content-type: CONCEPT
|
||||
[id="install-node-observability-operator_{context}"]
|
||||
= Installing the Node Observability Operator
|
||||
The Node Observability Operator is not installed in {product-title} by default. You can install the Node Observability Operator by using the {product-title} CLI or the web console.
|
||||
83
modules/node-observability-run-profiling-query.adoc
Normal file
83
modules/node-observability-run-profiling-query.adoc
Normal file
@@ -0,0 +1,83 @@
|
||||
// Module included in the following assemblies:
|
||||
//
|
||||
// * scalability_and_performance/understanding-node-observability-operator.adoc
|
||||
|
||||
:_content-type: PROCEDURE
|
||||
[id="running-profiling-query_{context}"]
|
||||
= Running profiling query
|
||||
|
||||
Profiling query is a blocking operation that fetches CRI-O and Kubelet profiling data for a duration of 30 seconds. The Node Observability Operator stores the profiling data inside the container file system `/run/node-observability` directory. To request profiling data query, you have to create a `NodeObservabilityRun` resource.
|
||||
|
||||
[IMPORTANT]
|
||||
====
|
||||
You can request only one profiling query at any point of time.
|
||||
====
|
||||
|
||||
.Prerequisites
|
||||
* You have installed the Node Observability Operator.
|
||||
* You have created the `NodeObservability` custom resource (CR).
|
||||
* You have access to the cluster with `cluster-admin` privileges.
|
||||
|
||||
.Procedure
|
||||
|
||||
. Create a `NodeObservabilityRun` resource file named `nodeobservabilityrun.yaml` that contains the following text:
|
||||
+
|
||||
[source,yaml]
|
||||
----
|
||||
apiVersion: nodeobservability.olm.openshift.io/v1alpha1
|
||||
kind: NodeObservabilityRun
|
||||
metadata:
|
||||
name: nodeobservabilityrun
|
||||
spec:
|
||||
nodeObservabilityRef:
|
||||
name: cluster
|
||||
----
|
||||
|
||||
. Run the `NodeObservabilityRun` to trigger the profiling:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc apply -f nodeobservabilityrun.yaml
|
||||
----
|
||||
|
||||
. Review the status of the `NodeObservabilityRun` by running the following command:
|
||||
+
|
||||
[source,terminal]
|
||||
----
|
||||
$ oc get nodeobservabilityrun -o yaml | yq '.status.conditions'
|
||||
----
|
||||
|
||||
+
|
||||
.Example output
|
||||
[source,terminal]
|
||||
----
|
||||
conditions:
|
||||
- lastTransitionTime: "2022-07-07T14:57:34Z"
|
||||
message: Ready to start profiling
|
||||
reason: Ready
|
||||
status: "True"
|
||||
type: Ready
|
||||
- lastTransitionTime: "2022-07-07T14:58:10Z"
|
||||
message: Profiling query done
|
||||
reason: Finished
|
||||
status: "True"
|
||||
type: Finished
|
||||
----
|
||||
|
||||
+
|
||||
Profiling query is complete when the status is `True` and type is `Finished`.
|
||||
|
||||
. Run the following bash script to retrieve the profiling data from container's `/run/node-observability` path:
|
||||
+
|
||||
[source,bash]
|
||||
----
|
||||
for a in $(oc get nodeobservabilityrun nodeobservabilityrun -o yaml | yq .status.agents[].name); do
|
||||
echo "agent ${a}"
|
||||
mkdir -p "/tmp/${a}"
|
||||
for p in $(oc exec "${a}" -c node-observability-agent -- bash -c "ls /run/node-observability/*.pprof"); do
|
||||
f="$(basename ${p})"
|
||||
echo "copying ${f} to /tmp/${a}/${f}"
|
||||
oc exec "${a}" -c node-observability-agent -- cat "${p}" > "/tmp/${a}/${f}"
|
||||
done
|
||||
done
|
||||
----
|
||||
25
scalability_and_performance/node-observability-operator.adoc
Normal file
25
scalability_and_performance/node-observability-operator.adoc
Normal file
@@ -0,0 +1,25 @@
|
||||
:_content-type: ASSEMBLY
|
||||
[id="using-node-observability-operator"]
|
||||
= Understanding the Node Observability Operator
|
||||
include::_attributes/common-attributes.adoc[]
|
||||
:context: node-observability-operator
|
||||
|
||||
toc::[]
|
||||
|
||||
|
||||
:FeatureName: The Node Observability Operator
|
||||
include::snippets/technology-preview.adoc[leveloffset=+0]
|
||||
|
||||
The Node Observability Operator collects and stores the CRI-O and Kubelet profiling data of worker nodes. You can use the profiling data to analyze the CRI-O and Kublet performance trends and debug the performance related issues.
|
||||
|
||||
include::modules/node-observability-high-level-workflow.adoc[leveloffset=+1]
|
||||
|
||||
include::modules/node-observability-installation.adoc[leveloffset=+1]
|
||||
|
||||
include::modules/node-observability-install-cli.adoc[leveloffset=+2]
|
||||
|
||||
include::modules/node-observability-install-web-console.adoc[leveloffset=+2]
|
||||
|
||||
include::modules/node-observability-create-custom-resource.adoc[leveloffset=+1]
|
||||
|
||||
include::modules/node-observability-run-profiling-query.adoc[leveloffset=+1]
|
||||
Reference in New Issue
Block a user