From 472af27b2ed9cf96a30f3c19df220b421fc7d7a7 Mon Sep 17 00:00:00 2001 From: Sara Thomas Date: Fri, 8 Dec 2023 11:30:36 -0500 Subject: [PATCH] no-1.5 integration with main no-1.5 integration with main OSDOCS-7593: Netobserv RTT OSDOCS-8465: Updates to Network Traffic Overview OSDOCS-8253: Improved LokiStack integration OSDOCS-8253: API version updates Dashboard enhancements for lokiless use OCPBUGS-22397: clarify netobserv network policy OSDOCS-9419: Adding zones to Overview Re-adding removed RTT overview info OSDOCS-8701: Update resource considerations table Network Observability API documentation updates Update to JSON flows format Network Observability 1.5 release notes no-1.5 integration with main --- _topic_maps/_topic_map.yml | 2 + .../network-observability-RTT-overview.adoc | 28 + modules/network-observability-RTT.adoc | 48 ++ ...ork-observability-SRIOV-configuration.adoc | 9 +- ...work-observability-auth-multi-tenancy.adoc | 17 - ...vability-configuring-options-overview.adoc | 21 +- ...vability-configuring-options-topology.adoc | 8 +- ...observability-disabling-health-alerts.adoc | 3 +- .../network-observability-dns-overview.adoc | 18 +- .../network-observability-dns-tracking.adoc | 19 +- .../network-observability-enriched-flows.adoc | 4 +- ...lity-flowcollector-api-specifications.adoc | 691 ++++++++++++---- ...ervability-flowcollector-kafka-config.adoc | 7 +- ...work-observability-flowcollector-view.adoc | 36 +- .../network-observability-flows-format.adoc | 742 +++++++----------- ...ork-observability-includelist-example.adoc | 42 + modules/network-observability-metrics.adoc | 55 ++ .../network-observability-multitenancy.adoc | 5 +- ...etwork-observability-operator-install.adoc | 21 +- .../network-observability-packet-drops.adoc | 6 +- ...etwork-observability-pktdrop-overview.adoc | 18 +- ...etwork-observability-rate-limit-alert.adoc | 25 +- ...network-observability-resources-table.adoc | 6 +- .../network-observability-viewing-alerts.adoc | 13 +- 
...work-observability-viewing-dashboards.adoc | 29 + ...ervability-working-with-conversations.adoc | 16 +- ...work-observability-working-with-zones.adoc | 35 + ...network-observability-loki-empty-ring.adoc | 16 + .../installing-operators.adoc | 1 - .../metrics-alerts-dashboards.adoc | 17 + .../network-observability-network-policy.adoc | 2 +- ...-observability-operator-release-notes.adoc | 85 +- .../network-observability-overview.adoc | 23 +- .../observing-network-traffic.adoc | 18 +- ...troubleshooting-network-observability.adoc | 1 + 35 files changed, 1336 insertions(+), 751 deletions(-) create mode 100644 modules/network-observability-RTT-overview.adoc create mode 100644 modules/network-observability-RTT.adoc delete mode 100644 modules/network-observability-auth-multi-tenancy.adoc create mode 100644 modules/network-observability-includelist-example.adoc create mode 100644 modules/network-observability-metrics.adoc create mode 100644 modules/network-observability-viewing-dashboards.adoc create mode 100644 modules/network-observability-working-with-zones.adoc create mode 100644 modules/troubleshooting-network-observability-loki-empty-ring.adoc create mode 100644 network_observability/metrics-alerts-dashboards.adoc diff --git a/_topic_maps/_topic_map.yml b/_topic_maps/_topic_map.yml index 83ccf5af0b..c6efbae8ff 100644 --- a/_topic_maps/_topic_map.yml +++ b/_topic_maps/_topic_map.yml @@ -2858,6 +2858,8 @@ Topics: File: network-observability-network-policy - Name: Observing the network traffic File: observing-network-traffic +- Name: Using metrics with dashboards and alerts + File: metrics-alerts-dashboards - Name: Monitoring the Network Observability Operator File: network-observability-operator-monitoring - Name: API reference diff --git a/modules/network-observability-RTT-overview.adoc b/modules/network-observability-RTT-overview.adoc new file mode 100644 index 0000000000..6c42599c31 --- /dev/null +++ b/modules/network-observability-RTT-overview.adoc @@ -0,0 +1,28 
@@ +// Module included in the following assemblies: +// +// network_observability/observing-network-traffic.adoc + +:_mod-docs-content-type: CONCEPT +[id="network-observability-RTT-overview_{context}"] += Round-Trip Time +You can use TCP handshake Round-Trip Time (RTT) to analyze network flows. You can use RTT captured from the `fentry/tcp_rcv_established` eBPF hookpoint to read the smoothed round-trip time (SRTT) from the TCP socket to help with the following: + + +* Network monitoring: Gain insights into TCP handshakes, helping + network administrators identify unusual patterns, potential bottlenecks, or + performance issues. +* Troubleshooting: Debug TCP-related issues by tracking latency and identifying + misconfigurations. + +By default, when RTT is enabled, you can see the following TCP handshake RTT metrics represented in the *Overview*: + +* Top X 90th percentile TCP handshake Round Trip Time with overall +* Top X average TCP handshake Round Trip Time with overall +* Bottom X minimum TCP handshake Round Trip Time with overall + +Other RTT panels can be added in *Manage panels*: + +* Top X maximum TCP handshake Round Trip Time with overall +* Top X 99th percentile TCP handshake Round Trip Time with overall + +See the _Additional Resources_ in this section for more information about enabling and working with this view. \ No newline at end of file diff --git a/modules/network-observability-RTT.adoc b/modules/network-observability-RTT.adoc new file mode 100644 index 0000000000..f56205ba0e --- /dev/null +++ b/modules/network-observability-RTT.adoc @@ -0,0 +1,48 @@ +// Module included in the following assemblies: +// +// * network_observability/observing-network-traffic.adoc + +:_mod-docs-content-type: PROCEDURE +[id="network-observability-RTT_{context}"] += Working with RTT tracing +You can track RTT by editing the `FlowCollector` custom resource, as shown in the following YAML example. + +.Procedure +. In the web console, navigate to *Operators* -> *Installed Operators*. .
In the *Provided APIs* heading for the *NetObserv Operator*, select *Flow Collector*. +. Select *cluster*, and then select the *YAML* tab. +. Configure the `FlowCollector` custom resource for RTT tracing, for example: ++ +[id="network-observability-flowcollector-configuring-RTT_{context}"] +.Example `FlowCollector` configuration +[source, yaml] +---- +apiVersion: flows.netobserv.io/v1beta2 +kind: FlowCollector +metadata: + name: cluster +spec: + namespace: netobserv + deploymentModel: Direct + agent: + type: eBPF + ebpf: + features: + - FlowRTT <1> +---- +<1> You can start tracing RTT for network flows by adding the `FlowRTT` parameter to the `spec.agent.ebpf.features` list. + +.Verification +When you refresh the *Network Traffic* page, the *Overview*, *Traffic Flow*, and *Topology* views display new information about RTT: + +.. In the *Overview*, select new choices in *Manage panels* to choose which graphical visualizations of RTT to display. +.. In the *Traffic flows* table, the *Flow RTT* column is available, and you can manage its display in *Manage columns*. +.. In the *Traffic Flows* view, you can also expand the side panel to view more information about RTT. ++ +.Example filtering +... Click the *Common* filters -> *Protocol*. +... Filter the network flow data based on *TCP*, *Ingress* direction, and look for *FlowRTT* values greater than 10,000,000 nanoseconds (10 ms). +... Remove the *Protocol* filter. +... Filter for *Flow RTT* values greater than 0 in the *Common* filters. + +.. In the *Topology* view, click the *Display options* drop-down menu. Then click *RTT* in the *edge labels* drop-down list.
\ No newline at end of file diff --git a/modules/network-observability-SRIOV-configuration.adoc b/modules/network-observability-SRIOV-configuration.adoc index c4cdacaf27..e6845c3392 100644 --- a/modules/network-observability-SRIOV-configuration.adoc +++ b/modules/network-observability-SRIOV-configuration.adoc @@ -16,20 +16,19 @@ In order to collect traffic from a cluster with a Single Root I/O Virtualization . Under the *Provided APIs* heading for the *NetObserv Operator*, select *Flow Collector*. . Select *cluster* and then select the *YAML* tab. . Configure the `FlowCollector` custom resource. A sample configuration is as follows: -+ -[id="network-observability-flowcollector-configuring-SRIOV-monitoring{context}"] + .Configure `FlowCollector` for SR-IOV monitoring [source,yaml] ---- -apiVersion: flows.netobserv.io/v1alpha1 +apiVersion: flows.netobserv.io/v1beta2 kind: FlowCollector metadata: name: cluster spec: namespace: netobserv - deploymentModel: DIRECT + deploymentModel: Direct agent: - type: EBPF + type: eBPF ebpf: privileged: true <1> ---- diff --git a/modules/network-observability-auth-multi-tenancy.adoc b/modules/network-observability-auth-multi-tenancy.adoc deleted file mode 100644 index 886295259a..0000000000 --- a/modules/network-observability-auth-multi-tenancy.adoc +++ /dev/null @@ -1,17 +0,0 @@ -// Module included in the following assemblies: - -// * networking/network_observability/installing-operators.adoc - -:_mod-docs-content-type: PROCEDURE -[id="network-observability-auth-mutli-tenancy_{context}"] -= Configuring authorization and multi-tenancy -Define `ClusterRole` and `ClusterRoleBinding`. The `netobserv-reader` `ClusterRole` enables multi-tenancy and allows individual user access, or group access, to the flows stored in Loki. You can create a YAML file to define these roles. - -.Procedure - -. Using the web console, click the Import icon, *+*. -. 
Drop your YAML file into the editor and click *Create*: -+ -include::snippets/network-observability-clusterrole-reader.adoc[] -include::snippets/network-observability-clusterrole-writer.adoc[] -include::snippets/network-observability-clusterrolebinding.adoc[] \ No newline at end of file diff --git a/modules/network-observability-configuring-options-overview.adoc b/modules/network-observability-configuring-options-overview.adoc index 15e4671414..73c0c7fe1f 100644 --- a/modules/network-observability-configuring-options-overview.adoc +++ b/modules/network-observability-configuring-options-overview.adoc @@ -5,12 +5,23 @@ :_mod-docs-content-type: REFERENCE [id="network-observability-configuring-options-overview_{context}"] = Configuring advanced options for the Overview view -You can customize the graphical view by using advanced options. To access the advanced options, click *Show advanced options*.You can configure the details in the graph by using the *Display options* drop-down menu. The options available are: +You can customize the graphical view by using advanced options. To access the advanced options, click *Show advanced options*. You can configure the details in the graph by using the *Display options* drop-down menu. The options available are as follows: -* *Metric type*: The metrics to be shown in *Bytes* or *Packets*. The default value is *Bytes*. -* *Scope*: To select the detail of components between which the network traffic flows. You can set the scope to *Node*, *Namespace*, *Owner*, or *Resource*. *Owner* is an aggregation of resources. +* *Scope*: Select the components that network traffic flows between. You can set the scope to *Node*, *Namespace*, *Owner*, *Zones*, *Cluster*, or *Resource*. *Owner* is an aggregation of resources.
*Resource* can be a pod, service, node, in case of host-network traffic, or an unknown IP address. The default value is *Namespace*. * *Truncate labels*: Select the required width of the label from the drop-down list. The default value is *M*. [id="network-observability-cao-managing-panels-overview_{context}"] -== Managing panels -You can select the required statistics to be displayed, and reorder them. To manage columns, click *Manage panels*. \ No newline at end of file +== Managing panels and display +You can select the required panels to be displayed, reorder them, and focus on a specific panel. To add or remove panels, click *Manage panels*. + +The following panels are shown by default: + +* *Top X average bytes rates* +* *Top X bytes rates stacked with total* + +Other panels can be added in *Manage panels*: + +* *Top X average packets rates* +* *Top X packets rates stacked with total* + +*Query options* allows you to choose whether to show the *Top 5*, *Top 10*, or *Top 15* rates. \ No newline at end of file diff --git a/modules/network-observability-configuring-options-topology.adoc b/modules/network-observability-configuring-options-topology.adoc index 9f7822e812..b658c6936d 100644 --- a/modules/network-observability-configuring-options-topology.adoc +++ b/modules/network-observability-configuring-options-topology.adoc @@ -10,12 +10,14 @@ You can customize and export the view by using *Show advanced options*. The adva * *Find in view*: To search the required components in the view. * *Display options*: To configure the following options: + -** *Layout*: To select the layout of the graphical representation. The default value is *ColaNoForce*. +** *Edge labels*: To show the specified measurements as edge labels. The default is to show the *Average rate* in *Bytes*. ** *Scope*: To select the scope of components between which the network traffic flows. The default value is *Namespace*. 
-** *Groups*: To enchance the understanding of ownership by grouping the components. The default value is *None*. -** *Collapse groups*: To expand or collapse the groups. The groups are expanded by default. This option is disabled if *Groups* has value *None*. +** *Groups*: To enhance the understanding of ownership by grouping the components. The default value is *None*. + +** *Layout*: To select the layout of the graphical representation. The default value is *ColaNoForce*. ** *Show*: To select the details that need to be displayed. All the options are checked by default. The options available are: *Edges*, *Edges label*, and *Badges*. ** *Truncate labels*: To select the required width of the label from the drop-down list. The default value is *M*. +** *Collapse groups*: To expand or collapse the groups. The groups are expanded by default. This option is disabled if *Groups* has the value of *None*. [id="network-observability-cao-export-topology_{context}"] == Exporting the topology view diff --git a/modules/network-observability-disabling-health-alerts.adoc b/modules/network-observability-disabling-health-alerts.adoc index 256b3b5fd4..e062bcd046 100644 --- a/modules/network-observability-disabling-health-alerts.adoc +++ b/modules/network-observability-disabling-health-alerts.adoc @@ -11,9 +11,10 @@ You can opt out of health alerting by editing the `FlowCollector` resource: . Under the *Provided APIs* heading for the *NetObserv Operator*, select *Flow Collector*. . Select *cluster* then select the *YAML* tab. . 
Add `spec.processor.metrics.disableAlerts` to disable health alerts, as in the following YAML sample: ++ [source,yaml] ---- -apiVersion: flows.netobserv.io/v1alpha1 +apiVersion: flows.netobserv.io/v1beta2 kind: FlowCollector metadata: name: cluster diff --git a/modules/network-observability-dns-overview.adoc b/modules/network-observability-dns-overview.adoc index dac32c6303..8167eed525 100644 --- a/modules/network-observability-dns-overview.adoc +++ b/modules/network-observability-dns-overview.adoc @@ -13,10 +13,18 @@ You can configure graphical representation of Domain Name System (DNS) tracking * Troubleshooting: Debug DNS-related issues by tracing DNS resolution steps, tracking latency, and identifying misconfigurations. -When DNS tracking is enabled, you can see the following metrics represented in a chart in the *Overview*. See the _Additional Resources_ in this section for more information about enabling and working with this view. +By default, when DNS tracking is enabled, you can see the following non-empty metrics represented in a donut or line chart in the *Overview*: -* Top 5 average DNS latencies -* Top 5 DNS response code -* Top 5 DNS response code stacked with total +* Top X DNS Response Code +* Top X average DNS latencies with overall +* Top X 90th percentile DNS latencies -This feature is supported for IPv4 and IPv6 UDP protocol. \ No newline at end of file +Other DNS tracking panels can be added in *Manage panels*: + +* Bottom X minimum DNS latencies +* Top X maximum DNS latencies +* Top X 99th percentile DNS latencies + +This feature is supported for IPv4 and IPv6 UDP and TCP protocols. + +See the _Additional Resources_ in this section for more information about enabling and working with this view. 
\ No newline at end of file diff --git a/modules/network-observability-dns-tracking.adoc b/modules/network-observability-dns-tracking.adoc index 64f7ab8803..134cb6c09b 100644 --- a/modules/network-observability-dns-tracking.adoc +++ b/modules/network-observability-dns-tracking.adoc @@ -13,7 +13,7 @@ CPU and memory usage increases are observed in the eBPF agent when this feature ==== .Procedure . In the web console, navigate to *Operators* -> *Installed Operators*. -. Under the *Provided APIs* heading for the *NetObserv Operator*, select *Flow Collector*. +. Under the *Provided APIs* heading for *Network Observability*, select *Flow Collector*. . Select *cluster* then select the *YAML* tab. . Configure the `FlowCollector` custom resource. A sample configuration is as follows: + @@ -21,24 +21,29 @@ CPU and memory usage increases are observed in the eBPF agent when this feature .Configure `FlowCollector` for DNS tracking [source, yaml] ---- -apiVersion: flows.netobserv.io/v1alpha1 +apiVersion: flows.netobserv.io/v1beta2 kind: FlowCollector metadata: name: cluster spec: namespace: netobserv - deploymentModel: DIRECT + deploymentModel: Direct agent: - type: EBPF + type: eBPF ebpf: features: - DNSTracking <1> - privileged: true <2> + sampling: 1 <2> ---- <1> You can set the `spec.agent.ebpf.features` parameter list to enable DNS tracking of each network flow in the web console. -<2> Note that the `spec.agent.ebpf.privileged` specification value must be `true` for DNS tracking to be enabled. +<2> You can set `sampling` to a value of `1` for more accurate metrics. . When you refresh the *Network Traffic* page, there are new DNS representations you can choose to view in the *Overview* and *Traffic Flow* views and new filters you can apply. .. Select new DNS choices in *Manage panels* to display graphical visualizations and DNS metrics in the *Overview*. .. Select new choices in *Manage columns* to add DNS columns to the *Traffic Flows* view. -.. 
Filter on specific DNS metrics, such as *DNS Id*, *DNS Latency* and *DNS Response Code*, and see more information from the side panel. \ No newline at end of file +.. Filter on specific DNS metrics, such as *DNS Id*, *DNS Error*, *DNS Latency*, and *DNS Response Code*, and see more information from the side panel. The *DNS Latency* and *DNS Response Code* columns are shown by default. + +[NOTE] +==== +TCP handshake packets do not have DNS headers. TCP protocol flows without DNS headers are shown in the traffic flow data with *DNS Latency*, *ID*, and *Response code* values of "n/a". You can filter the flow data to view only flows that have DNS headers by using the *Common* filter "DNSError" equal to "0". +==== \ No newline at end of file diff --git a/modules/network-observability-enriched-flows.adoc b/modules/network-observability-enriched-flows.adoc index 95bbed7bde..2bf90b76ca 100644 --- a/modules/network-observability-enriched-flows.adoc +++ b/modules/network-observability-enriched-flows.adoc @@ -20,13 +20,13 @@ You can send network flows to Kafka, IPFIX, or both at the same time. Any proces + [source,yaml] ---- -apiVersion: flows.netobserv.io/v1alpha1 +apiVersion: flows.netobserv.io/v1beta2 kind: FlowCollector metadata: name: cluster spec: exporters: - - type: KAFKA <3> + - type: Kafka <3> kafka: address: "kafka-cluster-kafka-bootstrap.netobserv" topic: netobserv-flows-export <1> diff --git a/modules/network-observability-flowcollector-api-specifications.adoc b/modules/network-observability-flowcollector-api-specifications.adoc index 69659b2728..9147244e6f 100644 --- a/modules/network-observability-flowcollector-api-specifications.adoc +++ b/modules/network-observability-flowcollector-api-specifications.adoc @@ -83,8 +83,8 @@ Type:: | `deploymentModel` | `string` | `deploymentModel` defines the desired type of deployment for flow processing. Possible values are: + - - `DIRECT` (default) to make the flow processor listening directly from the agents.
+ - - `KAFKA` to make flows sent to a Kafka pipeline before consumption by the processor. + + - `Direct` (default) to make the flow processor listen directly from the agents. + + - `Kafka` to send flows to a Kafka pipeline before consumption by the processor. + Kafka can provide better scalability, resiliency, and high availability (for more details, see https://www.redhat.com/en/topics/integration/what-is-apache-kafka). | `exporters` @@ -93,11 +93,11 @@ Type:: | `kafka` | `object` -| Kafka configuration, allowing to use Kafka as a broker as part of the flow collection pipeline. Available when the `spec.deploymentModel` is `KAFKA`. +| Kafka configuration, allowing the use of Kafka as a broker as part of the flow collection pipeline. Available when the `spec.deploymentModel` is `Kafka`. | `loki` | `object` -| Loki, the flow store, client settings. +| `loki`, the flow store, client settings. | `namespace` | `string` @@ -127,7 +127,7 @@ Type:: | `ebpf` | `object` -| `ebpf` describes the settings related to the eBPF-based flow reporter when `spec.agent.type` is set to `EBPF`. +| `ebpf` describes the settings related to the eBPF-based flow reporter when `spec.agent.type` is set to `eBPF`. | `ipfix` | `object` @@ -136,16 +136,16 @@ Type:: | `type` | `string` | `type` selects the flows tracing agent. Possible values are: + - - `EBPF` (default) to use Network Observability eBPF agent. + + - `eBPF` (default) to use Network Observability eBPF agent. + - `IPFIX` [deprecated (*)] - to use the legacy IPFIX collector. + - `EBPF` is recommended as it offers better performances and should work regardless of the CNI installed on the cluster. `IPFIX` works with OVN-Kubernetes CNI (other CNIs could work if they support exporting IPFIX, but they would require manual configuration). + `eBPF` is recommended as it offers better performance and should work regardless of the CNI installed on the cluster.
`IPFIX` works with OVN-Kubernetes CNI (other CNIs could work if they support exporting IPFIX, but they would require manual configuration). |=== == .spec.agent.ebpf Description:: + -- -`ebpf` describes the settings related to the eBPF-based flow reporter when `spec.agent.type` is set to `EBPF`. +`ebpf` describes the settings related to the eBPF-based flow reporter when `spec.agent.type` is set to `eBPF`. -- Type:: @@ -158,6 +158,10 @@ Type:: |=== | Property | Type | Description +| `advanced` +| `object` +| `advanced` allows setting some aspects of the internal configuration of the eBPF agent. This section is aimed mostly for debugging and fine-grained performance optimizations, such as `GOGC` and `GOMAXPROCS` env vars. Set these values at your own risk. + | `cacheActiveTimeout` | `string` | `cacheActiveTimeout` is the max period during which the reporter aggregates flows before sending. Increasing `cacheMaxFlows` and `cacheActiveTimeout` can decrease the network traffic overhead and the CPU load, however you can expect higher memory consumption and an increased latency in the flow collection. @@ -166,20 +170,16 @@ Type:: | `integer` | `cacheMaxFlows` is the max number of flows in an aggregate; when reached, the reporter sends the flows. Increasing `cacheMaxFlows` and `cacheActiveTimeout` can decrease the network traffic overhead and the CPU load, however you can expect higher memory consumption and an increased latency in the flow collection. -| `debug` -| `object` -| `debug` allows setting some aspects of the internal configuration of the eBPF agent. This section is aimed exclusively for debugging and fine-grained performance optimizations, such as GOGC and GOMAXPROCS env vars. Users setting its values do it at their own risk. - | `excludeInterfaces` | `array (string)` -| `excludeInterfaces` contains the interface names that are excluded from flow tracing. An entry is enclosed by slashes, such as `/br-/` and is matched as a regular expression. 
Otherwise it is matched as a case-sensitive string. +| `excludeInterfaces` contains the interface names that are excluded from flow tracing. An entry enclosed by slashes, such as `/br-/`, is matched as a regular expression. Otherwise it is matched as a case-sensitive string. | `features` | `array (string)` | List of additional features to enable. They are all disabled by default. Enabling additional features might have performance impacts. Possible values are: + - - `PacketDrop`: enable the packets drop flows logging feature. This feature requires mounting the kernel debug filesystem, so the eBPF pod has to run as privileged. If the `spec.agent.eBPF.privileged` parameter is not set, an error is reported. + - - `DNSTracking`: enable the DNS tracking feature. This feature requires mounting the kernel debug filesystem hence the eBPF pod has to run as privileged. If the `spec.agent.eBPF.privileged` parameter is not set, an error is reported. + - - `FlowRTT` [unsupported (*)]: enable flow latency (RTT) calculations in the eBPF agent during TCP handshakes. This feature better works with `sampling` set to 1. + + - `PacketDrop`: enable the packets drop flows logging feature. This feature requires mounting the kernel debug filesystem, so the eBPF pod has to run as privileged. If the `spec.agent.ebpf.privileged` parameter is not set, an error is reported. + + - `DNSTracking`: enable the DNS tracking feature. + + - `FlowRTT`: enable flow latency (RTT) calculations in the eBPF agent during TCP handshakes. This feature works better with `sampling` set to 1. + | `imagePullPolicy` | `string` @@ -188,7 +188,7 @@ Type:: | `interfaces` | `array (string)` -| `interfaces` contains the interface names from where flows are collected. If empty, the agent fetches all the interfaces in the system, excepting the ones listed in ExcludeInterfaces. An entry is enclosed by slashes, such as `/br-/`, is matched as a regular expression. Otherwise it is matched as a case-sensitive string.
+| `interfaces` contains the interface names from which flows are collected. If empty, the agent fetches all the interfaces in the system, except the ones listed in ExcludeInterfaces. An entry enclosed by slashes, such as `/br-/`, is matched as a regular expression. Otherwise it is matched as a case-sensitive string. | `kafkaBatchSize` | `integer` @@ -200,7 +200,7 @@ Type:: | `privileged` | `boolean` -| Privileged mode for the eBPF Agent container. In general this setting can be ignored or set to false: in that case, the operator sets granular capabilities (BPF, PERFMON, NET_ADMIN, SYS_RESOURCE) to the container, to enable its correct operation. If for some reason these capabilities cannot be set, such as if an old kernel version not knowing CAP_BPF is in use, then you can turn on this mode for more global privileges. +| Privileged mode for the eBPF Agent container. When ignored or set to `false`, the operator sets granular capabilities (BPF, PERFMON, NET_ADMIN, SYS_RESOURCE) to the container. If for some reason these capabilities cannot be set, such as if an old kernel version that does not know CAP_BPF is in use, then you can turn on this mode for more global privileges. Some agent features require the privileged mode, such as packet drops tracking (see `features`) and SR-IOV support. | `resources` | `object` @@ -211,11 +211,11 @@ Type:: | Sampling rate of the flow reporter. 100 means one flow on 100 is sent. 0 or 1 means all flows are sampled. |=== -== .spec.agent.ebpf.debug +== .spec.agent.ebpf.advanced Description:: + -- -`debug` allows setting some aspects of the internal configuration of the eBPF agent. This section is aimed exclusively for debugging and fine-grained performance optimizations, such as GOGC and GOMAXPROCS env vars. Users setting its values do it at their own risk. +`advanced` allows setting some aspects of the internal configuration of the eBPF agent.
This section is aimed mostly for debugging and fine-grained performance optimizations, such as `GOGC` and `GOMAXPROCS` env vars. Set these values at your own risk. -- Type:: @@ -230,7 +230,7 @@ Type:: | `env` | `object (string)` -| `env` allows passing custom environment variables to underlying components. Useful for passing some very concrete performance-tuning options, such as GOGC and GOMAXPROCS, that should not be publicly exposed as part of the FlowCollector descriptor, as they are only useful in edge debug or support scenarios. +| `env` allows passing custom environment variables to underlying components. Useful for passing some very concrete performance-tuning options, such as `GOGC` and `GOMAXPROCS`, that should not be publicly exposed as part of the FlowCollector descriptor, as they are only useful in edge debug or support scenarios. |=== == .spec.agent.ebpf.resources @@ -290,7 +290,7 @@ Type:: | `forceSampleAll` | `boolean` -| `forceSampleAll` allows disabling sampling in the IPFIX-based flow reporter. It is not recommended to sample all the traffic with IPFIX, as it might generate cluster instability. If you REALLY want to do that, set this flag to true. Use at your own risk. When it is set to true, the value of `sampling` is ignored. +| `forceSampleAll` allows disabling sampling in the IPFIX-based flow reporter. It is not recommended to sample all the traffic with IPFIX, as it might generate cluster instability. If you REALLY want to do that, set this flag to `true`. Use at your own risk. When it is set to `true`, the value of `sampling` is ignored. | `ovnKubernetes` | `object` @@ -370,13 +370,17 @@ Type:: |=== | Property | Type | Description +| `advanced` +| `object` +| `advanced` allows setting some aspects of the internal configuration of the console plugin. This section is aimed mostly for debugging and fine-grained performance optimizations, such as `GOGC` and `GOMAXPROCS` env vars. Set these values at your own risk. 
+ | `autoscaler` | `object` | `autoscaler` spec of a horizontal pod autoscaler to set up for the plugin Deployment. Refer to HorizontalPodAutoscaler documentation (autoscaling/v2). | `enable` | `boolean` -| enable the console plugin deployment. spec.Loki.enable must also be true +| Enables the console plugin deployment. `spec.loki.enable` must also be `true` | `imagePullPolicy` | `string` @@ -386,10 +390,6 @@ Type:: | `string` | `logLevel` for the console plugin backend -| `port` -| `integer` -| `port` is the plugin service port. Do not use 9002, which is reserved for metrics. - | `portNaming` | `object` | `portNaming` defines the configuration of the port-to-service name translation @@ -398,10 +398,6 @@ Type:: | `array` | `quickFilters` configures quick filter presets for the Console plugin -| `register` -| `boolean` -| `register` allows, when set to true, to automatically register the provided console plugin with the {product-title} Console operator. When set to false, you can still register it manually by editing console.operator.openshift.io/cluster with the following command: `oc patch console.operator.openshift.io cluster --type='json' -p '[{"op": "add", "path": "/spec/plugins/-", "value": "netobserv-plugin"}]'` - | `replicas` | `integer` | `replicas` defines the number of replicas (pods) to start. @@ -410,6 +406,40 @@ Type:: | `object` | `resources`, in terms of compute resources, required by this container. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ +|=== +== .spec.consolePlugin.advanced +Description:: ++ +-- +`advanced` allows setting some aspects of the internal configuration of the console plugin. This section is aimed mostly for debugging and fine-grained performance optimizations, such as `GOGC` and `GOMAXPROCS` env vars. Set these values at your own risk. 
+-- + +Type:: + `object` + + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `args` +| `array (string)` +| `args` allows passing custom arguments to underlying components. Useful for overriding some parameters, such as an url or a configuration path, that should not be publicly exposed as part of the FlowCollector descriptor, as they are only useful in edge debug or support scenarios. + +| `env` +| `object (string)` +| `env` allows passing custom environment variables to underlying components. Useful for passing some very concrete performance-tuning options, such as `GOGC` and `GOMAXPROCS`, that should not be publicly exposed as part of the FlowCollector descriptor, as they are only useful in edge debug or support scenarios. + +| `port` +| `integer` +| `port` is the plugin service port. Do not use 9002, which is reserved for metrics. + +| `register` +| `boolean` +| `register` allows, when set to `true`, to automatically register the provided console plugin with the {product-title} Console operator. When set to `false`, you can still register it manually by editing console.operator.openshift.io/cluster with the following command: `oc patch console.operator.openshift.io cluster --type='json' -p '[{"op": "add", "path": "/spec/plugins/-", "value": "netobserv-plugin"}]'` + |=== == .spec.consolePlugin.autoscaler Description:: @@ -564,7 +594,7 @@ Required:: | `type` | `string` -| `type` selects the type of exporters. The available options are `KAFKA` and `IPFIX`. +| `type` selects the type of exporters. The available options are `Kafka` and `IPFIX`. 
|=== == .spec.exporters[].ipfix @@ -664,7 +694,7 @@ Type:: | `type` | `string` -| Type of SASL authentication to use, or `DISABLED` if SASL is not used +| Type of SASL authentication to use, or `Disabled` if SASL is not used |=== == .spec.exporters[].kafka.sasl.clientIDReference @@ -762,7 +792,7 @@ Type:: | `insecureSkipVerify` | `boolean` -| `insecureSkipVerify` allows skipping client-side verification of the server certificate. If set to true, the `caCert` field is ignored. +| `insecureSkipVerify` allows skipping client-side verification of the server certificate. If set to `true`, the `caCert` field is ignored. | `userCert` | `object` @@ -849,7 +879,7 @@ Type:: Description:: + -- -Kafka configuration, allowing to use Kafka as a broker as part of the flow collection pipeline. Available when the `spec.deploymentModel` is `KAFKA`. +Kafka configuration, allowing to use Kafka as a broker as part of the flow collection pipeline. Available when the `spec.deploymentModel` is `Kafka`. -- Type:: @@ -879,7 +909,7 @@ Required:: | `topic` | `string` -| Kafka topic to use. It must exist, Network Observability does not create it. +| Kafka topic to use. It must exist. Network Observability does not create it. |=== == .spec.kafka.sasl @@ -909,7 +939,7 @@ Type:: | `type` | `string` -| Type of SASL authentication to use, or `DISABLED` if SASL is not used +| Type of SASL authentication to use, or `Disabled` if SASL is not used |=== == .spec.kafka.sasl.clientIDReference @@ -1007,7 +1037,7 @@ Type:: | `insecureSkipVerify` | `boolean` -| `insecureSkipVerify` allows skipping client-side verification of the server certificate. If set to true, the `caCert` field is ignored. +| `insecureSkipVerify` allows skipping client-side verification of the server certificate. If set to `true`, the `caCert` field is ignored. | `userCert` | `object` @@ -1094,7 +1124,134 @@ Type:: Description:: + -- -Loki, the flow store, client settings. +`loki`, the flow store, client settings. 
+-- + +Type:: + `object` + + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `advanced` +| `object` +| `advanced` allows setting some aspects of the internal configuration of the Loki clients. This section is aimed mostly for debugging and fine-grained performance optimizations. + +| `enable` +| `boolean` +| Set `enable` to `true` to store flows in Loki. It is required for the {product-title} Console plugin installation. + +| `lokiStack` +| `object` +| Loki configuration for `LokiStack` mode. This is useful for an easy loki-operator configuration. It is ignored for other modes. + +| `manual` +| `object` +| Loki configuration for `Manual` mode. This is the most flexible configuration. It is ignored for other modes. + +| `microservices` +| `object` +| Loki configuration for `Microservices` mode. Use this option when Loki is installed using the microservices deployment mode (https://grafana.com/docs/loki/latest/fundamentals/architecture/deployment-modes/#microservices-mode). It is ignored for other modes. + +| `mode` +| `string` +| `mode` must be set according to the installation mode of Loki: + + - Use `LokiStack` when Loki is managed using the Loki Operator + + - Use `Monolithic` when Loki is installed as a monolithic workload + + - Use `Microservices` when Loki is installed as microservices, but without Loki Operator + + - Use `Manual` if none of the options above match your setup + + + +| `monolithic` +| `object` +| Loki configuration for `Monolithic` mode. Use this option when Loki is installed using the monolithic deployment mode (https://grafana.com/docs/loki/latest/fundamentals/architecture/deployment-modes/#monolithic-mode). It is ignored for other modes. + +| `readTimeout` +| `string` +| `readTimeout` is the maximum console plugin loki query total time limit. A timeout of zero means no timeout. 
+
+| `writeBatchSize`
+| `integer`
+| `writeBatchSize` is the maximum batch size (in bytes) of Loki logs to accumulate before sending.
+
+| `writeBatchWait`
+| `string`
+| `writeBatchWait` is the maximum time to wait before sending a Loki batch.
+
+| `writeTimeout`
+| `string`
+| `writeTimeout` is the maximum Loki connection / request time limit. A timeout of zero means no timeout.
+
+|===
+== .spec.loki.advanced
+Description::
++
+--
+`advanced` allows setting some aspects of the internal configuration of the Loki clients. This section is aimed mostly for debugging and fine-grained performance optimizations.
+--
+
+Type::
+ `object`
+
+
+
+
+[cols="1,1,1",options="header"]
+|===
+| Property | Type | Description
+
+| `staticLabels`
+| `object (string)`
+| `staticLabels` is a map of common labels to set on each flow in Loki storage.
+
+| `writeMaxBackoff`
+| `string`
+| `writeMaxBackoff` is the maximum backoff time for Loki client connection between retries.
+
+| `writeMaxRetries`
+| `integer`
+| `writeMaxRetries` is the maximum number of retries for Loki client connections.
+
+| `writeMinBackoff`
+| `string`
+| `writeMinBackoff` is the initial backoff time for Loki client connection between retries.
+
+|===
+== .spec.loki.lokiStack
+Description::
++
+--
+Loki configuration for `LokiStack` mode. This is useful for an easy loki-operator configuration. It is ignored for other modes.
+--
+
+Type::
+ `object`
+
+
+
+
+[cols="1,1,1",options="header"]
+|===
+| Property | Type | Description
+
+| `name`
+| `string`
+| Name of an existing LokiStack resource to use.
+
+| `namespace`
+| `string`
+| Namespace where this `LokiStack` resource is located. If omitted, it is assumed to be the same as `spec.namespace`.
+
+|===
+== .spec.loki.manual
+Description::
++
+--
+Loki configuration for `Manual` mode. This is the most flexible configuration. It is ignored for other modes.
-- Type:: @@ -1110,42 +1267,18 @@ Type:: | `authToken` | `string` | `authToken` describes the way to get a token to authenticate to Loki. + - - `DISABLED` does not send any token with the request. + - - `FORWARD` forwards the user token for authorization. + - - `HOST` [deprecated (*)] - uses the local pod service account to authenticate to Loki. + - When using the Loki Operator, this must be set to `FORWARD`. + - `Disabled` does not send any token with the request. + + - `Forward` forwards the user token for authorization. + + - `Host` [deprecated (*)] - uses the local pod service account to authenticate to Loki. + + When using the Loki Operator, this must be set to `Forward`. -| `batchSize` -| `integer` -| `batchSize` is the maximum batch size (in bytes) of logs to accumulate before sending. - -| `batchWait` +| `ingesterUrl` | `string` -| `batchWait` is the maximum time to wait before sending a batch. - -| `enable` -| `boolean` -| Set to `enable` to store flows to Loki. It is required for the {product-title} Console plugin installation. - -| `maxBackoff` -| `string` -| `maxBackoff` is the maximum backoff time for client connection between retries. - -| `maxRetries` -| `integer` -| `maxRetries` is the maximum number of retries for client connections. - -| `minBackoff` -| `string` -| `minBackoff` is the initial backoff time for client connection between retries. +| `ingesterUrl` is the address of an existing Loki ingester service to push the flows to. When using the Loki Operator, set it to the Loki gateway service with the `network` tenant set in path, for example https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network. | `querierUrl` | `string` -| `querierURL` specifies the address of the Loki querier service, in case it is different from the Loki ingester URL. If empty, the URL value is used (assuming that the Loki ingester and querier are in the same server). When using the Loki Operator, do not set it, since ingestion and queries use the Loki gateway. 
- -| `staticLabels` -| `object (string)` -| `staticLabels` is a map of common labels to set on each flow. +| `querierUrl` specifies the address of the Loki querier service. When using the Loki Operator, set it to the Loki gateway service with the `network` tenant set in path, for example https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network. | `statusTls` | `object` @@ -1153,26 +1286,18 @@ Type:: | `statusUrl` | `string` -| `statusURL` specifies the address of the Loki `/ready`, `/metrics` and `/config` endpoints, in case it is different from the Loki querier URL. If empty, the `querierURL` value is used. This is useful to show error messages and some context in the frontend. When using the Loki Operator, set it to the Loki HTTP query frontend service, for example https://loki-query-frontend-http.netobserv.svc:3100/. `statusTLS` configuration is used when `statusUrl` is set. +| `statusUrl` specifies the address of the Loki `/ready`, `/metrics` and `/config` endpoints, in case it is different from the Loki querier URL. If empty, the `querierUrl` value is used. This is useful to show error messages and some context in the frontend. When using the Loki Operator, set it to the Loki HTTP query frontend service, for example https://loki-query-frontend-http.netobserv.svc:3100/. `statusTLS` configuration is used when `statusUrl` is set. | `tenantID` | `string` | `tenantID` is the Loki `X-Scope-OrgID` that identifies the tenant for each request. When using the Loki Operator, set it to `network`, which corresponds to a special tenant mode. -| `timeout` -| `string` -| `timeout` is the maximum time connection / request limit. A timeout of zero means no timeout. - | `tls` | `object` | TLS client configuration for Loki URL. -| `url` -| `string` -| `url` is the address of an existing Loki service to push the flows to. 
When using the Loki Operator, set it to the Loki gateway service with the `network` tenant set in path, for example https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network. - |=== -== .spec.loki.statusTls +== .spec.loki.manual.statusTls Description:: + -- @@ -1199,14 +1324,14 @@ Type:: | `insecureSkipVerify` | `boolean` -| `insecureSkipVerify` allows skipping client-side verification of the server certificate. If set to true, the `caCert` field is ignored. +| `insecureSkipVerify` allows skipping client-side verification of the server certificate. If set to `true`, the `caCert` field is ignored. | `userCert` | `object` | `userCert` defines the user certificate reference and is used for mTLS (you can ignore it when using one-way TLS) |=== -== .spec.loki.statusTls.caCert +== .spec.loki.manual.statusTls.caCert Description:: + -- @@ -1244,7 +1369,7 @@ Type:: | Type for the certificate reference: `configmap` or `secret` |=== -== .spec.loki.statusTls.userCert +== .spec.loki.manual.statusTls.userCert Description:: + -- @@ -1282,7 +1407,7 @@ Type:: | Type for the certificate reference: `configmap` or `secret` |=== -== .spec.loki.tls +== .spec.loki.manual.tls Description:: + -- @@ -1309,14 +1434,14 @@ Type:: | `insecureSkipVerify` | `boolean` -| `insecureSkipVerify` allows skipping client-side verification of the server certificate. If set to true, the `caCert` field is ignored. +| `insecureSkipVerify` allows skipping client-side verification of the server certificate. If set to `true`, the `caCert` field is ignored. 
| `userCert` | `object` | `userCert` defines the user certificate reference and is used for mTLS (you can ignore it when using one-way TLS) |=== -== .spec.loki.tls.caCert +== .spec.loki.manual.tls.caCert Description:: + -- @@ -1354,7 +1479,291 @@ Type:: | Type for the certificate reference: `configmap` or `secret` |=== -== .spec.loki.tls.userCert +== .spec.loki.manual.tls.userCert +Description:: ++ +-- +`userCert` defines the user certificate reference and is used for mTLS (you can ignore it when using one-way TLS) +-- + +Type:: + `object` + + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `certFile` +| `string` +| `certFile` defines the path to the certificate file name within the config map or secret + +| `certKey` +| `string` +| `certKey` defines the path to the certificate private key file name within the config map or secret. Omit when the key is not necessary. + +| `name` +| `string` +| Name of the config map or secret containing certificates + +| `namespace` +| `string` +| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. + +| `type` +| `string` +| Type for the certificate reference: `configmap` or `secret` + +|=== +== .spec.loki.microservices +Description:: ++ +-- +Loki configuration for `Microservices` mode. Use this option when Loki is installed using the microservices deployment mode (https://grafana.com/docs/loki/latest/fundamentals/architecture/deployment-modes/#microservices-mode). It is ignored for other modes. +-- + +Type:: + `object` + + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `ingesterUrl` +| `string` +| `ingesterUrl` is the address of an existing Loki ingester service to push the flows to. 
+ +| `querierUrl` +| `string` +| `querierURL` specifies the address of the Loki querier service. + +| `tenantID` +| `string` +| `tenantID` is the Loki `X-Scope-OrgID` header that identifies the tenant for each request. + +| `tls` +| `object` +| TLS client configuration for Loki URL. + +|=== +== .spec.loki.microservices.tls +Description:: ++ +-- +TLS client configuration for Loki URL. +-- + +Type:: + `object` + + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `caCert` +| `object` +| `caCert` defines the reference of the certificate for the Certificate Authority + +| `enable` +| `boolean` +| Enable TLS + +| `insecureSkipVerify` +| `boolean` +| `insecureSkipVerify` allows skipping client-side verification of the server certificate. If set to `true`, the `caCert` field is ignored. + +| `userCert` +| `object` +| `userCert` defines the user certificate reference and is used for mTLS (you can ignore it when using one-way TLS) + +|=== +== .spec.loki.microservices.tls.caCert +Description:: ++ +-- +`caCert` defines the reference of the certificate for the Certificate Authority +-- + +Type:: + `object` + + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `certFile` +| `string` +| `certFile` defines the path to the certificate file name within the config map or secret + +| `certKey` +| `string` +| `certKey` defines the path to the certificate private key file name within the config map or secret. Omit when the key is not necessary. + +| `name` +| `string` +| Name of the config map or secret containing certificates + +| `namespace` +| `string` +| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. 
+ +| `type` +| `string` +| Type for the certificate reference: `configmap` or `secret` + +|=== +== .spec.loki.microservices.tls.userCert +Description:: ++ +-- +`userCert` defines the user certificate reference and is used for mTLS (you can ignore it when using one-way TLS) +-- + +Type:: + `object` + + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `certFile` +| `string` +| `certFile` defines the path to the certificate file name within the config map or secret + +| `certKey` +| `string` +| `certKey` defines the path to the certificate private key file name within the config map or secret. Omit when the key is not necessary. + +| `name` +| `string` +| Name of the config map or secret containing certificates + +| `namespace` +| `string` +| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. + +| `type` +| `string` +| Type for the certificate reference: `configmap` or `secret` + +|=== +== .spec.loki.monolithic +Description:: ++ +-- +Loki configuration for `Monolithic` mode. Use this option when Loki is installed using the monolithic deployment mode (https://grafana.com/docs/loki/latest/fundamentals/architecture/deployment-modes/#monolithic-mode). It is ignored for other modes. +-- + +Type:: + `object` + + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `tenantID` +| `string` +| `tenantID` is the Loki `X-Scope-OrgID` header that identifies the tenant for each request. + +| `tls` +| `object` +| TLS client configuration for Loki URL. + +| `url` +| `string` +| `url` is the unique address of an existing Loki service that points to both the ingester and the querier. + +|=== +== .spec.loki.monolithic.tls +Description:: ++ +-- +TLS client configuration for Loki URL. 
+-- + +Type:: + `object` + + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `caCert` +| `object` +| `caCert` defines the reference of the certificate for the Certificate Authority + +| `enable` +| `boolean` +| Enable TLS + +| `insecureSkipVerify` +| `boolean` +| `insecureSkipVerify` allows skipping client-side verification of the server certificate. If set to `true`, the `caCert` field is ignored. + +| `userCert` +| `object` +| `userCert` defines the user certificate reference and is used for mTLS (you can ignore it when using one-way TLS) + +|=== +== .spec.loki.monolithic.tls.caCert +Description:: ++ +-- +`caCert` defines the reference of the certificate for the Certificate Authority +-- + +Type:: + `object` + + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `certFile` +| `string` +| `certFile` defines the path to the certificate file name within the config map or secret + +| `certKey` +| `string` +| `certKey` defines the path to the certificate private key file name within the config map or secret. Omit when the key is not necessary. + +| `name` +| `string` +| Name of the config map or secret containing certificates + +| `namespace` +| `string` +| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. + +| `type` +| `string` +| Type for the certificate reference: `configmap` or `secret` + +|=== +== .spec.loki.monolithic.tls.userCert Description:: + -- @@ -1409,38 +1818,18 @@ Type:: |=== | Property | Type | Description +| `addZone` +| `boolean` +| `addZone` allows availability zone awareness by labelling flows with their source and destination zones. This feature requires the "topology.kubernetes.io/zone" label to be set on nodes. 
+ +| `advanced` +| `object` +| `advanced` allows setting some aspects of the internal configuration of the flow processor. This section is aimed mostly for debugging and fine-grained performance optimizations, such as `GOGC` and `GOMAXPROCS` env vars. Set these values at your own risk. + | `clusterName` | `string` | `clusterName` is the name of the cluster to appear in the flows data. This is useful in a multi-cluster context. When using {product-title}, leave empty to make it automatically determined. -| `conversationEndTimeout` -| `string` -| `conversationEndTimeout` is the time to wait after a network flow is received, to consider the conversation ended. This delay is ignored when a FIN packet is collected for TCP flows (see `conversationTerminatingTimeout` instead). - -| `conversationHeartbeatInterval` -| `string` -| `conversationHeartbeatInterval` is the time to wait between "tick" events of a conversation - -| `conversationTerminatingTimeout` -| `string` -| `conversationTerminatingTimeout` is the time to wait from detected FIN flag to end a conversation. Only relevant for TCP flows. - -| `debug` -| `object` -| `debug` allows setting some aspects of the internal configuration of the flow processor. This section is aimed exclusively for debugging and fine-grained performance optimizations, such as GOGC and GOMAXPROCS env vars. Users setting its values do it at their own risk. - -| `dropUnusedFields` -| `boolean` -| `dropUnusedFields` allows, when set to true, to drop fields that are known to be unused by OVS, to save storage space. 
-
-| `enableKubeProbes`
-| `boolean`
-| `enableKubeProbes` is a flag to enable or disable Kubernetes liveness and readiness probes
-
-| `healthPort`
-| `integer`
-| `healthPort` is a collector HTTP port in the Pod that exposes the health check API
-
 | `imagePullPolicy`
 | `string`
 | `imagePullPolicy` is the Kubernetes pull policy for the image defined above
@@ -1468,34 +1857,30 @@ Type::
 | `logTypes`
 | `string`
 | `logTypes` defines the desired record types to generate. Possible values are: +
- - `FLOWS` (default) to export regular network flows +
- - `CONVERSATIONS` to generate events for started conversations, ended conversations as well as periodic "tick" updates +
- - `ENDED_CONVERSATIONS` to generate only ended conversations events +
- - `ALL` to generate both network flows and all conversations events +
+ - `Flows` (default) to export regular network flows +
+ - `Conversations` to generate events for started conversations, ended conversations as well as periodic "tick" updates +
+ - `EndedConversations` to generate only ended conversations events +
+ - `All` to generate both network flows and all conversations events +

 | `metrics`
 | `object`
 | `Metrics` define the processor configuration regarding metrics

-| `port`
-| `integer`
-| Port of the flow collector (host port). By convention, some values are forbidden. It must be greater than 1024 and different from 4500, 4789 and 6081.
-
-| `profilePort`
-| `integer`
-| `profilePort` allows setting up a Go pprof profiler listening to this port
+| `multiClusterDeployment`
+| `boolean`
+| Set `multiClusterDeployment` to `true` to enable the multi-cluster feature. This adds a `clusterName` label to the flows data

 | `resources`
 | `object`
 | `resources` are the compute resources required by this container.
More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |=== -== .spec.processor.debug +== .spec.processor.advanced Description:: + -- -`debug` allows setting some aspects of the internal configuration of the flow processor. This section is aimed exclusively for debugging and fine-grained performance optimizations, such as GOGC and GOMAXPROCS env vars. Users setting its values do it at their own risk. +`advanced` allows setting some aspects of the internal configuration of the flow processor. This section is aimed mostly for debugging and fine-grained performance optimizations, such as `GOGC` and `GOMAXPROCS` env vars. Set these values at your own risk. -- Type:: @@ -1508,9 +1893,41 @@ Type:: |=== | Property | Type | Description +| `conversationEndTimeout` +| `string` +| `conversationEndTimeout` is the time to wait after a network flow is received, to consider the conversation ended. This delay is ignored when a FIN packet is collected for TCP flows (see `conversationTerminatingTimeout` instead). + +| `conversationHeartbeatInterval` +| `string` +| `conversationHeartbeatInterval` is the time to wait between "tick" events of a conversation + +| `conversationTerminatingTimeout` +| `string` +| `conversationTerminatingTimeout` is the time to wait from detected FIN flag to end a conversation. Only relevant for TCP flows. + +| `dropUnusedFields` +| `boolean` +| `dropUnusedFields` allows, when set to `true`, to drop fields that are known to be unused by OVS, to save storage space. + +| `enableKubeProbes` +| `boolean` +| `enableKubeProbes` is a flag to enable or disable Kubernetes liveness and readiness probes + | `env` | `object (string)` -| `env` allows passing custom environment variables to underlying components. 
Useful for passing some very concrete performance-tuning options, such as GOGC and GOMAXPROCS, that should not be publicly exposed as part of the FlowCollector descriptor, as they are only useful in edge debug or support scenarios. +| `env` allows passing custom environment variables to underlying components. Useful for passing some very concrete performance-tuning options, such as `GOGC` and `GOMAXPROCS`, that should not be publicly exposed as part of the FlowCollector descriptor, as they are only useful in edge debug or support scenarios. + +| `healthPort` +| `integer` +| `healthPort` is a collector HTTP port in the Pod that exposes the health check API + +| `port` +| `integer` +| Port of the flow collector (host port). By convention, some values are forbidden. It must be greater than 1024 and different from 4500, 4789 and 6081. + +| `profilePort` +| `integer` +| `profilePort` allows setting up a Go pprof profiler listening to this port |=== == .spec.processor.kafkaConsumerAutoscaler @@ -1550,9 +1967,9 @@ Type:: `NetObservLokiError`, which is triggered when flows are being dropped due to Loki errors. + -| `ignoreTags` +| `includeList` | `array (string)` -| `ignoreTags` is a list of tags to specify which metrics to ignore. Each metric is associated with a list of tags. More details in https://github.com/netobserv/network-observability-operator/tree/main/controllers/flowlogspipeline/metrics_definitions . Available tags are: `egress`, `ingress`, `flows`, `bytes`, `packets`, `namespaces`, `nodes`, `workloads`, `nodes-flows`, `namespaces-flows`, `workloads-flows`. Namespace-based metrics are covered by both `workloads` and `namespaces` tags, hence it is recommended to always ignore one of them (`workloads` offering a finer granularity). +| `includeList` is a list of metric names to specify which ones to generate. The names correspond to the names in Prometheus without the prefix. 
For example, `namespace_egress_packets_total` shows up as `netobserv_namespace_egress_packets_total` in Prometheus. Note that the more metrics you add, the bigger the impact on Prometheus workload resources. Metrics enabled by default are: `namespace_flows_total`, `node_ingress_bytes_total`, `workload_ingress_bytes_total`, `namespace_drop_packets_total` (when `PacketDrop` feature is enabled), `namespace_rtt_seconds` (when `FlowRTT` feature is enabled), `namespace_dns_latency_seconds` (when `DNSTracking` feature is enabled). More information, with the full list of available metrics: https://github.com/netobserv/network-observability-operator/blob/main/docs/Metrics.md

 | `server`
 | `object`
 | `server` defines the metrics server endpoint configuration for Prometheus scraper
@@ -1604,27 +2021,27 @@ Type::
 | `insecureSkipVerify`
 | `boolean`
-| `insecureSkipVerify` allows skipping client-side verification of the provided certificate. If set to true, the `providedCaFile` field is ignored.
+| `insecureSkipVerify` allows skipping client-side verification of the provided certificate. If set to `true`, the `providedCaFile` field is ignored.

 | `provided`
 | `object`
-| TLS configuration when `type` is set to `PROVIDED`.
+| TLS configuration when `type` is set to `Provided`.

 | `providedCaFile`
 | `object`
-| Reference to the CA file when `type` is set to `PROVIDED`.
+| Reference to the CA file when `type` is set to `Provided`.

 | `type`
 | `string`
 | Select the type of TLS configuration: +
- - `DISABLED` (default) to not configure TLS for the endpoint. - `PROVIDED` to manually provide cert file and a key file. - `AUTO` to use {product-title} auto generated certificate using annotations.
+ - `Disabled` (default) to not configure TLS for the endpoint. - `Provided` to manually provide cert file and a key file. - `Auto` to use {product-title} auto generated certificate using annotations.

 |===
 == .spec.processor.metrics.server.tls.provided
 Description::
 +
 --
-TLS configuration when `type` is set to `PROVIDED`.
+TLS configuration when `type` is set to `Provided`.
-- Type:: @@ -1662,7 +2079,7 @@ Type:: Description:: + -- -Reference to the CA file when `type` is set to `PROVIDED`. +Reference to the CA file when `type` is set to `Provided`. -- Type:: diff --git a/modules/network-observability-flowcollector-kafka-config.adoc b/modules/network-observability-flowcollector-kafka-config.adoc index 7d76038898..7d2a51fc2f 100644 --- a/modules/network-observability-flowcollector-kafka-config.adoc +++ b/modules/network-observability-flowcollector-kafka-config.adoc @@ -20,22 +20,21 @@ You can configure the `FlowCollector` resource to use Kafka for high-throughput . Modify the `FlowCollector` resource for {product-title} Network Observability Operator to use Kafka, as shown in the following sample YAML: .Sample Kafka configuration in `FlowCollector` resource -[id="network-observability-flowcollector-configuring-kafka-sample_{context}"] [source, yaml] ---- -apiVersion: flows.netobserv.io/v1beta1 +apiVersion: flows.netobserv.io/v1beta2 kind: FlowCollector metadata: name: cluster spec: - deploymentModel: KAFKA <1> + deploymentModel: Kafka <1> kafka: address: "kafka-cluster-kafka-bootstrap.netobserv" <2> topic: network-flows <3> tls: enable: false <4> ---- -<1> Set `spec.deploymentModel` to `KAFKA` instead of `DIRECT` to enable the Kafka deployment model. +<1> Set `spec.deploymentModel` to `Kafka` instead of `Direct` to enable the Kafka deployment model. <2> `spec.kafka.address` refers to the Kafka bootstrap server address. You can specify a port if needed, for instance `kafka-cluster-kafka-bootstrap.netobserv:9093` for using TLS on port 9093. <3> `spec.kafka.topic` should match the name of a topic created in Kafka. <4> `spec.kafka.tls` can be used to encrypt all communications to and from Kafka with TLS or mTLS. 
When enabled, the Kafka CA certificate must be available as a ConfigMap or a Secret, both in the namespace where the `flowlogs-pipeline` processor component is deployed (default: `netobserv`) and where the eBPF agents are deployed (default: `netobserv-privileged`). It must be referenced with `spec.kafka.tls.caCert`. When using mTLS, client secrets must be available in these namespaces as well (they can be generated for instance using the AMQ Streams User Operator) and referenced with `spec.kafka.tls.userCert`. \ No newline at end of file diff --git a/modules/network-observability-flowcollector-view.adoc b/modules/network-observability-flowcollector-view.adoc index 87857cebcc..dcc7fbf5d1 100644 --- a/modules/network-observability-flowcollector-view.adoc +++ b/modules/network-observability-flowcollector-view.adoc @@ -17,15 +17,15 @@ The following example shows a sample `FlowCollector` resource for {product-title .Sample `FlowCollector` resource [source, yaml] ---- -apiVersion: flows.netobserv.io/v1beta1 +apiVersion: flows.netobserv.io/v1beta2 kind: FlowCollector metadata: name: cluster spec: namespace: netobserv - deploymentModel: DIRECT + deploymentModel: Direct agent: - type: EBPF <1> + type: eBPF <1> ebpf: sampling: 50 <2> logLevel: info @@ -36,7 +36,7 @@ spec: cpu: 100m limits: memory: 800Mi - processor: + processor: <3> logLevel: info resources: requests: @@ -44,20 +44,12 @@ spec: cpu: 100m limits: memory: 800Mi - conversationEndTimeout: 10s - logTypes: FLOWS <3> - conversationHeartbeatInterval: 30s - loki: <4> - url: 'https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network' - statusUrl: 'https://loki-query-frontend-http.netobserv.svc:3100/' - authToken: FORWARD - tls: - enable: true - caCert: - type: configmap - name: loki-gateway-ca-bundle - certFile: service-ca.crt - namespace: loki-namespace # <5> + logTypes: Flows + advanced: + conversationEndTimeout: 10s + conversationHeartbeatInterval: 30s + loki: <4> + mode: LokiStack <5> consolePlugin: 
register: true logLevel: info @@ -65,7 +57,7 @@ spec: enable: true portNames: "3100": loki - quickFilters: <6> + quickFilters: <6> - name: Applications filter: src_namespace!: 'openshift-,netobserv' @@ -86,7 +78,7 @@ ---- <1> The Agent specification, `spec.agent.type`, must be `eBPF`. eBPF is the only {product-title} supported option. <2> You can set the Sampling specification, `spec.agent.ebpf.sampling`, to manage resources. Lower sampling values might consume a large amount of computational, memory, and storage resources. You can mitigate this by specifying a sampling ratio value. A value of 100 means 1 flow every 100 is sampled. A value of 0 or 1 means all flows are captured. The lower the value, the more flows are returned and the more accurate the derived metrics are. By default, eBPF sampling is set to a value of 50, so 1 flow every 50 is sampled. Note that more sampled flows also means more storage is needed. It is recommended to start with the default values and refine them empirically to determine which setting your cluster can manage. -<3> The optional specifications `spec.processor.logTypes`, `spec.processor.conversationHeartbeatInterval`, and `spec.processor.conversationEndTimeout` can be set to enable conversation tracking. When enabled, conversation events are queryable in the web console. The values for `spec.processor.logTypes` are as follows: `FLOWS` `CONVERSATIONS`, `ENDED_CONVERSATIONS`, or `ALL`. Storage requirements are highest for `ALL` and lowest for `ENDED_CONVERSATIONS`. -<4> The Loki specification, `spec.loki`, specifies the Loki client. The default values match the Loki install paths mentioned in the Installing the {loki-op} section. If you used another installation method for Loki, specify the appropriate client information for your install. -<5> The original certificates are copied to the Network Observability instance namespace and watched for updates. When not provided, the namespace defaults to be the same as "spec.namespace".
If you chose to install Loki in a different namespace, you must specify it in the `spec.loki.tls.caCert.namespace` field. Similarly, the `spec.exporters.kafka.tls.caCert.namespace` field is available for Kafka installed in a different namespace. +<3> The `spec.processor.logTypes` specification, together with the `spec.processor.advanced.conversationEndTimeout` and `spec.processor.advanced.conversationHeartbeatInterval` specifications, can be set to enable conversation tracking. When enabled, conversation events are queryable in the web console. The values for `spec.processor.logTypes` are `Flows`, `Conversations`, `EndedConversations`, or `All`. Storage requirements are highest for `All` and lowest for `EndedConversations`. +<4> The Loki specification, `spec.loki`, specifies the Loki client. The default values match the Loki install paths mentioned in the Installing the Loki Operator section. If you used another installation method for Loki, specify the appropriate client information for your install. +<5> The `LokiStack` mode automatically sets a few configurations: `querierUrl`, `ingesterUrl`, `statusUrl`, `tenantID`, and the corresponding TLS configuration. Cluster roles and a cluster role binding are created for reading and writing logs to Loki, and `authToken` is set to `Forward`. You can set these settings manually by using the `Manual` mode. <6> The `spec.quickFilters` specification defines filters that show up in the web console. The `Application` filter keys, `src_namespace` and `dst_namespace`, are negated (`!`), so the `Application` filter shows all traffic that _does not_ originate from, or have a destination to, any `openshift-` or `netobserv` namespaces. For more information, see Configuring quick filters below.
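For illustration, a minimal `FlowCollector` sketch that relies on the `LokiStack` mode might look as follows. This is not a complete configuration: the `LokiStack` resource name `loki` and the `netobserv` namespace are assumptions that must match your own installation.

.Minimal `LokiStack` mode sketch (illustrative)
[source, yaml]
----
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv # namespace where the Network Observability components run
  deploymentModel: Direct
  loki:
    mode: LokiStack # URLs, tenant ID, TLS, and cluster roles are derived automatically
    lokistack:
      name: loki # must match the name of your LokiStack resource
----

With this mode, none of the `Manual` mode URL or TLS settings need to be specified explicitly.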
diff --git a/modules/network-observability-flows-format.adoc b/modules/network-observability-flows-format.adoc index 46217a6e0b..47367884ea 100644 --- a/modules/network-observability-flows-format.adoc +++ b/modules/network-observability-flows-format.adoc @@ -3,471 +3,277 @@ [id="network-observability-flows-format_{context}"] = Network Flows format reference -This is the specification of the network flows format, used both internally and when exporting flows to Kafka. - -The document is organized in two main categories: _Labels_ and regular _Fields_. This distinction only matters when querying Loki. This is because _Labels_, unlike _Fields_, must be used in link:https://grafana.com/docs/loki/latest/logql/log_queries/#log-stream-selector[stream selectors]. - -If you are reading this specification as a reference for the Kafka export feature, you must treat all _Labels_ and _Fields_ as regular fields and ignore any distinctions between them that are specific to Loki. - - -== Labels - - -SrcK8S_Namespace:: - -• `Optional` *SrcK8S_Namespace*: `string` - -Source namespace - -''' - -DstK8S_Namespace:: - -• `Optional` *DstK8S_Namespace*: `string` - -Destination namespace - -''' - -SrcK8S_OwnerName:: - -• `Optional` *SrcK8S_OwnerName*: `string` - -Source owner, such as Deployment, StatefulSet, etc. - -''' - -DstK8S_OwnerName:: - -• `Optional` *DstK8S_OwnerName*: `string` - -Destination owner, such as Deployment, StatefulSet, etc. 
- -''' - -FlowDirection:: - -• *FlowDirection*: `FlowDirection` (see the following section, Enumeration: FlowDirection) - -Flow direction from the node observation point - -''' - -_RecordType:: - -• `Optional` *_RecordType*: `RecordType` - -Type of record: 'flowLog' for regular flow logs, or 'allConnections', -'newConnection', 'heartbeat', 'endConnection' for conversation tracking - - -== Fields - - -SrcAddr:: - -• *SrcAddr*: `string` - -Source IP address (ipv4 or ipv6) - -''' - -DstAddr:: - -• *DstAddr*: `string` - -Destination IP address (ipv4 or ipv6) - -''' - -SrcMac:: - -• *SrcMac*: `string` - -Source MAC address - -''' - -DstMac:: - -• *DstMac*: `string` - -Destination MAC address - -''' - -SrcK8S_Name:: - -• `Optional` *SrcK8S_Name*: `string` - -Name of the source matched Kubernetes object, such as Pod name, Service name, etc. - -''' - -DstK8S_Name:: - -• `Optional` *DstK8S_Name*: `string` - -Name of the destination matched Kubernetes object, such as Pod name, Service name, etc. - -''' - -SrcK8S_Type:: - -• `Optional` *SrcK8S_Type*: `string` - -Kind of the source matched Kubernetes object, such as Pod, Service, etc. - -''' - -DstK8S_Type:: - -• `Optional` *DstK8S_Type*: `string` - -Kind of the destination matched Kubernetes object, such as Pod name, Service name, etc. - -''' - -SrcPort:: - -• `Optional` *SrcPort*: `number` - -Source port - -''' - -DstPort:: - -• `Optional` *DstPort*: `number` - -Destination port - -''' - -SrcK8S_OwnerType:: - -• `Optional` *SrcK8S_OwnerType*: `string` - -Kind of the source Kubernetes owner, such as Deployment, StatefulSet, etc. - -''' - -DstK8S_OwnerType:: - -• `Optional` *DstK8S_OwnerType*: `string` - -Kind of the destination Kubernetes owner, such as Deployment, StatefulSet, etc. 
- -''' - -SrcK8S_HostIP:: - -• `Optional` *SrcK8S_HostIP*: `string` - -Source node IP - -''' - -DstK8S_HostIP:: - -• `Optional` *DstK8S_HostIP*: `string` - -Destination node IP - -''' - -SrcK8S_HostName:: - -• `Optional` *SrcK8S_HostName*: `string` - -Source node name - -''' - -DstK8S_HostName:: - -• `Optional` *DstK8S_HostName*: `string` - -Destination node name - -''' - -Proto:: - -• *Proto*: `number` - -L4 protocol - -''' - -Interface:: - -• `Optional` *Interface*: `string` - -Network interface - -''' - -IfDirection:: - -• `Optional` *IfDirection*: `InterfaceDirection` (see the following section, Enumeration: InterfaceDirection) - -Flow direction from the network interface observation point - -''' - -Flags:: - -• `Optional` *Flags*: `number` - -TCP flags - -''' - -Packets:: - -• `Optional` *Packets*: `number` - -Number of packets - -''' - -Packets_AB:: - -• `Optional` *Packets_AB*: `number` - -In conversation tracking, A to B packets counter per conversation - -''' - -Packets_BA:: - -• `Optional` *Packets_BA*: `number` - -In conversation tracking, B to A packets counter per conversation - -''' - -Bytes:: - -• `Optional` *Bytes*: `number` - -Number of bytes - -''' - -Bytes_AB:: - -• `Optional` *Bytes_AB*: `number` - -In conversation tracking, A to B bytes counter per conversation - -''' - -Bytes_BA:: - -• `Optional` *Bytes_BA*: `number` - -In conversation tracking, B to A bytes counter per conversation - -''' - -IcmpType:: - -• `Optional` *IcmpType*: `number` - -ICMP type - -''' - -IcmpCode:: - -• `Optional` *IcmpCode*: `number` - -ICMP code - -''' - -PktDropLatestState:: - -• `Optional` *PktDropLatestState*: `string` - -Pkt TCP state for drops - -''' - -PktDropLatestDropCause:: - -• `Optional` *PktDropLatestDropCause*: `string` - -Pkt cause for drops - -''' - -PktDropLatestFlags:: - -• `Optional` *PktDropLatestFlags*: `number` - -Pkt TCP flags for drops - -''' - -PktDropPackets:: - -• `Optional` *PktDropPackets*: `number` - -Number of packets dropped by the 
kernel - -''' - -PktDropPackets_AB:: - -• `Optional` *PktDropPackets_AB*: `number` - -In conversation tracking, A to B packets dropped counter per conversation - -''' - -PktDropPackets_BA:: - -• `Optional` *PktDropPackets_BA*: `number` - -In conversation tracking, B to A packets dropped counter per conversation - -''' - -PktDropBytes:: - -• `Optional` *PktDropBytes*: `number` - -Number of bytes dropped by the kernel - -''' - -PktDropBytes_AB:: - -• `Optional` *PktDropBytes_AB*: `number` - -In conversation tracking, A to B bytes dropped counter per conversation - -''' - -PktDropBytes_BA:: - -• `Optional` *PktDropBytes_BA*: `number` - -In conversation tracking, B to A bytes dropped counter per conversation - -''' - -DnsId:: - -• `Optional` *DnsId*: `number` - -DNS record id - -''' - -DnsFlags:: - -• `Optional` *DnsFlags*: `number` - -DNS flags for DNS record - -''' - -DnsFlagsResponseCode:: - -• `Optional` *DnsFlagsResponseCode*: `string` - -Parsed DNS header RCODEs name - -''' - -DnsLatencyMs:: - -• `Optional` *DnsLatencyMs*: `number` - -Calculated time between response and request, in milliseconds - -''' - -TimeFlowStartMs:: - -• *TimeFlowStartMs*: `number` - -Start timestamp of this flow, in milliseconds - -''' - -TimeFlowEndMs:: - -• *TimeFlowEndMs*: `number` - -End timestamp of this flow, in milliseconds - -''' - -TimeReceived:: - -• *TimeReceived*: `number` - -Timestamp when this flow was received and processed by the flow collector, in seconds - -''' - -TimeFlowRttNs:: - -• `Optional` *TimeFlowRttNs*: `number` - -Flow Round Trip Time (RTT) in nanoseconds - -''' - -_HashId:: - -• `Optional` *_HashId*: `string` - -In conversation tracking, the conversation identifier - -''' - -_IsFirst:: - -• `Optional` *_IsFirst*: `string` - -In conversation tracking, a flag identifying the first flow - -''' - -numFlowLogs:: - -• `Optional` *numFlowLogs*: `number` - -In conversation tracking, a counter of flow logs per conversation - - -== Enumeration: FlowDirection - - 
-Ingress:: - -• *Ingress* = `"0"` - -Incoming traffic, from the node observation point - -''' - -Egress:: - -• *Egress* = `"1"` - -Outgoing traffic, from the node observation point - -''' - -Inner:: - -• *Inner* = `"2"` - -Inner traffic, with the same source and destination node \ No newline at end of file +This is the specification of the network flows format. The format is used when a Kafka exporter is configured, for Prometheus metrics labels, and internally for the Loki store. + +The "Filter ID" column shows the name to use when defining quick filters (see `spec.consolePlugin.quickFilters` in the `FlowCollector` specification). + +The "Loki label" column is useful when querying Loki directly: label fields need to be selected using link:https://grafana.com/docs/loki/latest/logql/log_queries/#log-stream-selector[stream selectors]. + + +[cols="1,1,3,1,1",options="header"] +|=== +| Name | Type | Description | Filter ID | Loki label +| `Bytes` +| number +| Number of bytes +| n/a +| no +| `DnsErrno` +| number +| Error number returned from the DNS tracker eBPF hook function +| `dns_errno` +| no +| `DnsFlags` +| number +| DNS flags for DNS record +| n/a +| no +| `DnsFlagsResponseCode` +| string +| Parsed DNS header RCODEs name +| `dns_flag_response_code` +| no +| `DnsId` +| number +| DNS record ID +| `dns_id` +| no +| `DnsLatencyMs` +| number +| Time between a DNS request and response, in milliseconds +| `dns_latency` +| no +| `Dscp` +| number +| Differentiated Services Code Point (DSCP) value +| `dscp` +| no +| `DstAddr` +| string +| Destination IP address (ipv4 or ipv6) +| `dst_address` +| no +| `DstK8S_HostIP` +| string +| Destination node IP +| `dst_host_address` +| no +| `DstK8S_HostName` +| string +| Destination node name +| `dst_host_name` +| no +| `DstK8S_Name` +| string +| Name of the destination Kubernetes object, such as Pod name, Service name or Node name.
+| `dst_name` +| no +| `DstK8S_Namespace` +| string +| Destination namespace +| `dst_namespace` +| yes +| `DstK8S_OwnerName` +| string +| Name of the destination owner, such as Deployment name, StatefulSet name, etc. +| `dst_owner_name` +| yes +| `DstK8S_OwnerType` +| string +| Kind of the destination owner, such as Deployment, StatefulSet, etc. +| `dst_kind` +| no +| `DstK8S_Type` +| string +| Kind of the destination Kubernetes object, such as Pod, Service or Node. +| `dst_kind` +| yes +| `DstK8S_Zone` +| string +| Destination availability zone +| `dst_zone` +| yes +| `DstMac` +| string +| Destination MAC address +| `dst_mac` +| no +| `DstPort` +| number +| Destination port +| `dst_port` +| no +| `Duplicate` +| boolean +| Indicates if this flow was also captured from another interface on the same host +| n/a +| yes +| `Flags` +| number +| Logical OR combination of unique TCP flags comprised in the flow, as per RFC-9293, with additional custom flags to represent the following per-packet combinations: + +- SYN+ACK (0x100) + +- FIN+ACK (0x200) + +- RST+ACK (0x400) +| n/a +| no +| `FlowDirection` +| number +| Flow direction from the node observation point. Can be one of: + +- 0: Ingress (incoming traffic, from the node observation point) + +- 1: Egress (outgoing traffic, from the node observation point) + +- 2: Inner (with the same source and destination node) +| `direction` +| yes +| `IcmpCode` +| number +| ICMP code +| `icmp_code` +| no +| `IcmpType` +| number +| ICMP type +| `icmp_type` +| no +| `IfDirection` +| number +| Flow direction from the network interface observation point. 
Can be one of: + +- 0: Ingress (interface incoming traffic) + +- 1: Egress (interface outgoing traffic) +| n/a +| no +| `Interface` +| string +| Network interface +| `interface` +| no +| `K8S_ClusterName` +| string +| Cluster name or identifier +| `cluster_name` +| yes +| `K8S_FlowLayer` +| string +| Flow layer: 'app' or 'infra' +| `flow_layer` +| no +| `Packets` +| number +| Number of packets +| n/a +| no +| `PktDropBytes` +| number +| Number of bytes dropped by the kernel +| n/a +| no +| `PktDropLatestDropCause` +| string +| Latest drop cause +| `pkt_drop_cause` +| no +| `PktDropLatestFlags` +| number +| TCP flags on last dropped packet +| n/a +| no +| `PktDropLatestState` +| string +| TCP state on last dropped packet +| `pkt_drop_state` +| no +| `PktDropPackets` +| number +| Number of packets dropped by the kernel +| n/a +| no +| `Proto` +| number +| L4 protocol +| `protocol` +| no +| `SrcAddr` +| string +| Source IP address (ipv4 or ipv6) +| `src_address` +| no +| `SrcK8S_HostIP` +| string +| Source node IP +| `src_host_address` +| no +| `SrcK8S_HostName` +| string +| Source node name +| `src_host_name` +| no +| `SrcK8S_Name` +| string +| Name of the source Kubernetes object, such as Pod name, Service name or Node name. +| `src_name` +| no +| `SrcK8S_Namespace` +| string +| Source namespace +| `src_namespace` +| yes +| `SrcK8S_OwnerName` +| string +| Name of the source owner, such as Deployment name, StatefulSet name, etc. +| `src_owner_name` +| yes +| `SrcK8S_OwnerType` +| string +| Kind of the source owner, such as Deployment, StatefulSet, etc. +| `src_kind` +| no +| `SrcK8S_Type` +| string +| Kind of the source Kubernetes object, such as Pod, Service or Node. 
+| `src_kind` +| yes +| `SrcK8S_Zone` +| string +| Source availability zone +| `src_zone` +| yes +| `SrcMac` +| string +| Source MAC address +| `src_mac` +| no +| `SrcPort` +| number +| Source port +| `src_port` +| no +| `TimeFlowEndMs` +| number +| End timestamp of this flow, in milliseconds +| n/a +| no +| `TimeFlowRttNs` +| number +| TCP Smoothed Round Trip Time (SRTT), in nanoseconds +| `time_flow_rtt` +| no +| `TimeFlowStartMs` +| number +| Start timestamp of this flow, in milliseconds +| n/a +| no +| `TimeReceived` +| number +| Timestamp when this flow was received and processed by the flow collector, in seconds +| n/a +| no +| `_HashId` +| string +| In conversation tracking, the conversation identifier +| `id` +| no +| `_RecordType` +| string +| Type of record: 'flowLog' for regular flow logs, or 'newConnection', 'heartbeat', 'endConnection' for conversation tracking +| `type` +| yes +|=== \ No newline at end of file diff --git a/modules/network-observability-includelist-example.adoc b/modules/network-observability-includelist-example.adoc new file mode 100644 index 0000000000..7b941c5de6 --- /dev/null +++ b/modules/network-observability-includelist-example.adoc @@ -0,0 +1,42 @@ +// Module included in the following assemblies: +// * network_observability/metrics-alerts-dashboards.adoc + +:_mod-docs-content-type: PROCEDURE +[id="network-observability-netobserv-dashboard-high-traffic-alert_{context}"] += Creating alerts +You can create custom Prometheus rules for the Netobserv dashboard metrics to trigger alerts when some defined conditions are met. + +.Prerequisites + +* You have access to the cluster as a user with the cluster-admin role or with view permissions for all projects. +* You have the Network Observability Operator installed. + +.Procedure + +. Create a YAML file by clicking the import icon, *+*. +. Add an alerting rule configuration to the YAML file. 
In the YAML sample that follows, an alert is created for when the cluster ingress traffic reaches a given threshold of 10 MBps per destination workload. ++ +[source,yaml] +---- +apiVersion: monitoring.coreos.com/v1 +kind: PrometheusRule +metadata: + name: netobserv-alerts + namespace: openshift-netobserv-operator +spec: + groups: + - name: NetObservAlerts + rules: + - alert: NetObservIncomingBandwidth + annotations: + message: |- + {{ $labels.job }}: incoming traffic exceeding 10 MBps for 30s on {{ $labels.DstK8S_OwnerType }} {{ $labels.DstK8S_OwnerName }} ({{ $labels.DstK8S_Namespace }}). + summary: "High incoming traffic." + expr: sum(rate(netobserv_workload_ingress_bytes_total {SrcK8S_Namespace="openshift-ingress"}[1m])) by (job, DstK8S_Namespace, DstK8S_OwnerName, DstK8S_OwnerType) > 10000000 <1> + for: 30s + labels: + severity: warning +---- +<1> The `netobserv_workload_ingress_bytes_total` metric is enabled by default in `spec.processor.metrics.includeList`. + +. Click *Create* to apply the configuration file to the cluster. \ No newline at end of file diff --git a/modules/network-observability-metrics.adoc b/modules/network-observability-metrics.adoc new file mode 100644 index 0000000000..9ecab20728 --- /dev/null +++ b/modules/network-observability-metrics.adoc @@ -0,0 +1,55 @@ +// Module included in the following assemblies: +// +// network_observability/metrics-alerts-dashboards.adoc + +:_mod-docs-content-type: REFERENCE +[id="network-observability-metrics_{context}"] += Network Observability metrics +Metrics generated by the `flowlogs-pipeline` are configurable in the `spec.processor.metrics.includeList` of the `FlowCollector` custom resource to add or remove metrics. + +You can also create alerts by using the `includeList` metrics in Prometheus rules, as shown in the example "Creating alerts". 
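As an illustrative sketch, the following example sets `includeList` to keep three of the default metrics while adding one more. The assumption here is that `includeList` fully replaces the default selection when set, so any default metrics you want to keep must be listed explicitly; adjust the list to your own needs.

.Sample `includeList` configuration (illustrative)
[source, yaml]
----
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    metrics:
      includeList:
        - namespace_flows_total          # enabled by default
        - node_ingress_bytes_total       # enabled by default
        - workload_ingress_bytes_total   # enabled by default
        - namespace_egress_bytes_total   # added on top of the defaults
----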
+ +When looking for these metrics in Prometheus, such as in the web console through *Observe* -> *Metrics*, or when defining alerts, all the metric names are prefixed with `netobserv_`. For example, `netobserv_namespace_flows_total`. Available metric names are as follows. + +== includeList metrics names +Names followed by an asterisk `*` are enabled by default. + +* `namespace_egress_bytes_total` +* `namespace_egress_packets_total` +* `namespace_ingress_bytes_total` +* `namespace_ingress_packets_total` +* `namespace_flows_total` * +* `node_egress_bytes_total` +* `node_egress_packets_total` +* `node_ingress_bytes_total` * +* `node_ingress_packets_total` +* `node_flows_total` +* `workload_egress_bytes_total` +* `workload_egress_packets_total` +* `workload_ingress_bytes_total` * +* `workload_ingress_packets_total` +* `workload_flows_total` + +=== PacketDrop metrics names +When the `PacketDrop` feature is enabled in `spec.agent.ebpf.features` (with `privileged` mode), the following additional metrics are available: + +* `namespace_drop_bytes_total` +* `namespace_drop_packets_total` * +* `node_drop_bytes_total` +* `node_drop_packets_total` +* `workload_drop_bytes_total` +* `workload_drop_packets_total` + +=== DNS metrics names +When the `DNSTracking` feature is enabled in `spec.agent.ebpf.features`, the following additional metrics are available: + +* `namespace_dns_latency_seconds` * +* `node_dns_latency_seconds` +* `workload_dns_latency_seconds` + +=== FlowRTT metrics names +When the `FlowRTT` feature is enabled in `spec.agent.ebpf.features`, the following additional metrics are available: + +* `namespace_rtt_seconds` * +* `node_rtt_seconds` +* `workload_rtt_seconds` \ No newline at end of file diff --git a/modules/network-observability-multitenancy.adoc b/modules/network-observability-multitenancy.adoc index 2db482202e..ef462e6952 100644 --- a/modules/network-observability-multitenancy.adoc +++ b/modules/network-observability-multitenancy.adoc @@ -1,6 +1,6 @@ // Module
included in the following assemblies: // -// network_observability/observing-network-traffic.adoc +// network_observability/installing-operators.adoc :_mod-docs-content-type: PROCEDURE [id="network-observability-multi-tenancy_{context}"] @@ -8,8 +8,7 @@ Multi-tenancy in the Network Observability Operator allows and restricts individual user access, or group access, to the flows stored in Loki. Access is enabled for project admins. Project admins who have limited access to some namespaces can access flows for only those namespaces. .Prerequisites -* You have installed link:https://catalog.redhat.com/software/containers/openshift-logging/loki-rhel8-operator/622b46bcae289285d6fcda39[{loki-op} version 5.7] -The `FlowCollector` `spec.loki.authToken` configuration must be set to `FORWARD`. +* You have installed at least link:https://catalog.redhat.com/software/containers/openshift-logging/loki-rhel8-operator/622b46bcae289285d6fcda39[Loki Operator version 5.7] * You must be logged in as a project administrator .Procedure diff --git a/modules/network-observability-operator-install.adoc b/modules/network-observability-operator-install.adoc index f1aefc4fc4..b680d35369 100644 --- a/modules/network-observability-operator-install.adoc +++ b/modules/network-observability-operator-install.adoc @@ -22,7 +22,7 @@ The actual memory consumption of the Operator depends on your cluster size and t [NOTE] ==== -This documentation assumes that your `LokiStack` instance name is `loki`. Using a different name requires additional configuration. +Additionally, this installation example uses the `netobserv` namespace, which is used across all components. You can optionally use a different namespace. ==== .Procedure @@ -34,25 +34,14 @@ This documentation assumes that your `LokiStack` instance name is `loki`. Using . Navigate to the *Flow Collector* tab, and click *Create FlowCollector*. Make the following selections in the form view: ..
*spec.agent.ebpf.Sampling*: Specify a sampling size for flows. Lower sampling sizes will have higher impact on resource utilization. For more information, see the "FlowCollector API reference", `spec.agent.ebpf`. .. If you are using Loki, set the following specifications: -... *spec.loki.enable*: Select the check box to enable storing flows in Loki. -... *spec.loki.url*: Since authentication is specified separately, this URL needs to be updated to `https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network`. The first part of the URL, "loki", must match the name of your `LokiStack`. -... *spec.loki.authToken*: Select the `FORWARD` value. -... *spec.loki.statusUrl*: Set this to `https://loki-query-frontend-http.netobserv.svc:3100/`. The first part of the URL, "loki", must match the name of your `LokiStack`. -... *spec.loki.tls.enable*: Select the checkbox to enable TLS. -... *spec.loki.statusTls*: The `enable` value is false by default. -+ -For the first part of the certificate reference names: `loki-gateway-ca-bundle`, `loki-ca-bundle`, and `loki-query-frontend-http`,`loki`, must match the name of your `LokiStack`. +... *spec.loki.mode*: Set this to the `LokiStack` mode, which automatically sets URLs, TLS, cluster roles and a cluster role binding, as well as the `authToken` value. Alternatively, the `Manual` mode allows more control over configuration of these settings. +... *spec.loki.lokistack.name*: Set this to the name of your `LokiStack` resource. In this documentation, `loki` is used. .. Optional: If you are in a large-scale environment, consider configuring the `FlowCollector` with Kafka for forwarding data in a more resilient, scalable way. See "Configuring the Flow Collector resource with Kafka storage" in the "Important Flow Collector configuration considerations" section. .. Optional: Configure other optional settings before the next step of creating the `FlowCollector`. 
For example, if you choose not to use Loki, then you can configure exporting flows to Kafka or IPFIX. See "Export enriched network flow data to Kafka and IPFIX" and more in the "Important Flow Collector configuration considerations" section. -.. Click *Create*. +. Click *Create*. .Verification To confirm this was successful, when you navigate to *Observe* you should see *Network Traffic* listed in the options. -In the absence of *Application Traffic* within the {product-title} cluster, default filters might show that there are "No results", which results in no visual flow. Beside the filter selections, select *Clear all filters* to see the flow. - -[IMPORTANT] -==== -If you installed Loki using the {loki-op}, it is advised not to use `querierUrl`, as it can break the console access to Loki. If you installed Loki using another type of Loki installation, this does not apply. -==== +In the absence of *Application Traffic* within the {product-title} cluster, default filters might show that there are "No results", which results in no visual flow. Beside the filter selections, select *Clear all filters* to see the flow. \ No newline at end of file diff --git a/modules/network-observability-packet-drops.adoc b/modules/network-observability-packet-drops.adoc index 39db08507e..c56c7e108d 100644 --- a/modules/network-observability-packet-drops.adoc +++ b/modules/network-observability-packet-drops.adoc @@ -22,15 +22,15 @@ CPU and memory usage increases when this feature is enabled. 
.Example `FlowCollector` configuration [source, yaml] ---- -apiVersion: flows.netobserv.io/v1alpha1 +apiVersion: flows.netobserv.io/v1beta2 kind: FlowCollector metadata: name: cluster spec: namespace: netobserv - deploymentModel: DIRECT + deploymentModel: Direct agent: - type: EBPF + type: eBPF ebpf: features: - PacketDrop <1> diff --git a/modules/network-observability-pktdrop-overview.adoc b/modules/network-observability-pktdrop-overview.adoc index 3fab301e9a..da5bfdeb29 100644 --- a/modules/network-observability-pktdrop-overview.adoc +++ b/modules/network-observability-pktdrop-overview.adoc @@ -13,13 +13,19 @@ You can configure graphical representation of network flow records with packet l * Performance optimization: With a clearer picture of packet drops, you can take steps to optimize network performance, such as adjust buffer sizes, reconfigure routing paths, or implement Quality of Service (QoS) measures. -When packet drop tracking is enabled, you can see the following metrics represented in a chart in the *Overview*. +When packet drop tracking is enabled, you can see the following panels in the *Overview* by default: -* Top X flow dropped rates stacked -* Total dropped rate -* Top X dropped state -* Top X dropped cause -* Top X flow dropped rates stacked with total +* *Top X packet dropped state stacked with total* +* *Top X packet dropped cause stacked with total* +* *Top X average dropped packets rates* +* *Top X dropped packets rates stacked with total* + +Other packet drop panels are available to add in *Manage panels*: + +* *Top X average dropped bytes rates* +* *Top X dropped bytes rates stacked with total* + +== Types of packet drops Two kinds of packet drops are detected by Network Observability: host drops and OVS drops. Host drops are prefixed with `SKB_DROP` and OVS drops are prefixed with `OVS_DROP`. Dropped flows are shown in the side panel of the *Traffic flows* table along with a link to a description of each drop type. 
Examples of host drop reasons are as follows: diff --git a/modules/network-observability-rate-limit-alert.adoc b/modules/network-observability-rate-limit-alert.adoc index c0e7725e98..4fe20c6422 100644 --- a/modules/network-observability-rate-limit-alert.adoc +++ b/modules/network-observability-rate-limit-alert.adoc @@ -1,19 +1,28 @@ -// -// network_observability/configuring-operator.adoc +// Module included in the following assemblies: +// * network_observability/network-observability-operator-monitoring.adoc -:_mod-docs-content-type: CONCEPT +:_mod-docs-content-type: PROCEDURE [id="network-observability-netobserv-dashboard-rate-limit-alerts_{context}"] = Creating Loki rate limit alerts for the NetObserv dashboard -You can create custom rules for the *Netobserv* dashboard metrics to trigger alerts when Loki rate limits have been reached. +You can create custom Prometheus rules for the *Netobserv* dashboard metrics to trigger alerts when Loki rate limits have been reached. -An example of an alerting rule configuration YAML file is as follows: +.Prerequisites + +* You have access to the cluster as a user with the cluster-admin role or with view permissions for all projects. +* You have the Network Observability Operator installed. + +.Procedure + +. Create a YAML file by clicking the import icon, *+*. +. Add an alerting rule configuration to the YAML file. In the YAML sample that follows, an alert is created for when Loki rate limits have been reached: ++ [source,yaml] ---- apiVersion: monitoring.coreos.com/v1 kind: PrometheusRule metadata: name: loki-alerts - namespace: openshift-operators-redhat + namespace: openshift-netobserv-operator spec: groups: - name: LokiRateLimitAlerts @@ -27,4 +36,6 @@ spec: for: 10s labels: severity: warning ----- \ No newline at end of file +---- + +. Click *Create* to apply the configuration file to the cluster. 
diff --git a/modules/network-observability-resources-table.adoc b/modules/network-observability-resources-table.adoc index 6667c08cff..3d9f64ee27 100644 --- a/modules/network-observability-resources-table.adoc +++ b/modules/network-observability-resources-table.adoc @@ -17,10 +17,10 @@ The examples outlined in the table demonstrate scenarios that are tailored to sp | | Extra small (10 nodes) | Small (25 nodes) | Medium (65 nodes) ^[2]^ | Large (120 nodes) ^[2]^ | *Worker Node vCPU and memory* | 4 vCPUs\| 16GiB mem ^[1]^ | 16 vCPUs\| 64GiB mem ^[1]^ | 16 vCPUs\| 64GiB mem ^[1]^ |16 vCPUs\| 64GiB Mem ^[1]^ | *LokiStack size* | `1x.extra-small` | `1x.small` | `1x.small` | `1x.medium` -| *Network Observability controller memory limit* | 400Mi (default) | 400Mi (default) | 400Mi (default) | 800Mi +| *Network Observability controller memory limit* | 400Mi (default) | 400Mi (default) | 400Mi (default) | 400Mi (default) | *eBPF sampling rate* | 50 (default) | 50 (default) | 50 (default) | 50 (default) -| *eBPF memory limit* | 800Mi (default) | 800Mi (default) | 2000Mi | 800Mi (default) -| *FLP memory limit* | 800Mi (default) | 800Mi (default) | 800Mi (default) | 800Mi (default) +| *eBPF memory limit* | 800Mi (default) | 800Mi (default) | 800Mi (default) | 1600Mi +| *FLP memory limit* | 800Mi (default) | 800Mi (default) | 800Mi (default) | 800Mi (default) | *FLP Kafka partitions* | N/A | 48 | 48 | 48 | *Kafka consumer replicas* | N/A | 24 | 24 | 24 | *Kafka brokers* | N/A | 3 (default) | 3 (default) | 3 (default) diff --git a/modules/network-observability-viewing-alerts.adoc b/modules/network-observability-viewing-alerts.adoc index 1e1ab80514..9418f8acd9 100644 --- a/modules/network-observability-viewing-alerts.adoc +++ b/modules/network-observability-viewing-alerts.adoc @@ -11,6 +11,17 @@ You can access metrics about health and resource usage of the Network Observabil * The `NetObservLokiError` alert occurs if the `flowlogs-pipeline` workload is dropping flows because of 
Loki errors, such as if the Loki ingestion rate limit has been reached. * The `NetObservNoFlows` alert occurs if no flows are ingested for a certain amount of time. +You can also view metrics about the health of the Operator in the following categories: ++ +* *Flows* +* *Flows Overhead* +* *Top flow rates per source and destination nodes* +* *Top flow rates per source and destination namespaces* +* *Top flow rates per source and destination workloads* +* *Agents* +* *Processor* +* *Operator* + .Prerequisites * You have the Network Observability Operator installed. @@ -20,4 +31,4 @@ You can access metrics about health and resource usage of the Network Observabil . From the *Administrator* perspective in the web console, navigate to *Observe* → *Dashboards*. . From the *Dashboards* dropdown, select *Netobserv/Health*. -Metrics about the health of the Operator are displayed on the page. \ No newline at end of file +. View the metrics about the health of the Operator that are displayed on the page. \ No newline at end of file diff --git a/modules/network-observability-viewing-dashboards.adoc b/modules/network-observability-viewing-dashboards.adoc new file mode 100644 index 0000000000..512f5218d8 --- /dev/null +++ b/modules/network-observability-viewing-dashboards.adoc @@ -0,0 +1,29 @@ +// Module included in the following assemblies: +// +// network_observability/network-observability-overview.adoc + +:_mod-docs-content-type: PROCEDURE +[id="network-observability-viewing-dashboards_{context}"] += Viewing Network Observability metrics dashboards +On the *Overview* tab in the {product-title} console, you can view the overall aggregated metrics of the network traffic flow on the cluster. You can choose to display the information by node, namespace, owner, pod, and service. You can also use filters and display options to further refine the metrics. + +.Procedure +. In the web console *Observe* -> *Dashboards*, select the *Netobserv* dashboard. +. 
View network traffic metrics in the following categories, each with a subset per node, namespace, source, and destination: + + * *Byte rates* + * *Packet drops* + * *DNS* + * *RTT* + +. Select the *Netobserv/Health* dashboard. +. View metrics about the health of the Operator in the following categories, each with a subset per node, namespace, source, and destination. + +* *Flows* +* *Flows Overhead* +* *Flow rates* +* *Agents* +* *Processor* +* *Operator* + +*Infrastructure* and *Application* metrics are shown in a split-view for namespace and workloads. \ No newline at end of file diff --git a/modules/network-observability-working-with-conversations.adoc b/modules/network-observability-working-with-conversations.adoc index bad355acdf..81ce32e470 100644 --- a/modules/network-observability-working-with-conversations.adoc +++ b/modules/network-observability-working-with-conversations.adoc @@ -19,27 +19,27 @@ As an administrator, you can group network flows that are part of the sa . Select *cluster* then select the *YAML* tab. . Configure the `FlowCollector` custom resource so that `spec.processor.logTypes`, `conversationEndTimeout`, and `conversationHeartbeatInterval` parameters are set according to your observation needs. A sample configuration is as follows: + -[id="network-observability-flowcollector-configuring-conversations_{context}"] .Configure `FlowCollector` for conversation tracking [source, yaml] ---- -apiVersion: flows.netobserv.io/v1alpha1 +apiVersion: flows.netobserv.io/v1beta2 kind: FlowCollector metadata: name: cluster spec: processor: - conversationEndTimeout: 10s <1> - logTypes: FLOWS <2> - conversationHeartbeatInterval: 30s <3> + logTypes: Flows <1> + advanced: + conversationEndTimeout: 10s <2> + conversationHeartbeatInterval: 30s <3> ---- -<1> The *Conversation end* event represents the point when the `conversationEndTimeout` is reached or the TCP flag is intercepted.
-<2> When `logTypes` is set to `FLOWS`, only the *Flow* event is exported. If you set the value to `ALL`, both conversation and flow events are exported and visible in the *Network Traffic* page. To focus only on conversation events, you can specify `CONVERSATIONS` which exports the *Conversation start*, *Conversation tick* and *Conversation end* events; or `ENDED_CONVERSATIONS` exports only the *Conversation end* events. Storage requirements are highest for `ALL` and lowest for `ENDED_CONVERSATIONS`. +<1> When `logTypes` is set to `Flows`, only the *Flow* event is exported. If you set the value to `All`, both conversation and flow events are exported and visible in the *Network Traffic* page. To focus only on conversation events, you can specify `Conversations` which exports the *Conversation start*, *Conversation tick* and *Conversation end* events; or `EndedConversations` exports only the *Conversation end* events. Storage requirements are highest for `All` and lowest for `EndedConversations`. +<2> The *Conversation end* event represents the point when the `conversationEndTimeout` is reached or the TCP flag is intercepted. <3> The *Conversation tick* event represents each specified interval defined in the `FlowCollector` `conversationHeartbeatInterval` parameter while the network connection is active. + [NOTE] ==== -If you update the `logType` option, the flows from the previous selection do not clear from the console plugin. For example, if you initially set `logType` to `CONVERSATIONS` for a span of time until 10 AM and then move to `ENDED_CONVERSATIONS`, the console plugin shows all conversation events before 10 AM and only ended conversations after 10 AM. +If you update the `logType` option, the flows from the previous selection do not clear from the console plugin. 
For example, if you initially set `logType` to `Conversations` for a span of time until 10 AM and then move to `EndedConversations`, the console plugin shows all conversation events before 10 AM and only ended conversations after 10 AM. ==== . Refresh the *Network Traffic* page on the *Traffic flows* tab. Notice there are two new columns, *Event/Type* and *Conversation Id*. All the *Event/Type* fields are `Flow` when *Flow* is the selected query option. . Select *Query Options* and choose the *Log Type*, *Conversation*. Now the *Event/Type* shows all of the desired conversation events. diff --git a/modules/network-observability-working-with-zones.adoc b/modules/network-observability-working-with-zones.adoc new file mode 100644 index 0000000000..374a5cfe2a --- /dev/null +++ b/modules/network-observability-working-with-zones.adoc @@ -0,0 +1,35 @@ +// Module included in the following assemblies: +// +// network_observability/observing-network-traffic.adoc + +:_mod-docs-content-type: PROCEDURE +[id="network-observability-zones{context}"] += Working with availability zones +You can configure the `FlowCollector` to collect information about the cluster availability zones. This allows you to enrich network flow data with the link:https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone[`topology.kubernetes.io/zone`] label value applied to the nodes. + +.Procedure +. In the web console, go to *Operators* -> *Installed Operators*. +. Under the *Provided APIs* heading for the *NetObserv Operator*, select *Flow Collector*. +. Select *cluster* then select the *YAML* tab. +. Configure the `FlowCollector` custom resource so that the `spec.processor.addZone` parameter is set to `true`. A sample configuration is as follows: ++ +.Configure `FlowCollector` for availability zones collection +[source, yaml] +---- +apiVersion: flows.netobserv.io/v1beta2 +kind: FlowCollector +metadata: + name: cluster +spec: +# ... + processor: + addZone: true +# ... 
+---- + +.Verification +When you refresh the *Network Traffic* page, the *Overview*, *Traffic flows*, and *Topology* views display new information about availability zones: + +. In the *Overview* tab, you can see *Zones* as an available *Scope*. +. In *Network Traffic* -> *Traffic flows*, *Zones* are viewable under the `SrcK8S_Zone` and `DstK8S_Zone` fields. +. In the *Topology* view, you can set *Zones* as *Scope* or *Group*. \ No newline at end of file diff --git a/modules/troubleshooting-network-observability-loki-empty-ring.adoc b/modules/troubleshooting-network-observability-loki-empty-ring.adoc new file mode 100644 index 0000000000..f5e8ca5d29 --- /dev/null +++ b/modules/troubleshooting-network-observability-loki-empty-ring.adoc @@ -0,0 +1,16 @@ +// Module included in the following assemblies: + +// * networking/network_observability/troubleshooting-network-observability.adoc + +:_mod-docs-content-type: PROCEDURE +[id="network-observability-troubleshooting-loki-empty-ring_{context}"] += Loki empty ring error + +The Loki "empty ring" error results in flows not being stored in Loki and not showing up in the web console. This error might happen in various situations. A single workaround to address them all does not exist. There are some actions you can take to investigate the logs in your Loki pods, and verify that the `LokiStack` is healthy and ready. + +Some of the situations where this error is observed are as follows: + +* After a `LokiStack` is uninstalled and reinstalled in the same namespace, old PVCs are not removed, which can cause this error. +** *Action*: You can try removing the `LokiStack` again, removing the PVC, then reinstalling the `LokiStack`. +* After a certificate rotation, this error can prevent communication with the `flowlogs-pipeline` and `console-plugin` pods. +** *Action*: You can restart the pods to restore the connectivity.
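The investigation actions described in the empty ring module can be sketched as a few `oc` commands. The namespace and label selectors below are assumptions for a default installation in the `netobserv` namespace; adjust them for your cluster:

```shell
# Verify that the LokiStack reports Ready (assumes it is installed in "netobserv")
oc get lokistack -n netobserv

# Search the Loki pod logs for "empty ring" messages
# (the label selector is an assumption; match it to your Loki pods)
oc logs -n netobserv -l app.kubernetes.io/name=lokistack --tail=100 | grep -i "empty ring"

# After an uninstall/reinstall, check for leftover PVCs that can cause the error
oc get pvc -n netobserv

# After a certificate rotation, restart the affected pods to restore connectivity
oc delete pods -n netobserv -l app=flowlogs-pipeline
oc delete pods -n netobserv -l app=netobserv-plugin
```
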
\ No newline at end of file diff --git a/network_observability/installing-operators.adoc b/network_observability/installing-operators.adoc index 1573349a08..0b8928a687 100644 --- a/network_observability/installing-operators.adoc +++ b/network_observability/installing-operators.adoc @@ -36,7 +36,6 @@ include::modules/network-observability-lokistack-create.adoc[leveloffset=+2] include::modules/loki-deployment-sizing.adoc[leveloffset=+2] include::modules/network-observability-lokistack-ingestion-query.adoc[leveloffset=+2] -include::modules/network-observability-auth-multi-tenancy.adoc[leveloffset=+2] include::modules/network-observability-multitenancy.adoc[leveloffset=+2] include::modules/network-observability-operator-install.adoc[leveloffset=+1] diff --git a/network_observability/metrics-alerts-dashboards.adoc b/network_observability/metrics-alerts-dashboards.adoc new file mode 100644 index 0000000000..2c880161e2 --- /dev/null +++ b/network_observability/metrics-alerts-dashboards.adoc @@ -0,0 +1,17 @@ +:_mod-docs-content-type: ASSEMBLY +[id="metrics-dashboards-alerts"] += Using metrics with dashboards and alerts +include::_attributes/common-attributes.adoc[] +:context: metrics-dashboards-alerts + +toc::[] + +The Network Observability Operator uses the `flowlogs-pipeline` to generate metrics from flow logs. You can use these metrics by setting custom alerts and viewing dashboards. + +include::modules/network-observability-viewing-dashboards.adoc[leveloffset=+1] +include::modules/network-observability-metrics.adoc[leveloffset=+1] +include::modules/network-observability-includelist-example.adoc[leveloffset=+1] + +[role="_additional-resources"] +.Additional resources +* For more information about creating alerts that you can see on the dashboard, see xref:../monitoring/managing-alerts.adoc#creating-alerting-rules-for-user-defined-projects_managing-alerts[Creating alerting rules for user-defined projects].
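As a sketch of the `includeList` configuration that the modules in this assembly describe, a `FlowCollector` might select specific generated metrics as follows. The metric names listed are illustrative assumptions; consult the includelist reference module for the supported names:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    metrics:
      # Only the metrics named here are generated (assumed example names)
      includeList:
        - node_ingress_bytes_total
        - workload_egress_packets_total
        - namespace_flows_total
```
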
diff --git a/network_observability/network-observability-network-policy.adoc b/network_observability/network-observability-network-policy.adoc index 5d875bf09e..3cd8e3361c 100644 --- a/network_observability/network-observability-network-policy.adoc +++ b/network_observability/network-observability-network-policy.adoc @@ -6,7 +6,7 @@ include::_attributes/common-attributes.adoc[] toc::[] -As a user with the `admin` role, you can create a network policy for the `netobserv` namespace. +As a user with the `admin` role, you can create a network policy for the `netobserv` namespace to secure inbound access to the Network Observability Operator. include::modules/network-observability-create-network-policy.adoc[leveloffset=+1] include::modules/network-observability-sample-network-policy-YAML.adoc[leveloffset=+1] diff --git a/network_observability/network-observability-operator-release-notes.adoc b/network_observability/network-observability-operator-release-notes.adoc index 5bfa8e3d34..a2c4cdd75a 100644 --- a/network_observability/network-observability-operator-release-notes.adoc +++ b/network_observability/network-observability-operator-release-notes.adoc @@ -12,6 +12,90 @@ The Network Observability Operator enables administrators to observe and analyze These release notes track the development of the Network Observability Operator in the {product-title}. For an overview of the Network Observability Operator, see xref:../network_observability/network-observability-overview.adoc#dependency-network-observability[About Network Observability Operator]. 
+ +[id="network-observability-operator-release-notes-1-5"] +== Network Observability Operator 1.5.0 +The following advisory is available for the Network Observability Operator 1.5.0: + +* link:https://access.redhat.com/errata/RHSA-2024:0853[Network Observability Operator 1.5.0] + +[id="network-observability-operator-1.5.0-features-enhancements"] +=== New features and enhancements + +[id="network-observability-dns-enhancements-1.5"] +==== DNS tracking enhancements +In 1.5, the TCP protocol is now supported in addition to UDP. New dashboards are also added to the *Overview* view of the *Network Traffic* page. For more information, see xref:../network_observability/observing-network-traffic.adoc#network-observability-dns-overview_nw-observe-network-traffic[Configuring DNS tracking] and xref:../network_observability/observing-network-traffic.adoc#network-observability-dns-tracking_nw-observe-network-traffic[Working with DNS tracking]. + +[id="network-observability-RTT-1.5"] +==== Round-trip time (RTT) +You can use TCP handshake Round-Trip Time (RTT) captured from the `fentry/tcp_rcv_established` Extended Berkeley Packet Filter (eBPF) hookpoint to read smoothed round-trip time (SRTT) and analyze network flows. In the *Overview*, *Network Traffic*, and *Topology* pages in the web console, you can monitor network traffic and troubleshoot with RTT metrics, filtering, and edge labeling. For more information, see xref:../network_observability/observing-network-traffic.adoc#network-observability-RTT-overview_nw-observe-network-traffic[RTT Overview] and xref:../network_observability/observing-network-traffic.adoc#network-observability-RTT_nw-observe-network-traffic[Working with RTT]. + +[id="network-observability-metrics-dashboard-enhancements"] +==== Metrics, dashboards, and alerts enhancements +The Network Observability metrics dashboards in *Observe* → *Dashboards* → *NetObserv* have new metrics types you can use to create Prometheus alerts.
You can now define available metrics in the `includeList` specification. In previous releases, these metrics were defined in the `ignoreTags` specification. For a complete list of these metrics, see xref:../network_observability/metrics-alerts-dashboards.adoc#network-observability-metrics_metrics-dashboards-alerts[Network Observability Metrics]. + +[id="network-observability-improved-lokistack-integration"] +==== Improvements for Network Observability without Loki +You can create Prometheus alerts for the *Netobserv* dashboard using DNS, Packet drop, and RTT metrics, even if you do not use Loki. In the previous version of Network Observability, 1.4, these metrics were only available for querying and analysis in the *Network Traffic*, *Overview*, and *Topology* views, which are not available without Loki. For more information, see xref:../network_observability/metrics-alerts-dashboards.adoc#network-observability-metrics_metrics-dashboards-alerts[Network Observability Metrics]. + +[id="network-observability-zones"] +==== Availability zones +You can configure the `FlowCollector` resource to collect information about the cluster availability zones. This configuration enriches the network flow data with the link:https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone[`topology.kubernetes.io/zone`] label value applied to the nodes. For more information, see xref:../network_observability/observing-network-traffic.adoc#network-observability-zonesnw-observe-network-traffic[Working with availability zones]. + +[id="network-observability-enhanced-configuration-and-ui-1.5"] +==== Notable enhancements +The 1.5 release of the Network Observability Operator adds improvements and new capabilities to the {product-title} web console plugin and the Operator configuration.
+ +[discrete] +[id="performance-enhancements-1.5"] +===== Performance enhancements +* The `spec.agent.ebpf.kafkaBatchSize` default is changed from `10MB` to `1MB` to enhance eBPF performance when using Kafka. ++ +[IMPORTANT] +==== +When upgrading from an existing installation, this new value is not set automatically in the configuration. If you observe a performance regression in eBPF Agent memory consumption after upgrading, you might consider reducing the `kafkaBatchSize` to the new value. +==== + +[discrete] +[id="web-console-enhancements-1.5"] +===== Web console enhancements + +* There are new panels added to the *Overview* view for DNS and RTT: Min, Max, P90, P99. +* There are new panel display options added: +** Focus on one panel while keeping others viewable but with smaller focus. +** Switch graph type. +** Show *Top* and *Overall*. +* A collection latency warning is shown in the *Custom time range* pop-up window. +* There is enhanced visibility for the contents of the *Manage panels* and *Manage columns* pop-up windows. +* The Differentiated Services Code Point (DSCP) field for egress QoS is available for filtering QoS DSCP in the web console *Network Traffic* page. + +[discrete] +[id="configuration-enhancements-1.5"] +===== Configuration enhancements + +* The `LokiStack` mode in the `spec.loki.mode` specification simplifies installation by automatically setting URLs, TLS, cluster roles and a cluster role binding, as well as the `authToken` value. The `Manual` mode allows more control over configuration of these settings. +* The API version changes from `flows.netobserv.io/v1beta1` to `flows.netobserv.io/v1beta2`. + +[id="network-observability-operator-1.5.0-bug-fixes"] +=== Bug fixes + +* Previously, it was not possible to register the console plugin manually in the web console interface if the automatic registration of the console plugin was disabled.
If the `spec.console.register` value was set to `false` in the `FlowCollector` resource, the Operator would override and erase the plugin registration. +With this fix, setting the `spec.console.register` value to `false` does not impact the console plugin registration or registration removal. As a result, the plugin can be safely registered manually. (link:https://issues.redhat.com/browse/NETOBSERV-1134[*NETOBSERV-1134*]) +* Previously, using the default metrics settings, the *NetObserv/Health* dashboard was showing an empty graph named *Flows Overhead*. This metric was only available by removing "namespaces-flows" and "namespaces" from the `ignoreTags` list. With this fix, this metric is visible when you use the default metrics setting. (link:https://issues.redhat.com/browse/NETOBSERV-1351[*NETOBSERV-1351*]) +* Previously, the node on which the eBPF Agent was running would not resolve with a specific cluster configuration. This resulted in cascading consequences that culminated in a failure to provide some of the traffic metrics. With this fix, the eBPF agent's node IP is safely provided by the Operator, inferred from the pod status. Now, the missing metrics are restored. (link:https://issues.redhat.com/browse/NETOBSERV-1430[*NETOBSERV-1430*]) +* Previously, the 'Input size too long' error for the Loki Operator did not include additional information to troubleshoot the problem. +With this fix, help is directly displayed in the web console next to the error with a direct link for more guidance. (link:https://issues.redhat.com/browse/NETOBSERV-1464[*NETOBSERV-1464*]) +* Previously, the console plugin read timeout was forced to 30s. +With the `FlowCollector` `v1beta2` API update, you can configure the `spec.loki.readTimeout` specification to update this value according to the Loki Operator `queryTimeout` limit.
(link:https://issues.redhat.com/browse/NETOBSERV-1443[*NETOBSERV-1443*]) +* Previously, the Operator bundle did not display some of the supported features in CSV annotations as expected, such as `features.operators.openshift.io/...` +With this fix, these annotations are set in the CSV as expected. (link:https://issues.redhat.com/browse/NETOBSERV-1305[*NETOBSERV-1305*]) +* Previously, the `FlowCollector` status sometimes oscillated between `DeploymentInProgress` and `Ready` states during reconciliation. +With this fix, the status only becomes `Ready` when all the underlying components are fully ready. (link:https://issues.redhat.com/browse/NETOBSERV-1293[*NETOBSERV-1293*]) + +[id="network-observability-operator-1.5.0-known-issue"] +=== Known issues +* When trying to access the web console, cache issues on OCP 4.14.10 prevent access to the *Observe* view. The web console shows the error message: `Failed to get a valid plugin manifest from /api/plugins/monitoring-plugin/`. The recommended workaround is to update the cluster to the latest minor version. If this does not work, you need to apply the workarounds described in this link:https://access.redhat.com/solutions/7052408[Red Hat Knowledgebase article]. (link:https://issues.redhat.com/browse/NETOBSERV-1493[*NETOBSERV-1493*]) + [id="network-observability-operator-release-notes-1-4-2"] == Network Observability Operator 1.4.2 The following advisory is available for the Network Observability Operator 1.4.2: @@ -94,7 +178,6 @@ In 1.4, the Network Observability Operator makes use of eBPF tracepoint hooks to For more information, see xref:../network_observability/observing-network-traffic.adoc#network-observability-dns-overview_nw-observe-network-traffic[Configuring DNS tracking] and xref:../network_observability/observing-network-traffic.adoc#network-observability-dns-tracking_nw-observe-network-traffic[Working with DNS tracking].
-//Packet drops needs separate RN PR that doesn't cherrypick to 4.10+ since its only supported in 4.13+. This PR will go to 4.10+ [id="SR-IOV-configuration-1.4"] ==== SR-IOV support You can now collect traffic from a cluster with Single Root I/O Virtualization (SR-IOV) device. For more information, see xref:../network_observability/configuring-operator.adoc#network-observability-SR-IOV-config_network_observability[Configuring the monitoring of SR-IOV interface traffic]. diff --git a/network_observability/network-observability-overview.adoc b/network_observability/network-observability-overview.adoc index 9b847d5fdb..2fed3d7e5b 100644 --- a/network_observability/network-observability-overview.adoc +++ b/network_observability/network-observability-overview.adoc @@ -28,29 +28,10 @@ The Network Observability Operator provides the Flow Collector API custom resour [id="network-observability-dashboards"] === Network Observability metrics dashboards -On the *Overview* tab in the {product-title} console, you can view the overall aggregated metrics of the network traffic flow on the cluster. You can choose to display the information by node, namespace, owner, pod, and service. Filters and display options can further refine the metrics. +On the *Overview* tab in the {product-title} console, you can view the overall aggregated metrics of the network traffic flow on the cluster. You can choose to display the information by node, namespace, owner, pod, zone, and service. Filters and display options can further refine the metrics. For more information, see xref:../network_observability/observing-network-traffic.adoc#network-observability-overview_nw-observe-network-traffic[Observing the network traffic from the Overview view]. -In *Observe* -> *Dashboards*, the *Netobserv* dashboard provides a quick overview of the network flows in your {product-title} cluster. 
You can view distillations of the network traffic metrics in the following categories: +In *Observe* -> *Dashboards*, the *Netobserv* dashboard provides a quick overview of the network flows in your {product-title} cluster. The *Netobserv/Health* dashboard provides metrics about the health of the Operator. For more information, see xref:../network_observability/metrics-alerts-dashboards.adoc#network-observability-metrics_metrics-dashboards-alerts[Network Observability Metrics] and xref:../network_observability/network-observability-operator-monitoring.adoc#network-observability-alert-dashboard_network_observability[Viewing health information]. - * *Top byte rates received per source and destination nodes* - * *Top byte rates received per source and destination namespaces* - * *Top byte rates received per source and destination workloads* - -*Infrastructure* and *Application* metrics are shown in a split-view for namespace and workloads. -You can configure the `FlowCollector` `spec.processor.metrics` to add or remove metrics by changing the `ignoreTags` list. For more information about available tags, see the xref:../network_observability/flowcollector-api.adoc#network-observability-flowcollector-api-specifications_network_observability[Flow Collector API Reference] - -Also in *Observe* -> *Dashboards*, the *Netobserv/Health* dashboard provides metrics about the health of the Operator in the following categories. - -* *Flows* -* *Flows Overhead* -* *Top flow rates per source and destination nodes* -* *Top flow rates per source and destination namespaces* -* *Top flow rates per source and destination workloads* -* *Agents* -* *Processor* -* *Operator* - -*Infrastructure* and *Application* metrics are shown in a split-view for namespace and workloads. 
[id="network-observability-topology-views"] === Network Observability topology views diff --git a/network_observability/observing-network-traffic.adoc b/network_observability/observing-network-traffic.adoc index 6272dc1a91..f7549b4428 100644 --- a/network_observability/observing-network-traffic.adoc +++ b/network_observability/observing-network-traffic.adoc @@ -12,17 +12,25 @@ As an administrator, you can observe the network traffic in the {product-title} include::modules/network-observability-overview.adoc[leveloffset=+1] include::modules/network-observability-working-with-overview.adoc[leveloffset=+2] include::modules/network-observability-configuring-options-overview.adoc[leveloffset=+2] -include::modules/network-observability-pktdrop-overview.adoc[leveloffset=+3] +include::modules/network-observability-pktdrop-overview.adoc[leveloffset=+2] [role="_additional-resources"] .Additional resources -* For more information about configuring packet drops in the `FlowCollector`, see xref:../network_observability/observing-network-traffic.adoc#network-observability-packet-drops_nw-observe-network-traffic[Working with packet drops]. +* xref:../network_observability/observing-network-traffic.adoc#network-observability-packet-drops_nw-observe-network-traffic[Working with packet drops] +* xref:../network_observability/metrics-alerts-dashboards.adoc#network-observability-metrics_metrics-dashboards-alerts[Network Observability metrics] -include::modules/network-observability-dns-overview.adoc[leveloffset=+3] +include::modules/network-observability-dns-overview.adoc[leveloffset=+2] [role="_additional-resources"] .Additional resources -* For more information about configuring DNS in the `FlowCollector`, see xref:../network_observability/observing-network-traffic.adoc#network-observability-dns-tracking_nw-observe-network-traffic[Working with DNS tracking]. 
+* xref:../network_observability/observing-network-traffic.adoc#network-observability-dns-tracking_nw-observe-network-traffic[Working with DNS tracking] +* xref:../network_observability/metrics-alerts-dashboards.adoc#network-observability-metrics_metrics-dashboards-alerts[Network Observability metrics] + +include::modules/network-observability-RTT-overview.adoc[leveloffset=+2] + +[role="_additional-resources"] +.Additional resources +* xref:../network_observability/observing-network-traffic.adoc#network-observability-RTT_nw-observe-network-traffic[Working with RTT tracing] //Traffic flows include::modules/network-observability-trafficflow.adoc[leveloffset=+1] @@ -31,7 +39,9 @@ include::modules/network-observability-configuring-options-trafficflow.adoc[leve include::modules/network-observability-working-with-conversations.adoc[leveloffset=+2] include::modules/network-observability-packet-drops.adoc[leveloffset=+2] include::modules/network-observability-dns-tracking.adoc[leveloffset=+2] +include::modules/network-observability-RTT.adoc[leveloffset=+2] include::modules/network-observability-histogram-trafficflow.adoc[leveloffset=+2] +include::modules/network-observability-working-with-zones.adoc[leveloffset=+2] //Topology include::modules/network-observability-topology.adoc[leveloffset=+1] diff --git a/network_observability/troubleshooting-network-observability.adoc b/network_observability/troubleshooting-network-observability.adoc index bb0b08eef4..fdb03d8ebd 100644 --- a/network_observability/troubleshooting-network-observability.adoc +++ b/network_observability/troubleshooting-network-observability.adoc @@ -23,6 +23,7 @@ include::modules/troubleshooting-network-observability-controller-manager-pod-ou * xref:../network_observability/configuring-operator.adoc#network-observability-resources-table_network_observability[Resource considerations] include::modules/troubleshooting-network-observability-loki-resource-exhausted.adoc[leveloffset=+1] 
+include::modules/troubleshooting-network-observability-loki-empty-ring.adoc[leveloffset=+1] == Resource troubleshooting