
OSDOCS-10171: Deduper merge mode

Integrate NetObserv 1.6 feature branch with OCP docs main branch

OSDOCS-10211: eBPF flow rule filtering

OSDOCS-9959: NetObserv Health dashboard updates

Fixes xref error

Flow format reference regeneration

Update DNS example to include sampling>1 note

OSDOCS-9553: Netobserv Lokiless enhancements

OSDOCS-10790: Update NetObserv Operator Install prereqs

OSDOCS-10747: Adding FlowMetric API Reference

Changing FlowMetrics to FlowMetric

Netobserv API doc regeneration

OSDOCS-9969: netobserv cli

Network Observability 1.6 release notes
This commit is contained in:
Sara Thomas
2024-04-09 16:46:57 -04:00
committed by openshift-cherrypick-robot
parent f6277ef0ea
commit 4815076502
45 changed files with 2569 additions and 339 deletions

View File

@@ -2912,9 +2912,22 @@ Topics:
  File: metrics-alerts-dashboards
- Name: Monitoring the Network Observability Operator
  File: network-observability-operator-monitoring
- Name: Scheduling resources
  File: network-observability-scheduling-resources
- Name: Network Observability CLI
  Dir: netobserv_cli
  Topics:
  - Name: Installing the Network Observability CLI
    File: netobserv-cli-install
  - Name: Using the Network Observability CLI
    File: netobserv-cli-using
  - Name: Network Observability CLI reference
    File: netobserv-cli-reference
- Name: FlowCollector API reference
  File: flowcollector-api
- Name: FlowMetric API reference
  File: flowmetric-api
- Name: Flows format reference
  File: json-flows-format-reference
- Name: Troubleshooting Network Observability
  File: troubleshooting-network-observability

View File

@@ -5,24 +5,24 @@
:_mod-docs-content-type: CONCEPT
[id="network-observability-RTT-overview_{context}"]
= Round-Trip Time
You can use TCP smoothed Round-Trip Time (sRTT) to analyze network flow latencies. You can use RTT captured from the `fentry/tcp_rcv_established` eBPF hookpoint to read sRTT from the TCP socket to help with the following:

* Network Monitoring: Gain insights into TCP latencies, helping network administrators identify unusual patterns, potential bottlenecks, or performance issues.
* Troubleshooting: Debug TCP-related issues by tracking latency and identifying misconfigurations.

By default, when RTT is enabled, you can see the following TCP RTT metrics represented in the *Overview*:

* Top X 90th percentile TCP Round Trip Time with overall
* Top X average TCP Round Trip Time with overall
* Bottom X minimum TCP Round Trip Time with overall

Other RTT panels can be added in *Manage panels*:

* Top X maximum TCP Round Trip Time with overall
* Top X 99th percentile TCP Round Trip Time with overall
See the _Additional Resources_ in this section for more information about enabling and working with this view.

View File

@@ -23,7 +23,6 @@ metadata:
  name: cluster
spec:
  namespace: netobserv
  agent:
    type: eBPF
    ebpf:

View File

@@ -0,0 +1,88 @@
//Module included in the following assemblies:
//
// observability/network_observability/netobserv_cli/netobserv-cli-using.adoc
:_mod-docs-content-type: PROCEDURE
[id="network-observability-cli-capturing-flows_{context}"]
= Capturing flows
You can capture flows and filter on any resource or zone in the data to solve use cases, such as displaying Round-Trip Time (RTT) between two zones. Table visualization in the CLI provides viewing and flow search capabilities.
.Prerequisites
* Install the {oc-first}.
* Install the Network Observability CLI (`oc netobserv`) plugin.
.Procedure
. Capture flows with filters enabled by running the following command:
+
[source,terminal]
----
$ oc netobserv flows --enable_filter=true --action=Accept --cidr=0.0.0.0/0 --protocol=TCP --port=49051
----
. Add filters to the `live table filter` prompt in the terminal to further refine the incoming flows. For example:
+
[source,terminal]
----
live table filter: [SrcK8S_Zone:us-west-1b] press enter to match multiple regular expressions at once
----
. To stop capturing, press kbd:[Ctrl+C]. The data that was captured is written to two separate files in an `./output` directory located in the same path used to install the CLI.
. View the captured data in the `./output/flow/<capture_date_time>.json` JSON file, which contains JSON arrays of the captured data.
+
.Example JSON file
[source,json]
----
{
  "AgentIP": "10.0.1.76",
  "Bytes": 561,
  "DnsErrno": 0,
  "Dscp": 20,
  "DstAddr": "f904:ece9:ba63:6ac7:8018:1e5:7130:0",
  "DstMac": "0A:58:0A:80:00:37",
  "DstPort": 9999,
  "Duplicate": false,
  "Etype": 2048,
  "Flags": 16,
  "FlowDirection": 0,
  "IfDirection": 0,
  "Interface": "ens5",
  "K8S_FlowLayer": "infra",
  "Packets": 1,
  "Proto": 6,
  "SrcAddr": "3e06:6c10:6440:2:a80:37:b756:270f",
  "SrcMac": "0A:58:0A:80:00:01",
  "SrcPort": 46934,
  "TimeFlowEndMs": 1709741962111,
  "TimeFlowRttNs": 121000,
  "TimeFlowStartMs": 1709741962111,
  "TimeReceived": 1709741964
}
----
. You can use SQLite to inspect the `./output/flow/<capture_date_time>.db` database file. For example:
.. Open the file by running the following command:
+
[source,terminal]
----
$ sqlite3 ./output/flow/<capture_date_time>.db
----
.. Query the data by running a SQLite `SELECT` statement, for example:
+
[source,terminal]
----
sqlite> SELECT DnsLatencyMs, DnsFlagsResponseCode, DnsId, DstAddr, DstPort, Interface, Proto, SrcAddr, SrcPort, Bytes, Packets FROM flow WHERE DnsLatencyMs >10 LIMIT 10;
----
+
.Example output
[source,terminal]
----
12|NoError|58747|10.128.0.63|57856||17|172.30.0.10|53|284|1
11|NoError|20486|10.128.0.52|56575||17|169.254.169.254|53|225|1
11|NoError|59544|10.128.0.103|51089||17|172.30.0.10|53|307|1
13|NoError|32519|10.128.0.52|55241||17|169.254.169.254|53|254|1
12|NoError|32519|10.0.0.3|55241||17|169.254.169.254|53|254|1
15|NoError|57673|10.128.0.19|59051||17|172.30.0.10|53|313|1
13|NoError|35652|10.0.0.3|46532||17|169.254.169.254|53|183|1
32|NoError|37326|10.0.0.3|52718||17|169.254.169.254|53|169|1
14|NoError|14530|10.0.0.3|58203||17|169.254.169.254|53|246|1
15|NoError|40548|10.0.0.3|45933||17|169.254.169.254|53|174|1
----

View File

@@ -0,0 +1,29 @@
//Module included in the following assemblies:
//
// observability/network_observability/netobserv_cli/netobserv-cli-using.adoc
:_mod-docs-content-type: PROCEDURE
[id="network-observability-cli-capturing-packets_{context}"]
= Capturing packets
You can capture packets using the Network Observability CLI.
.Prerequisites
* Install the {oc-first}.
* Install the Network Observability CLI (`oc netobserv`) plugin.
.Procedure
. Run the packet capture with filters enabled:
+
[source,terminal]
----
$ oc netobserv packets --filter=tcp,80
----
. Add filters to the `live table filter` prompt in the terminal to refine the incoming packets. An example filter is as follows:
+
[source,terminal]
----
live table filter: [SrcK8S_Zone:us-west-1b] press enter to match multiple regular expressions at once
----
. To stop capturing, press kbd:[Ctrl+C].
. View the captured data, which is written to a single file in an `./output/pcap` directory located in the same path that was used to install the CLI:
.. The `./output/pcap/<capture_date_time>.pcap` file can be opened with Wireshark.

View File

@@ -0,0 +1,85 @@
// Module included in the following assemblies:
//
// network_observability/metrics-alerts-dashboards.adoc
:_mod-docs-content-type: PROCEDURE
[id="network-observability-configuring-custom-metrics_{context}"]
= Configuring custom metrics by using FlowMetric API
You can configure the `FlowMetric` API to create custom metrics by using flowlogs data fields as Prometheus labels. You can add multiple `FlowMetric` resources to a project to see multiple dashboard views.
.Procedure
. In the web console, navigate to *Operators* -> *Installed Operators*.
. In the *Provided APIs* heading for the *NetObserv Operator*, select *FlowMetric*.
. In the *Project:* dropdown list, select the project of the Network Observability Operator instance.
. Click *Create FlowMetric*.
. Configure the `FlowMetric` resource, similar to the following sample configurations:
+
.Generate a metric that tracks ingress bytes received from cluster external sources
[%collapsible]
====
[source,yaml]
----
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-traffic
  namespace: netobserv <1>
spec:
  metricName: cluster_external_ingress_bytes_total <2>
  type: Counter <3>
  valueField: Bytes
  direction: Ingress <4>
  labels: [DstK8S_HostName,DstK8S_Namespace,DstK8S_OwnerName,DstK8S_OwnerType] <5>
  filters: <6>
  - field: SrcSubnetLabel
    matchType: Absence
----
<1> The `FlowMetric` resources need to be created in the namespace defined in the `FlowCollector` `spec.namespace`, which is `netobserv` by default.
<2> The name of the Prometheus metric, which in the web console appears with the prefix `netobserv-<metricName>`.
<3> The `type` specifies the type of metric. The `Counter` `type` is useful for counting bytes or packets.
<4> The direction of traffic to capture. If not specified, both ingress and egress are captured, which can lead to duplicated counts.
<5> Labels define what the metrics look like and the relationship between the different entities and also define the metrics cardinality. For example, `SrcK8S_Name` is a high cardinality metric.
<6> Refines results based on the listed criteria. In this example, selecting only the cluster external traffic is done by matching only flows where `SrcSubnetLabel` is absent. This assumes that the subnet labels feature is enabled (via `spec.processor.subnetLabels`), which it is by default.
.Verification
. Once the pods refresh, navigate to *Observe* -> *Metrics*.
. In the *Expression* field, type the metric name to view the corresponding result. You can also enter an expression, such as `topk(5, sum(rate(netobserv_cluster_external_ingress_bytes_total{DstK8S_Namespace="my-namespace"}[2m])) by (DstK8S_HostName, DstK8S_OwnerName, DstK8S_OwnerType))`.
====
+
.Show RTT latency for cluster external ingress traffic
[%collapsible]
====
[source,yaml]
----
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-rtt
  namespace: netobserv <1>
spec:
  metricName: cluster_external_ingress_rtt_seconds
  type: Histogram <2>
  valueField: TimeFlowRttNs
  direction: Ingress
  labels: [DstK8S_HostName,DstK8S_Namespace,DstK8S_OwnerName,DstK8S_OwnerType]
  filters:
  - field: SrcSubnetLabel
    matchType: Absence
  - field: TimeFlowRttNs
    matchType: Presence
  divider: "1000000000" <3>
  buckets: [".001", ".005", ".01", ".02", ".03", ".04", ".05", ".075", ".1", ".25", "1"] <4>
----
<1> The `FlowMetric` resources need to be created in the namespace defined in the `FlowCollector` `spec.namespace`, which is `netobserv` by default.
<2> The `type` specifies the type of metric. The `Histogram` `type` is useful for a latency value (`TimeFlowRttNs`).
<3> Because the Round-Trip Time (RTT) is provided in nanoseconds in flows, use a divider of 1 billion (1,000,000,000) to convert it to seconds, which is standard in Prometheus guidelines.
<4> The custom buckets specify precision on RTT, with optimal precision ranging between 5ms and 250ms.
.Verification
. Once the pods refresh, navigate to *Observe* -> *Metrics*.
. In the *Expression* field, you can type the metric name to view the corresponding result.
====

View File

@@ -0,0 +1,8 @@
// Module included in the following assemblies:
//
// network_observability/metrics-alerts-dashboards.adoc
:_mod-docs-content-type: CONCEPT
[id="network-observability-custom-metrics_{context}"]
= Custom metrics
You can create custom metrics out of the flowlogs data by using the `FlowMetric` API. Every collected flow log contains a number of fields labeled per log, such as source name and destination name. These fields can be leveraged as Prometheus labels to enable the customization of cluster information on your dashboard.
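
For example, a minimal sketch of such a resource, using field names from the flows format reference (the resource and metric names here are illustrative), counts flows per source and destination namespace:

[source,yaml]
----
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flows-per-namespace # illustrative name
  namespace: netobserv # must match the FlowCollector spec.namespace
spec:
  metricName: namespace_flows_custom_total # exposed as netobserv_namespace_flows_custom_total
  type: Counter # omitting valueField counts flows instead of summing a field
  labels: [SrcK8S_Namespace,DstK8S_Namespace]
----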

View File

@@ -27,7 +27,6 @@ metadata:
  name: cluster
spec:
  namespace: netobserv
  agent:
    type: eBPF
    ebpf:
@@ -36,7 +35,7 @@ spec:
      sampling: 1 <2>
----
<1> You can set the `spec.agent.ebpf.features` parameter list to enable DNS tracking of each network flow in the web console.
<2> You can set `sampling` to a value of `1` for more accurate metrics and to capture *DNS latency*. For a `sampling` value greater than `1`, you can observe flows with *DNS Response Code* and *DNS Id*, and it is unlikely that *DNS Latency* can be observed.
. When you refresh the *Network Traffic* page, there are new DNS representations you can choose to view in the *Overview* and *Traffic Flow* views and new filters you can apply.
.. Select new DNS choices in *Manage panels* to display graphical visualizations and DNS metrics in the *Overview*.

View File

@@ -0,0 +1,38 @@
// Module included in the following assemblies:
// * network_observability/network-observability-operator-monitoring.adoc
:_mod-docs-content-type: PROCEDURE
[id="network-observability-netobserv-dashboard-ebpf-agent-alerts_{context}"]
= Using the eBPF agent alert
An alert, `NetObservAgentFlowsDropped`, is triggered when the Network Observability eBPF agent hashmap table is full or when the capacity limiter is triggered. If you see this alert, consider increasing the `cacheMaxFlows` in the `FlowCollector`, as shown in the following example.
[NOTE]
====
Increasing the `cacheMaxFlows` might increase the memory usage of the eBPF agent.
====
.Procedure
. In the web console, navigate to *Operators* -> *Installed Operators*.
. Under the *Provided APIs* heading for the *Network Observability Operator*, select *Flow Collector*.
. Select *cluster*, and then select the *YAML* tab.
. Increase the `spec.agent.ebpf.cacheMaxFlows` value, as shown in the following YAML sample:
+
[source,yaml]
----
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  deploymentModel: Direct
  agent:
    type: eBPF
    ebpf:
      cacheMaxFlows: 200000 <1>
----
<1> Increase the `cacheMaxFlows` value from the value it had when the `NetObservAgentFlowsDropped` alert was triggered.

View File

@@ -0,0 +1,18 @@
// Module included in the following assemblies:
//
// network_observability/observing-network-traffic.adoc
:_mod-docs-content-type: CONCEPT
[id="network-observability-ebpf-flow-rule-filter_{context}"]
= eBPF flow rule filter
You can use rule-based filtering to control the volume of packets cached in the eBPF flow table. For example, a filter can specify that only packets coming from port 100 should be recorded. Then only the packets that match the filter are cached, and the rest are not.
[id="ingress-and-egress-traffic-filtering_{context}"]
== Ingress and egress traffic filtering
CIDR notation efficiently represents IP address ranges by combining the base IP address with a prefix length. For both ingress and egress traffic, the source IP address is first used to match filter rules configured with CIDR notation. If there is a match, then the filtering proceeds. If there is no match, then the destination IP is used to match filter rules configured with CIDR notation.
After matching either the source IP or the destination IP CIDR, you can pinpoint specific endpoints using the `peerIP` to differentiate the destination IP address of the packet. Based on the provisioned action, the flow data is either cached in the eBPF flow table or not cached.
[id="dashboard-and-metrics-integrations_{context}"]
== Dashboard and metrics integrations
When this option is enabled, the *NetObserv / Health* dashboard for *eBPF agent statistics* now has the *Filtered flows rate* view. Additionally, in *Observe* -> *Metrics* you can query `netobserv_agent_filtered_flows_total` to observe metrics with the reason in *FlowFilterAcceptCounter*, *FlowFilterNoMatchCounter*, or *FlowFilterRejectCounter*.
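
For example, a query sketch that breaks the filtered flows rate down by reason, assuming the counter reason is exposed as a metric label (shown here as `reason`):

[source,promql]
----
sum(rate(netobserv_agent_filtered_flows_total[2m])) by (reason)
----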

View File

@@ -0,0 +1,74 @@
// Module included in the following assemblies:
//
// network_observability/observing-network-traffic.adoc
:_mod-docs-content-type: PROCEDURE
[id="network-observability-filtering-ebpf-rule_{context}"]
= Filtering eBPF flow data using a global rule
You can configure the `FlowCollector` to filter eBPF flows using a global rule to control the flow of packets cached in the eBPF flow table.
.Procedure
. In the web console, navigate to *Operators* -> *Installed Operators*.
. Under the *Provided APIs* heading for *Network Observability*, select *Flow Collector*.
. Select *cluster*, then select the *YAML* tab.
. Configure the `FlowCollector` custom resource, similar to the following sample configurations:
+
.Filter Kubernetes service traffic to a specific Pod IP endpoint
[%collapsible]
====
[source, yaml]
----
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  deploymentModel: Direct
  agent:
    type: eBPF
    ebpf:
      flowFilter:
        action: Accept <1>
        cidr: 172.210.150.1/24 <2>
        protocol: SCTP
        direction: Ingress
        destPortRange: 80-100
        peerIP: 10.10.10.10
        enable: true <3>
----
<1> The required `action` parameter describes the action that is taken for the flow filter rule. Possible values are `Accept` or `Reject`.
<2> The required `cidr` parameter provides the IP address and CIDR mask for the flow filter rule and supports IPv4 and IPv6 address formats. If you want to match against any IP address, you can use `0.0.0.0/0` for IPv4 or `::/0` for IPv6.
<3> You must set `spec.agent.ebpf.flowFilter.enable` to `true` to enable this feature.
====
+
.See flows to any addresses outside the cluster
[%collapsible]
====
[source, yaml]
----
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  deploymentModel: Direct
  agent:
    type: eBPF
    ebpf:
      flowFilter:
        action: Accept <1>
        cidr: 0.0.0.0/0 <2>
        protocol: TCP
        direction: Egress
        sourcePort: 100
        peerIP: 192.168.127.12 <3>
        enable: true <4>
----
<1> You can `Accept` flows based on the criteria in the `flowFilter` specification.
<2> The `cidr` value of `0.0.0.0/0` matches against any IP address.
<3> Flows are shown where the peer IP matches the configured `peerIP` value of `192.168.127.12`.
<4> You must set `spec.agent.ebpf.flowFilter.enable` to `true` to enable the feature.
====

View File

@@ -0,0 +1,59 @@
:_mod-docs-content-type: REFERENCE
// Module included in the following assemblies:
//
// network_observability/observing-network-traffic.adoc
[id="network-observability-flowcollector-flowfilter-parameters_{context}"]
= Flow filter configuration parameters
The flow filter rules consist of required and optional parameters.
.Required configuration parameters
[cols="3a,8a",options="header"]
|===
|Parameter |Description
|`enable`
| Set `enable` to `true` to enable the eBPF flow filtering feature.
|`cidr`
| Provides the IP address and CIDR mask for the flow filter rule. Supports both IPv4 and IPv6 address format. If you want to match against any IP, you can use `0.0.0.0/0` for IPv4 or `::/0` for IPv6.
|`action`
| Describes the action that is taken for the flow filter rule. The possible values are `Accept` or `Reject`.
* For the `Accept` action matching rule, the flow data is cached in the eBPF table and updated with the global metric, `FlowFilterAcceptCounter`.
* For the `Reject` action matching rule, the flow data is dropped and not cached in the eBPF table. The flow data is updated with the global metric, `FlowFilterRejectCounter`.
* If the rule is not matched, the flow is cached in the eBPF table and updated with the global metric, `FlowFilterNoMatchCounter`.
|===
.Optional configuration parameters
[cols="3a,8a",options="header"]
|===
|Parameter |Description
|`direction`
| Defines the direction of the flow filter rule. Possible values are `Ingress` or `Egress`.
|`protocol`
| Defines the protocol of the flow filter rule. Possible values are `TCP`, `UDP`, `SCTP`, `ICMP`, and `ICMPv6`.
| `ports`
| Defines the ports to use for filtering flows. It can be used for either source or destination ports. To filter a single port, set a single port as an integer value, for example, `ports: 80`. To filter a range of ports, use a "start-end" range in string format, for example, `ports: "80-100"`.
| `sourcePorts`
| Defines the source port to use for filtering flows. To filter a single port, set a single port as an integer value, for example, `sourcePorts: 80`. To filter a range of ports, use a "start-end" range in string format, for example, `sourcePorts: "80-100"`.
| `destPorts`
| Defines the destination ports to use for filtering flows. To filter a single port, set a single port as an integer value, for example, `destPorts: 80`. To filter a range of ports, use a "start-end" range in string format, for example, `destPorts: "80-100"`.
| `icmpType`
| Defines the ICMP type to use for filtering flows.
| `icmpCode`
| Defines the ICMP code to use for filtering flows.
| `peerIP`
| Defines the IP address to use for filtering flows, for example: `10.10.10.10`.
|===
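
As an illustration, the following sketch combines several of these optional parameters in the `FlowCollector` resource; the addresses and ICMP values are hypothetical:

[source,yaml]
----
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: eBPF
    ebpf:
      flowFilter:
        enable: true # required to activate eBPF flow filtering
        action: Reject # drop matching flows instead of caching them
        cidr: 10.0.62.0/24 # hypothetical source or destination range
        protocol: ICMP
        icmpType: 8 # ICMP echo request
        peerIP: 10.0.62.15 # hypothetical peer endpoint
----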

View File

@@ -0,0 +1,298 @@
// Automatically generated by 'openshift-apidocs-gen'. Do not edit.
:_mod-docs-content-type: REFERENCE
[id="flowmetric-flows-netobserv-io-v1alpha1"]
= FlowMetric [flows.netobserv.io/v1alpha1]
Description::
+
--
FlowMetric is the API allowing to create custom metrics from the collected flow logs.
--
Type::
`object`
[cols="1,1,1",options="header"]
|===
| Property | Type | Description
| `apiVersion`
| `string`
| APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and might reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
| `kind`
| `string`
| Kind is a string value representing the REST resource this object represents. Servers might infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
| `metadata`
| `object`
| Standard object's metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
| `spec`
| `object`
| FlowMetricSpec defines the desired state of FlowMetric
The provided API allows you to customize these metrics according to your needs. +
When adding new metrics or modifying existing labels, you must carefully monitor the memory
usage of Prometheus workloads as this could potentially have a high impact. Cf https://rhobs-handbook.netlify.app/products/openshiftmonitoring/telemetry.md/#what-is-the-cardinality-of-a-metric +
To check the cardinality of all Network Observability metrics, run as `promql`: `count({__name__=~"netobserv.*"}) by (__name__)`.
|===
== .metadata
Description::
+
--
Standard object's metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
--
Type::
`object`
== .spec
Description::
+
--
FlowMetricSpec defines the desired state of FlowMetric
The provided API allows you to customize these metrics according to your needs. +
When adding new metrics or modifying existing labels, you must carefully monitor the memory
usage of Prometheus workloads as this could potentially have a high impact. Cf https://rhobs-handbook.netlify.app/products/openshiftmonitoring/telemetry.md/#what-is-the-cardinality-of-a-metric +
To check the cardinality of all Network Observability metrics, run as `promql`: `count({__name__=~"netobserv.*"}) by (__name__)`.
--
Type::
`object`
Required::
- `metricName`
- `type`
[cols="1,1,1",options="header"]
|===
| Property | Type | Description
| `buckets`
| `array (string)`
| A list of buckets to use when `type` is "Histogram". The list must be parsable as floats. When not set, Prometheus default buckets are used.
| `charts`
| `array`
| Charts configuration, for the {product-title} Console in the administrator view, Dashboards menu.
| `direction`
| `string`
| Filter for ingress, egress or any direction flows.
When set to `Ingress`, it is equivalent to adding the regular expression filter on `FlowDirection`: `0\|2`.
When set to `Egress`, it is equivalent to adding the regular expression filter on `FlowDirection`: `1\|2`.
| `divider`
| `string`
| When nonzero, scale factor (divider) of the value. Metric value = Flow value / Divider.
| `filters`
| `array`
| `filters` is a list of fields and values used to restrict which flows are taken into account. Oftentimes, these filters must
be used to eliminate duplicates: `Duplicate != "true"` and `FlowDirection = "0"`.
Refer to the documentation for the list of available fields: https://docs.openshift.com/container-platform/latest/observability/network_observability/json-flows-format-reference.html.
| `labels`
| `array (string)`
| `labels` is a list of fields that should be used as Prometheus labels, also known as dimensions.
From choosing labels results the level of granularity of this metric, and the available aggregations at query time.
It must be done carefully as it impacts the metric cardinality (cf https://rhobs-handbook.netlify.app/products/openshiftmonitoring/telemetry.md/#what-is-the-cardinality-of-a-metric).
In general, avoid setting very high cardinality labels such as IP or MAC addresses.
"SrcK8S_OwnerName" or "DstK8S_OwnerName" should be preferred over "SrcK8S_Name" or "DstK8S_Name" as much as possible.
Refer to the documentation for the list of available fields: https://docs.openshift.com/container-platform/latest/observability/network_observability/json-flows-format-reference.html.
| `metricName`
| `string`
| Name of the metric. In Prometheus, it is automatically prefixed with "netobserv_".
| `type`
| `string`
| Metric type: "Counter" or "Histogram".
Use "Counter" for any value that increases over time and on which you can compute a rate, such as Bytes or Packets.
Use "Histogram" for any value that must be sampled independently, such as latencies.
| `valueField`
| `string`
| `valueField` is the flow field that must be used as a value for this metric. This field must hold numeric values.
Leave empty to count flows rather than a specific value per flow.
Refer to the documentation for the list of available fields: https://docs.openshift.com/container-platform/latest/observability/network_observability/json-flows-format-reference.html.
|===
== .spec.charts
Description::
+
--
Charts configuration, for the {product-title} Console in the administrator view, Dashboards menu.
--
Type::
`array`
== .spec.charts[]
Description::
+
--
Configures charts / dashboard generation associated to a metric
--
Type::
`object`
Required::
- `dashboardName`
- `queries`
- `title`
- `type`
[cols="1,1,1",options="header"]
|===
| Property | Type | Description
| `dashboardName`
| `string`
| Name of the containing dashboard. If this name does not refer to an existing dashboard, a new dashboard is created.
| `queries`
| `array`
| List of queries to be displayed on this chart. If `type` is `SingleStat` and multiple queries are provided,
this chart is automatically expanded in several panels (one per query).
| `sectionName`
| `string`
| Name of the containing dashboard section. If this name does not refer to an existing section, a new section is created.
If `sectionName` is omitted or empty, the chart is placed in the global top section.
| `title`
| `string`
| Title of the chart.
| `type`
| `string`
| Type of the chart.
| `unit`
| `string`
| Unit of this chart. Only a few units are currently supported. Leave empty to use generic number.
|===
== .spec.charts[].queries
Description::
+
--
List of queries to be displayed on this chart. If `type` is `SingleStat` and multiple queries are provided,
this chart is automatically expanded in several panels (one per query).
--
Type::
`array`
== .spec.charts[].queries[]
Description::
+
--
Configures PromQL queries
--
Type::
`object`
Required::
- `legend`
- `promQL`
- `top`
[cols="1,1,1",options="header"]
|===
| Property | Type | Description
| `legend`
| `string`
| The query legend that applies to each timeseries represented in this chart. When multiple timeseries are displayed, you should set a legend
that distinguishes each of them. It can be done with the following format: `{{ Label }}`. For example, if the `promQL` groups timeseries per
label such as: `sum(rate($METRIC[2m])) by (Label1, Label2)`, you might write as the legend: `Label1={{ Label1 }}, Label2={{ Label2 }}`.
| `promQL`
| `string`
| The `promQL` query to be run against Prometheus. If the chart `type` is `SingleStat`, this query should only return
a single timeseries. For other types, a top 7 is displayed.
You can use `$METRIC` to refer to the metric defined in this resource. For example: `sum(rate($METRIC[2m]))`.
To learn more about `promQL`, refer to the Prometheus documentation: https://prometheus.io/docs/prometheus/latest/querying/basics/
| `top`
| `integer`
| Top N series to display per timestamp. Does not apply to `SingleStat` chart type.
|===
== .spec.filters
Description::
+
--
`filters` is a list of fields and values used to restrict which flows are taken into account. Oftentimes, these filters must
be used to eliminate duplicates: `Duplicate != "true"` and `FlowDirection = "0"`.
Refer to the documentation for the list of available fields: https://docs.openshift.com/container-platform/latest/observability/network_observability/json-flows-format-reference.html.
--
Type::
`array`
== .spec.filters[]
Description::
+
--
--
Type::
`object`
Required::
- `field`
- `matchType`
[cols="1,1,1",options="header"]
|===
| Property | Type | Description
| `field`
| `string`
| Name of the field to filter on
| `matchType`
| `string`
| Type of matching to apply
| `value`
| `string`
| Value to filter on. When `matchType` is `Equal` or `NotEqual`, you can use field injection with `$(SomeField)` to refer to any other field of the flow.
|===

View File

@@ -0,0 +1,112 @@
// Module included in the following assemblies:
//
// network_observability/metrics-alerts-dashboards.adoc
:_mod-docs-content-type: PROCEDURE
[id="network-observability-custom-charts-flowmetrics_{context}"]
= Configuring custom charts using FlowMetric API
You can generate charts for dashboards in the {product-title} web console, which you can view as an administrator in the *Dashboard* menu, by defining the `charts` section of the `FlowMetric` resource.
.Procedure
. In the web console, navigate to *Operators* -> *Installed Operators*.
. In the *Provided APIs* heading for the *NetObserv Operator*, select *FlowMetric*.
. In the *Project:* dropdown list, select the project of the Network Observability Operator instance.
. Click *Create FlowMetric*.
. Configure the `FlowMetric` resource, similar to the following sample configurations:
+
.Chart for tracking ingress bytes received from cluster external sources
[%collapsible]
====
[source,yaml]
----
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-traffic
  namespace: netobserv <1>
# ...
  charts:
  - dashboardName: Main <2>
    title: External ingress traffic
    unit: Bps
    type: SingleStat
    queries:
    - promQL: "sum(rate($METRIC[2m]))"
      legend: ""
  - dashboardName: Main <2>
    sectionName: External
    title: Top external ingress traffic per workload
    unit: Bps
    type: StackArea
    queries:
    - promQL: "sum(rate($METRIC{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace, DstK8S_OwnerName)"
      legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
# ...
----
<1> The `FlowMetric` resources need to be created in the namespace defined in the `FlowCollector` `spec.namespace`, which is `netobserv` by default.
<2> Using a different `dashboardName` creates a new dashboard that is prefixed with `Netobserv`. For example, *Netobserv / <dashboard_name>*.
.Verification
. Once the pods refresh, navigate to *Observe* -> *Dashboards*.
. Search for the *NetObserv / Main* dashboard. View two panels under the *NetObserv / Main* dashboard, or optionally a dashboard name that you create:
* A textual single statistic showing the global external ingress rate summed across all dimensions
* A timeseries graph showing the same metric per destination workload
For more information about the query language, refer to the link:https://prometheus.io/docs/prometheus/latest/querying/basics/[Prometheus documentation].
====
+
.Chart for RTT latency for cluster external ingress traffic
[%collapsible]
====
[source,yaml]
----
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-rtt
  namespace: netobserv <1>
# ...
  charts:
  - dashboardName: Main <2>
    title: External ingress TCP latency
    unit: seconds
    type: SingleStat
    queries:
    - promQL: "histogram_quantile(0.99, sum(rate($METRIC_bucket[2m])) by (le)) > 0"
      legend: "p99"
  - dashboardName: Main <2>
    sectionName: External
    title: "Top external ingress sRTT per workload, p50 (ms)"
    unit: seconds
    type: Line
    queries:
    - promQL: "histogram_quantile(0.5, sum(rate($METRIC_bucket{DstK8S_Namespace!=\"\"}[2m])) by (le,DstK8S_Namespace,DstK8S_OwnerName))*1000 > 0"
      legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
  - dashboardName: Main <2>
    sectionName: External
    title: "Top external ingress sRTT per workload, p99 (ms)"
    unit: seconds
    type: Line
    queries:
    - promQL: "histogram_quantile(0.99, sum(rate($METRIC_bucket{DstK8S_Namespace!=\"\"}[2m])) by (le,DstK8S_Namespace,DstK8S_OwnerName))*1000 > 0"
      legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
# ...
----
<1> The `FlowMetric` resources need to be created in the namespace defined in the `FlowCollector` `spec.namespace`, which is `netobserv` by default.
<2> Using a different `dashboardName` creates a new dashboard that is prefixed with `Netobserv`. For example, *Netobserv / <dashboard_name>*.
This example uses the `histogram_quantile` function to show `p50` and `p99`.
You can show averages of histograms by dividing the metric `$METRIC_sum` by the metric `$METRIC_count`, which are automatically generated when you create a histogram. With the preceding example, the Prometheus query to do this is as follows:
[source,yaml]
----
promQL: "(sum(rate($METRIC_sum{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace,DstK8S_OwnerName) / sum(rate($METRIC_count{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace,DstK8S_OwnerName))*1000"
----
.Verification
. Once the pods refresh, navigate to *Observe* -> *Dashboards*.
. Search for the *NetObserv / Main* dashboard. View the new panel under the *NetObserv / Main* dashboard, or optionally under a dashboard name that you created.
For more information about the query language, refer to the link:https://prometheus.io/docs/prometheus/latest/querying/basics/[Prometheus documentation].
====

View File

@@ -9,105 +9,131 @@ The "Filter ID" column shows which related name to use when defining Quick Filte
The "Loki label" column is useful when querying Loki directly: label fields need to be selected using link:https://grafana.com/docs/loki/latest/logql/log_queries/#log-stream-selector[stream selectors].
The "Cardinality" column gives information about the implied metric cardinality if this field was to be used as a Prometheus label with the `FlowMetric` API. For more information, see the "FlowMetric API reference".
[cols="1,1,3,1,1",options="header"]
[cols="1,1,3,1,1,1",options="header"]
|===
| Name | Type | Description | Filter ID | Loki label
| Name | Type | Description | Filter ID | Loki label | Cardinality
| `Bytes`
| number
| Number of bytes
| n/a
| no
| avoid
| `DnsErrno`
| number
| Error number returned from DNS tracker ebpf hook function
| `dns_errno`
| no
| fine
| `DnsFlags`
| number
| DNS flags for DNS record
| n/a
| no
| fine
| `DnsFlagsResponseCode`
| string
| Parsed DNS header RCODEs name
| `dns_flag_response_code`
| no
| fine
| `DnsId`
| number
| DNS record id
| `dns_id`
| no
| avoid
| `DnsLatencyMs`
| number
| Time between a DNS request and response, in milliseconds
| `dns_latency`
| no
| avoid
| `Dscp`
| number
| Differentiated Services Code Point (DSCP) value
| `dscp`
| no
| fine
| `DstAddr`
| string
| Destination IP address (ipv4 or ipv6)
| `dst_address`
| no
| avoid
| `DstK8S_HostIP`
| string
| Destination node IP
| `dst_host_address`
| no
| fine
| `DstK8S_HostName`
| string
| Destination node name
| `dst_host_name`
| no
| fine
| `DstK8S_Name`
| string
| Name of the destination Kubernetes object, such as Pod name, Service name or Node name.
| `dst_name`
| no
| careful
| `DstK8S_Namespace`
| string
| Destination namespace
| `dst_namespace`
| yes
| fine
| `DstK8S_OwnerName`
| string
| Name of the destination owner, such as Deployment name, StatefulSet name, etc.
| `dst_owner_name`
| yes
| fine
| `DstK8S_OwnerType`
| string
| Kind of the destination owner, such as Deployment, StatefulSet, etc.
| `dst_kind`
| no
| fine
| `DstK8S_Type`
| string
| Kind of the destination Kubernetes object, such as Pod, Service or Node.
| `dst_kind`
| yes
| fine
| `DstK8S_Zone`
| string
| Destination availability zone
| `dst_zone`
| yes
| fine
| `DstMac`
| string
| Destination MAC address
| `dst_mac`
| no
| avoid
| `DstPort`
| number
| Destination port
| `dst_port`
| no
| careful
| `DstSubnetLabel`
| string
| Destination subnet label
| `dst_subnet_label`
| no
| fine
| `Duplicate`
| boolean
| Indicates if this flow was also captured from another interface on the same host
| n/a
| yes
| fine
| `Flags`
| number
| Logical OR combination of unique TCP flags comprised in the flow, as per RFC-9293, with additional custom flags to represent the following per-packet combinations: +
@@ -116,164 +142,202 @@ The "Loki label" column is useful when querying Loki directly: label fields need
- RST+ACK (0x400)
| n/a
| no
| fine
| `FlowDirection`
| number
| Flow interpreted direction from the node observation point. Can be one of: +
- 0: Ingress (incoming traffic, from the node observation point) +
- 1: Egress (outgoing traffic, from the node observation point) +
- 2: Inner (with the same source and destination node)
| `node_direction`
| yes
| fine
| `IcmpCode`
| number
| ICMP code
| `icmp_code`
| no
| fine
| `IcmpType`
| number
| ICMP type
| `icmp_type`
| no
| fine
| `IfDirections`
| number
| Flow directions from the network interface observation point. Can be one of: +
- 0: Ingress (interface incoming traffic) +
- 1: Egress (interface outgoing traffic)
| `ifdirections`
| no
| fine
| `Interfaces`
| string
| Network interfaces
| `interfaces`
| no
| careful
| `K8S_ClusterName`
| string
| Cluster name or identifier
| `cluster_name`
| yes
| fine
| `K8S_FlowLayer`
| string
| Flow layer: 'app' or 'infra'
| `flow_layer`
| no
| fine
| `Packets`
| number
| Number of packets
| n/a
| no
| avoid
| `PktDropBytes`
| number
| Number of bytes dropped by the kernel
| n/a
| no
| avoid
| `PktDropLatestDropCause`
| string
| Latest drop cause
| `pkt_drop_cause`
| no
| fine
| `PktDropLatestFlags`
| number
| TCP flags on last dropped packet
| n/a
| no
| fine
| `PktDropLatestState`
| string
| TCP state on last dropped packet
| `pkt_drop_state`
| no
| fine
| `PktDropPackets`
| number
| Number of packets dropped by the kernel
| n/a
| no
| avoid
| `Proto`
| number
| L4 protocol
| `protocol`
| no
| fine
| `SrcAddr`
| string
| Source IP address (ipv4 or ipv6)
| `src_address`
| no
| avoid
| `SrcK8S_HostIP`
| string
| Source node IP
| `src_host_address`
| no
| fine
| `SrcK8S_HostName`
| string
| Source node name
| `src_host_name`
| no
| fine
| `SrcK8S_Name`
| string
| Name of the source Kubernetes object, such as Pod name, Service name or Node name.
| `src_name`
| no
| careful
| `SrcK8S_Namespace`
| string
| Source namespace
| `src_namespace`
| yes
| fine
| `SrcK8S_OwnerName`
| string
| Name of the source owner, such as Deployment name, StatefulSet name, etc.
| `src_owner_name`
| yes
| fine
| `SrcK8S_OwnerType`
| string
| Kind of the source owner, such as Deployment, StatefulSet, etc.
| `src_kind`
| no
| fine
| `SrcK8S_Type`
| string
| Kind of the source Kubernetes object, such as Pod, Service or Node.
| `src_kind`
| yes
| fine
| `SrcK8S_Zone`
| string
| Source availability zone
| `src_zone`
| yes
| fine
| `SrcMac`
| string
| Source MAC address
| `src_mac`
| no
| avoid
| `SrcPort`
| number
| Source port
| `src_port`
| no
| careful
| `SrcSubnetLabel`
| string
| Source subnet label
| `src_subnet_label`
| no
| fine
| `TimeFlowEndMs`
| number
| End timestamp of this flow, in milliseconds
| n/a
| no
| avoid
| `TimeFlowRttNs`
| number
| TCP Smoothed Round Trip Time (SRTT), in nanoseconds
| `time_flow_rtt`
| no
| avoid
| `TimeFlowStartMs`
| number
| Start timestamp of this flow, in milliseconds
| n/a
| no
| avoid
| `TimeReceived`
| number
| Timestamp when this flow was received and processed by the flow collector, in seconds
| n/a
| no
| avoid
| `_HashId`
| string
| In conversation tracking, the conversation identifier
| `id`
| no
| avoid
| `_RecordType`
| string
| Type of record: 'flowLog' for regular flow logs, or 'newConnection', 'heartbeat', 'endConnection' for conversation tracking
| `type`
| yes
| fine
|===

View File

@@ -0,0 +1,13 @@
// Module included in the following assemblies:
//
// * network_observability/network-observability-operator-monitoring.adoc
:_mod-docs-content-type: CONCEPT
[id="network-observability-health-alert-overview_{context}"]
= Health alerts
A health alert banner that directs you to the dashboard can appear on the *Network Traffic* and *Home* pages if an alert is triggered. Alerts are generated in the following cases:
* The `NetObservLokiError` alert occurs if the `flowlogs-pipeline` workload is dropping flows because of Loki errors, such as if the Loki ingestion rate limit has been reached.
* The `NetObservNoFlows` alert occurs if no flows are ingested for a certain amount of time.
* The `NetObservFlowsDropped` alert occurs if the Network Observability eBPF agent hashmap table is full, in which case the eBPF agent processes flows with degraded performance, or when the capacity limiter is triggered.

View File

@@ -0,0 +1,19 @@
// Module included in the following assemblies:
//
// * network_observability/network-observability-operator-monitoring.adoc
:_mod-docs-content-type: CONCEPT
[id="network-observability-health-dashboard-overview_{context}"]
= Health dashboards
Metrics about health and resource usage of the Network Observability Operator are located in the *Observe* -> *Dashboards* page in the web console. You can view metrics about the health of the Operator in the following categories:
* *Flows per second*
* *Sampling*
* *Errors last minute*
* *Dropped flows per second*
* *Flowlogs-pipeline statistics*
* *Flowlogs-pipeline statistics views*
* *eBPF agent statistics views*
* *Operator statistics*
* *Resource usage*

View File

@@ -5,13 +5,11 @@
:_mod-docs-content-type: REFERENCE
[id="network-observability-metrics_{context}"]
= Network Observability metrics
Metrics generated by the `flowlogs-pipeline` are configurable in the `spec.processor.metrics.includeList` of the `FlowCollector` custom resource to add or remove metrics.
You can also create alerts by using the `includeList` metrics in Prometheus rules, as shown in the example "Creating alerts".
When looking for these metrics in Prometheus, such as in the Console through *Observe* -> *Metrics*, or when defining alerts, all the metrics names are prefixed with `netobserv_`. For example, `netobserv_namespace_flows_total`. Available metrics names are as follows:

includeList metrics names::
Names followed by an asterisk `*` are enabled by default.
* `namespace_egress_bytes_total`
@@ -30,7 +28,7 @@ Names followed by an asterisk `*` are enabled by default.
* `workload_ingress_packets_total`
* `workload_flows_total`
PacketDrop metrics names::
When the `PacketDrop` feature is enabled in `spec.agent.ebpf.features` (with `privileged` mode), the following additional metrics are available:
* `namespace_drop_bytes_total`
@@ -40,14 +38,14 @@ When the `PacketDrop` feature is enabled in `spec.agent.ebpf.features` (with `pr
* `workload_drop_bytes_total`
* `workload_drop_packets_total`
DNS metrics names::
When the `DNSTracking` feature is enabled in `spec.agent.ebpf.features`, the following additional metrics are available:
* `namespace_dns_latency_seconds` *
* `node_dns_latency_seconds`
* `workload_dns_latency_seconds`
FlowRTT metrics names::
When the `FlowRTT` feature is enabled in `spec.agent.ebpf.features`, the following additional metrics are available:
* `namespace_rtt_seconds` *
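
As a hedged sketch of the alerting mentioned earlier, a `PrometheusRule` can reference any of these metrics with the `netobserv_` prefix; the resource names, threshold, and severity below are illustrative:

[source,yaml]
----
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: netobserv-custom-alerts # illustrative name
  namespace: netobserv
spec:
  groups:
  - name: NetObservCustomRules # illustrative group name
    rules:
    - alert: HighFlowRate # illustrative alert
      expr: sum(rate(netobserv_namespace_flows_total[1m])) > 1000 # illustrative threshold
      for: 5m
      labels:
        severity: warning
----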

View File

@@ -0,0 +1,14 @@
//Module included in the following assemblies:
//
// observability/network_observability/network-observability-cli/netobserv-cli-overview.adoc
:_mod-docs-content-type: CONCEPT
[id="network-observability-netoberv-cli-about_{context}"]
= About the Network Observability CLI
You can quickly debug and troubleshoot networking issues by using the Network Observability CLI (`oc netobserv`). The Network Observability CLI is a flow and packet visualization tool that relies on eBPF agents to stream collected data to an ephemeral collector pod. It requires no persistent storage during the capture. After the run, the output is transferred to your local machine. This enables quick, live insight into packets and flow data without installing the Network Observability Operator.
[IMPORTANT]
====
CLI capture is meant to run only for short durations, such as 8-10 minutes. If it runs for too long, it can be difficult to delete the running process.
====

View File

@@ -0,0 +1,19 @@
// Module included in the following assemblies:
// * observability/network_observability/netobserv_cli/netobserv-cli-install.adoc
:_mod-docs-content-type: PROCEDURE
[id="network-observability-cli-uninstall_{context}"]
= Cleaning the Network Observability CLI
You can manually clean the CLI workload by running `oc netobserv cleanup`. This command removes all the CLI components from your cluster.
When you end a capture, this command is run automatically by the client. You might need to run it manually if you experience connectivity issues.
.Procedure
* Run the following command:
+
[source,terminal]
----
$ oc netobserv cleanup
----

View File

@@ -0,0 +1,54 @@
// Module included in the following assemblies:
// * observability/network_observability/netobserv_cli/netobserv-cli-install.adoc
:_mod-docs-content-type: PROCEDURE
[id="network-observability-cli-install_{context}"]
= Installing the Network Observability CLI
Installing the Network Observability CLI (`oc netobserv`) is a separate procedure from the Network Observability Operator installation. This means that, even if you have the Operator installed from OperatorHub, you need to install the CLI separately.
[NOTE]
====
You can optionally use Krew to install the `netobserv` CLI plugin. For more information, see "Installing a CLI plugin with Krew".
====
.Prerequisites
* You must install the {oc-first}.
* You must have a macOS or Linux operating system.
.Procedure
. Download the link:https://mirror.openshift.com/pub/openshift-v4/clients/netobserv/latest/[`oc netobserv` CLI tar file].
. Unpack the archive:
+
[source,terminal]
----
$ tar xvf netobserv-cli.tar.gz
----
. Make the file executable:
+
[source,terminal]
----
$ chmod +x ./build/oc-netobserv
----
. Move the extracted `netobserv-cli` binary to a directory that is on your `PATH`, such as `/usr/local/bin/`:
+
[source,terminal]
----
$ sudo mv ./build/oc-netobserv /usr/local/bin/
----
.Verification
* Verify that `oc netobserv` is available:
+
[source,terminal]
----
$ oc netobserv version
----
+
.Example output
[source,terminal]
----
Netobserv CLI version <version>
----

View File

@@ -0,0 +1,195 @@
// Module included in the following assemblies:
// * observability/network_observability/netobserv-cli-reference.adoc
:_mod-docs-content-type: REFERENCE
[id="network-observability-netobserv-cli-reference_{context}"]
= oc netobserv CLI reference
The Network Observability CLI (`oc netobserv`) is a CLI tool for capturing flow data and packet data for further analysis.
.`oc netobserv` syntax
[source,terminal]
----
$ oc netobserv [<command>] [<feature_option>] [<command_options>] <1>
----
<1> Feature options can only be used with the `oc netobserv flows` command. They cannot be used with the `oc netobserv packets` command.
[cols="3a,8a",options="header"]
.Basic commands
|===
|Command| Description
| `flows`
| Capture flows information. For subcommands, see the "Flow capture subcommands" table.
| `packets`
| Capture packets from a specific protocol or port pair, such as `netobserv packets --filter=tcp,80`. For more information about packet capture, see the "Packet capture subcommand" table.
| `cleanup`
| Remove the Network Observability CLI components.
| `version`
| Print the software version.
| `help`
| Show help.
|===
[id="network-observability-cli-enrichment_{context}"]
== Network Observability enrichment
Network Observability enrichment, which displays zone, node, owner, and resource names, along with the optional packet drop, DNS latency, and Round-Trip Time features, can only be enabled when capturing flows. Enriched data does not appear in the packet capture pcap output file.
.Network Observability enrichment syntax
[source,terminal]
----
$ oc netobserv flows [<enrichment_options>] [<subcommands>]
----
.Network Observability enrichment options
|===
|Option| Description| Possible values| Default
| `--enable_pktdrop`
| Enable packet drop.
| `true`, `false`
| `false`
| `--enable_rtt`
| Enable round trip time.
| `true`, `false`
| `false`
| `--enable_dns`
| Enable DNS tracking.
| `true`, `false`
| `false`
| `--help`
| Show help.
| -
| -
| `--interfaces`
| Interfaces to match on the flow. For example, `"eth0,eth1"`.
| `"<interface>"`
| -
|===
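
For example, a sketch of a capture that combines these enrichment options; enable only the features you need:

[source,terminal]
----
$ oc netobserv flows --enable_dns=true --enable_rtt=true --enable_pktdrop=true
----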
[id="cli-reference-flow-capture-options_{context}"]
== Flow capture options
Flow capture has mandatory parameters as well as additional options, such as enabling extra features for packet drops, DNS latencies, Round-Trip Time, and filtering.
.`oc netobserv flows` syntax
[source,terminal]
----
$ oc netobserv flows [<feature_option>] [<command_options>]
----
.Flow capture filter options
|===
|Option| Description| Possible values| Mandatory| Default
| `--enable_filter`
| Enable flow filter.
| `true`, `false`
| Yes
| `false`
| `--action`
| Action to apply on the flow.
| `Accept`, `Reject`
| Yes
| `Accept`
| `--cidr`
| CIDR to match on the flow.
| `1.1.1.0/24`, `1::100/64`, or `0.0.0.0/0`
| Yes
| `0.0.0.0/0`
| `--protocol`
| Protocol to match on the flow
| `TCP`, `UDP`, `SCTP`, `ICMP`, or `ICMPv6`
| No
| -
| `--direction`
| Direction to match on the flow
| `Ingress`, `Egress`
| No
| -
| `--dport`
| Destination port to match on the flow.
| `80`, `443`, or `49051`
| No
| -
| `--sport`
| Source port to match on the flow.
| `80`, `443`, or `49051`
| No
| -
| `--port`
| Port to match on the flow.
| `80`, `443`, or `49051`
| No
| -
| `--sport_range`
| Source port range to match on the flow.
| `80-100` or `443-445`
| No
| -
| `--dport_range`
| Destination port range to match on the flow.
| `80-100`
| No
| -
| `--port_range`
| Port range to match on the flow.
| `80-100` or `443-445`
| No
| -
| `--icmp_type`
| ICMP type to match on the flow.
| `8` or `13`
| No
| -
| `--icmp_code`
| ICMP code to match on the flow.
| `0` or `1`
| No
| -
| `--peer_ip`
| Peer IP to match on the flow.
| `1.1.1.1` or `1::1`
| No
| -
|===
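
These filter options compose into a single capture command, as in the earlier flow capture procedure:

[source,terminal]
----
$ oc netobserv flows --enable_filter=true --action=Accept --cidr=0.0.0.0/0 --protocol=TCP --port=49051
----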
[id="cli-reference-packet-capture-options_{context}"]
== Packet capture options
You can filter on port and protocol for packet capture data.
.`oc netobserv packets` syntax
[source,terminal]
----
$ oc netobserv packets [<option>]
----
.Packet capture filter option
|===
|Option| Description| Possible values| Mandatory| Default
| `--filter`
| Enable packet capture filtering.
| `tcp`, `udp`, or `<port>` You can specify filtering options using a comma as delimiter. For example, `tcp,80` specifies the `tcp` protocol and port `80`.
| Yes
| -
|===

View File

@@ -0,0 +1,44 @@
// Module included in the following assemblies:
//
// network_observability/network-observability-scheduling-resources.adoc
:_mod-docs-content-type: CONCEPT
[id="network-observability-multi-tenancy{context}"]
= Network Observability deployment in specific nodes
You can configure the `FlowCollector` to control the deployment of Network Observability components in specific nodes. The `spec.agent.ebpf.advanced.scheduling`, `spec.processor.advanced.scheduling`, and `spec.consolePlugin.advanced.scheduling` specifications have the following configurable settings:
* `NodeSelector`
* `Tolerations`
* `Affinity`
* `PriorityClassName`
.Sample `FlowCollector` resource for `spec.<component>.advanced.scheduling`
[source,yaml]
----
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
# ...
  advanced:
    scheduling:
      tolerations:
      - key: "<taint key>"
        operator: "Equal"
        value: "<taint value>"
        effect: "<taint effect>"
      nodeSelector:
        <key>: <value>
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: name
                operator: In
                values:
                - app-worker-node
      priorityClassName: """
# ...
----

View File

@@ -18,7 +18,7 @@ The actual memory consumption of the Operator depends on your cluster size and t
* You must have `cluster-admin` privileges.
* One of the following supported architectures is required: `amd64`, `ppc64le`, `arm64`, or `s390x`.
* Any CPU supported by Red Hat Enterprise Linux (RHEL) 9.
* Must be configured with OVN-Kubernetes or OpenShift SDN as the main network plugin, and optionally using secondary interfaces with Multus and SR-IOV.
[NOTE]
====

View File

@@ -28,7 +28,6 @@ metadata:
  name: cluster
spec:
  namespace: netobserv
  agent:
    type: eBPF
    ebpf:

View File

@@ -0,0 +1,9 @@
// Module included in the following assemblies:
//
// network_observability/metrics-alerts-dashboards.adoc
:_mod-docs-content-type: CONCEPT
[id="network-observability-predefined-metrics_{context}"]
= Predefined metrics
Metrics generated by the `flowlogs-pipeline` are configurable in the `spec.processor.metrics.includeList` of the `FlowCollector` custom resource to add or remove metrics.
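
For example, a sketch of an `includeList` selection in the `FlowCollector` resource; the names shown are a subset chosen for illustration (see "Network Observability metrics" for the available names):

[source,yaml]
----
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    metrics:
      includeList:
      - namespace_flows_total
      - namespace_egress_bytes_total
      - workload_flows_total
----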

View File

@@ -11,8 +11,8 @@ Query Options::
You can use *Query Options* to optimize the search results, as listed below:
** *Log Type*: The available options *Conversation* and *Flows* provide the ability to query flows by log type, such as flow log, new conversation, completed conversation, and a heartbeat, which is a periodic record with updates for long conversations. A conversation is an aggregation of flows between the same peers.
** *Duplicated flows*: A flow might be reported from several interfaces, and from both source and destination nodes, making it appear in the data several times. By selecting this query option, you can choose to show duplicated flows. Duplicated flows have the same sources and destinations, including ports, and also have the same protocols, with the exception of `Interface` and `Direction` fields. Duplicates are hidden by default. Use the *Direction* filter in the *Common* section of the dropdown list to switch between ingress and egress traffic.
** *Match filters*: You can determine the relation between different filter parameters selected in the advanced filter. The available options are *Match all* and *Match any*. *Match all* provides results that match all the values, and *Match any* provides results that match any of the values entered. The default value is *Match all*.
** *Datasource*: You can choose the datasource to use for queries: *Loki*, *Prometheus*, or *Auto*. Notable performance improvements can be realized when using Prometheus as a datasource rather than Loki, but Prometheus supports a limited set of filters and aggregations. The default datasource is *Auto*, which uses Prometheus on supported queries or uses Loki if the query does not support Prometheus.
** *Drops filter*: You can view different levels of dropped packets with the following query options:
*** *Fully dropped* shows flow records with fully dropped packets.
*** *Containing drops* shows flow records that contain drops but can be sent.

View File

@@ -3,24 +3,9 @@
// * network_observability/network-observability-operator-monitoring.adoc
:_mod-docs-content-type: PROCEDURE
[id="network-observability-alert-dashboard_{context}"]
[id="network-observability-dashboard-view_{context}"]
= Viewing health information
You can access metrics about health and resource usage of the Network Observability Operator from the *Dashboards* page in the web console. A health alert banner that directs you to the dashboard can appear on the *Network Traffic* and *Home* pages in the event that an alert is triggered. Alerts are generated in the following cases:
* The `NetObservLokiError` alert occurs if the `flowlogs-pipeline` workload is dropping flows because of Loki errors, such as if the Loki ingestion rate limit has been reached.
* The `NetObservNoFlows` alert occurs if no flows are ingested for a certain amount of time.
You can also view metrics about the health of the Operator in the following categories:
* *Flows*
* *Flows Overhead*
* *Top flow rates per source and destination nodes*
* *Top flow rates per source and destination namespaces*
* *Top flow rates per source and destination workloads*
* *Agents*
* *Processor*
* *Operator*
You can access metrics about health and resource usage of the Network Observability Operator from the *Dashboards* page in the web console.
.Prerequisites

View File

@@ -4,16 +4,25 @@
:_mod-docs-content-type: REFERENCE
[id="network-observability-without-loki_{context}"]
= Network Observability without Loki
You can use Network Observability without Loki by not performing the Loki installation steps and skipping directly to "Installing the Network Observability Operator". If you only want to export flows to a Kafka consumer or IPFIX collector, or you only need dashboard metrics, then you do not need to install Loki or provide storage for Loki. Without Loki, there is no *Network Traffic* panel under *Observe*, which means there is no overview charts, flow table, or topology. The following table compares available features with and without Loki:
You can use Network Observability without Loki by not performing the Loki installation steps and skipping directly to "Installing the Network Observability Operator". If you only want to export flows to a Kafka consumer or IPFIX collector, or you only need dashboard metrics, then you do not need to install Loki or provide storage for Loki. The following table compares available features with and without Loki; a configuration sketch follows the table.
.Comparison of feature availability with and without Loki
[options="header"]
|===
| | *With Loki* | *Without Loki*
| *Exporters* | image:check-solid.png[,10] | image:check-solid.png[,10]
| *Flow-based metrics and dashboards* | image:check-solid.png[,10] | image:check-solid.png[,10]
| *Traffic Flow Overview, Table and Topology views* | image:check-solid.png[,10] | image:x-solid.png[,10]
| *Quick Filters* | image:check-solid.png[,10] | image:x-solid.png[,10]
| *{product-title} console Network Traffic tab integration* | image:check-solid.png[,10] | image:x-solid.png[,10]
| *Exporters* | image:check-solid.png[,10] | image:check-solid.png[,10]
| *Multi-tenancy* | image:check-solid.png[,10] | image:x-solid.png[,10]
| *Complete filtering and aggregations capabilities* ^[1]^| image:check-solid.png[,10] | image:x-solid.png[,10]
| *Partial filtering and aggregations capabilities* ^[2]^ | image:check-solid.png[,10] | image:check-solid.png[,10]
| *Flow-based metrics and dashboards* | image:check-solid.png[,10] | image:check-solid.png[,10]
| *Traffic flows view overview* ^[3]^ | image:check-solid.png[,10] | image:check-solid.png[,10]
| *Traffic flows view table* | image:check-solid.png[,10] | image:x-solid.png[,10]
| *Topology view* | image:check-solid.png[,10] | image:check-solid.png[,10]
| *{product-title} console Network Traffic tab integration* | image:check-solid.png[,10] | image:check-solid.png[,10]
|===
[.small]
--
1. Such as per pod.
2. Such as per workload or namespace.
3. Statistics on packet drops are only available with Loki.
--
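The following is a minimal sketch of a `FlowCollector` that disables Loki and exports flows to a Kafka consumer instead; the Kafka address and topic are placeholder values:

[source,yaml]
----
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  loki:
    enable: false # no Loki storage; metrics and exporters remain available
  exporters:
    - type: Kafka
      kafka:
        address: "kafka-cluster-kafka-bootstrap.netobserv" # placeholder address
        topic: netobserv-flows-export # placeholder topic
----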

View File

@@ -23,7 +23,7 @@ metadata:
spec:
# ...
processor:
addZone: true
addZone: true
# ...
----

View File

@@ -0,0 +1,11 @@
:_mod-docs-content-type: ASSEMBLY
[id="flowmetric-api"]
= FlowMetric configuration parameters
include::_attributes/common-attributes.adoc[]
:context: network_observability
toc::[]
`FlowMetric` is the API that allows you to create custom metrics from the collected flow logs.
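As an illustration, the following is a minimal sketch of a `FlowMetric` resource that counts ingress bytes per destination namespace and workload; the metric name and label set are example values, not required ones:

[source,yaml]
----
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: example-ingress-bytes
  namespace: netobserv
spec:
  metricName: example_ingress_bytes_total # exposed with the netobserv_ prefix
  type: Counter
  valueField: Bytes
  direction: Ingress
  labels: [DstK8S_Namespace,DstK8S_OwnerName]
----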
include::modules/network-observability-flowmetric-api-specifications.adoc[leveloffset=+1]

View File

@@ -9,9 +9,19 @@ toc::[]
The Network Observability Operator uses the `flowlogs-pipeline` to generate metrics from flow logs. You can use these metrics by setting custom alerts and viewing dashboards.
include::modules/network-observability-viewing-dashboards.adoc[leveloffset=+1]
include::modules/network-observability-metrics.adoc[leveloffset=+1]
include::modules/network-observability-predefined-metrics.adoc[leveloffset=+1]
include::modules/network-observability-metrics-names.adoc[leveloffset=+1]
include::modules/network-observability-includelist-example.adoc[leveloffset=+1]
include::modules/network-observability-custom-metrics.adoc[leveloffset=+1]
include::modules/network-observability-configuring-custom-metrics.adoc[leveloffset=+1]
[IMPORTANT]
====
High cardinality can affect the memory usage of Prometheus. You can check whether specific labels have high cardinality in the xref:../../observability/network_observability/json-flows-format-reference.adoc#network-observability-flows-format_json_reference[Network Flows format reference].
====
include::modules/network-observability-flowmetrics-charts.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* For more information about creating alerts that you can see on the dashboard, see xref:../../observability/monitoring/managing-alerts.adoc#creating-alerting-rules-for-user-defined-projects_managing-alerts[Creating alerting rules for user-defined projects].
* xref:../../observability/monitoring/managing-alerts.adoc#creating-alerting-rules-for-user-defined-projects_managing-alerts[Creating alerting rules for user-defined projects].
* xref:../../support/troubleshooting/investigating-monitoring-issues.adoc#determining-why-prometheus-is-consuming-disk-space_investigating-monitoring-issues[Troubleshooting high cardinality metrics- Determining why Prometheus is consuming a lot of disk space]

View File

@@ -0,0 +1 @@
../../../_attributes/

View File

@@ -0,0 +1 @@
../../../images/

View File

@@ -0,0 +1 @@
../../../modules/

View File

@@ -0,0 +1,20 @@
:_mod-docs-content-type: ASSEMBLY
[id="netobserv-cli-install"]
= Installing the Network Observability CLI
include::_attributes/common-attributes.adoc[]
:context: netobserv-cli-install
toc::[]
The Network Observability CLI (`oc netobserv`) is deployed separately from the Network Observability Operator. The CLI is available as an {oc-first} plugin. It provides a lightweight way to quickly debug and troubleshoot networking issues with network observability.
:FeatureName: Network Observability CLI (`oc netobserv`)
include::snippets/technology-preview.adoc[]
include::modules/network-observability-netobserv-cli-about.adoc[leveloffset=+1]
include::modules/network-observability-netobserv-cli-install.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../cli_reference/openshift_cli/extending-cli-plugins.adoc#cli-installing-plugins_cli-extend-plugins[Installing and using CLI plugins]
* xref:../../../cli_reference/openshift_cli/managing-cli-plugins-krew.adoc#cli-krew-install-plugin_managing-cli-plugins-krew[Installing a CLI plugin with Krew]

View File

@@ -0,0 +1,11 @@
:_mod-docs-content-type: ASSEMBLY
[id="netobserv-cli-reference"]
= Network Observability CLI (oc netobserv) reference
include::_attributes/common-attributes.adoc[]
:context: netobserv-cli-reference
toc::[]
The Network Observability CLI (`oc netobserv`) has most of the features and filtering options that are available for the Network Observability Operator. You can pass command-line arguments to enable features or filtering options.
include::modules/network-observability-netobserv-cli-reference.adoc[leveloffset=+1]

View File

@@ -0,0 +1,17 @@
:_mod-docs-content-type: ASSEMBLY
[id="netobserv-cli-using"]
= Using the Network Observability CLI
include::_attributes/common-attributes.adoc[]
:context: netobserv-cli-using
toc::[]
You can visualize and filter the flows and packets data directly in the terminal to see specific usage, such as identifying who is using a specific port. The Network Observability CLI collects flows as JSON and database files or packets as a PCAP file, which you can use with third-party tools.
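For example, assuming the CLI plugin is installed and you are logged in to the cluster, a flow capture session might look like the following sketch; the filter flags are illustrative options rather than required values:

[source,terminal]
----
$ oc netobserv flows --enable_filter=true --action=Accept --cidr=0.0.0.0/0 --protocol=TCP --port=8080
----

When the capture is done, you can remove the collector components from the cluster by running `oc netobserv cleanup`.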
include::modules/network-observability-cli-capturing-flows.adoc[leveloffset=+1]
include::modules/network-observability-cli-capturing-packets.adoc[leveloffset=+1]
include::modules/network-observability-netobserv-cli-cleaning.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../../observability/network_observability/netobserv_cli/netobserv-cli-reference.adoc#network-observability-netobserv-cli-reference_netobserv-cli-reference[Network Observability CLI reference]

View File

@@ -0,0 +1 @@
../../../snippets/

View File

@@ -8,10 +8,12 @@ toc::[]
You can use the web console to monitor alerts related to the health of the Network Observability Operator.
include::modules/network-observability-health-dashboard-overview.adoc[leveloffset=+1]
include::modules/network-observability-health-alerts-overview.adoc[leveloffset=+1]
include::modules/network-observability-viewing-alerts.adoc[leveloffset=+1]
include::modules/network-observability-disabling-health-alerts.adoc[leveloffset=+2]
include::modules/network-observability-rate-limit-alert.adoc[leveloffset=+1]
include::modules/network-observability-ebpf-agent-alert.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources

View File

@@ -13,6 +13,73 @@ These release notes track the development of the Network Observability Operator
For an overview of the Network Observability Operator, see xref:../../observability/network_observability/network-observability-overview.adoc#dependency-network-observability[About Network Observability Operator].
[id="network-observability-operator-release-notes-1-6_{context}"]
== Network Observability Operator 1.6.0
The following advisory is available for the Network Observability Operator 1.6.0:
* link:https://access.redhat.com/errata/RHSA-2024:3868[Network Observability Operator 1.6.0]
[id="network-observability-operator-1.6.0-features-enhancements_{context}"]
=== New features and enhancements
[id="network-observability-lokiless-enhancements-1.6_{context}"]
==== Enhanced use of Network Observability Operator without Loki
You can now use Prometheus metrics and rely less on Loki for storage when using the Network Observability Operator. For more information, see xref:../../observability/network_observability/installing-operators.adoc#network-observability-without-loki_network_observability[Network Observability without Loki].
[id="network-observability-custom-metrics-1.6_{context}"]
==== Custom metrics API
You can create custom metrics out of flow logs data by using the `FlowMetric` API. Flow logs data can be used with Prometheus labels to customize cluster information on your dashboards. You can add custom labels for any subnet that you want to identify in your flows and metrics. This enhancement can also be used to more easily identify external traffic by using the new labels `SrcSubnetLabel` and `DstSubnetLabel`, which exist both in flow logs and in metrics. Those fields are empty when there is external traffic, which gives you a way to identify it. For more information, see xref:../../observability/network_observability/metrics-alerts-dashboards.adoc#network-observability-custom-metrics_metrics-dashboards-alerts[Custom metrics] and xref:../../observability/network_observability/flowmetric-api.adoc#flowmetric-flows-netobserv-io-v1alpha1[FlowMetric API reference].
[id="network-observability-eBPF-performance-enhancements-1.6_{context}"]
==== eBPF performance enhancements
Experience improved performance of the eBPF agent, in terms of CPU and memory, with the following updates:
* The eBPF agent now uses TCX hooks instead of TC hooks.
* The *NetObserv / Health* dashboard has a new section that shows eBPF metrics.
** Based on the new eBPF metrics, an alert notifies you when the eBPF agent is dropping flows.
* Loki storage demand decreases significantly now that duplicated flows are removed. Instead of having multiple, individual duplicated flows per network interface, there is one de-duplicated flow with a list of related network interfaces.
[IMPORTANT]
====
With the duplicated flows update, the *Interface* and *Interface Direction* fields in the *Network Traffic* table are renamed to *Interfaces* and *Interface Directions*, so any bookmarked *Quick filter* queries using these fields need to be updated to `interfaces` and `ifdirections`.
====
For more information, see xref:../../observability/network_observability/network-observability-operator-monitoring.adoc#network-observability-netobserv-dashboard-ebpf-agent-alerts_network_observability[Using the eBPF agent alert]
and xref:../../observability/network_observability/observing-network-traffic.adoc#network-observability-quickfilternw-observe-network-traffic[Quick filters].
[id="network-observability-ebpf-collection-filtering-1.6_{context}"]
==== eBPF collection rule-based filtering
You can use rule-based filtering to reduce the volume of created flows. When this option is enabled, the *NetObserv / Health* dashboard for eBPF agent statistics has the *Filtered flows rate* view. For more information, see xref:../../observability/network_observability/observing-network-traffic.adoc#network-observability-ebpf-flow-rule-filter_nw-observe-network-traffic[eBPF flow rule filter].
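As a sketch of what this can look like in the `FlowCollector` resource, assuming placeholder CIDR and port values:

[source,yaml]
----
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: eBPF
    ebpf:
      flowFilter:
        enable: true
        action: Accept # keep only the matching flows
        cidr: 10.0.62.0/24 # placeholder CIDR
        protocol: TCP
        ports: 8080 # placeholder port
----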
[id="network-observability-technology-preview-1.6_{context}"]
=== Technology Preview features
Some features in this release are currently in Technology Preview. These experimental features are not intended for production use. Note the following scope of support on the Red Hat Customer Portal for these features:
link:https://access.redhat.com/support/offerings/techpreview[Technology Preview Features Support Scope]
[id="network-observability-cli-1.6_{context}"]
==== Network Observability CLI
You can debug and troubleshoot network traffic issues without needing to install the Network Observability Operator by using the Network Observability CLI. Capture and visualize flow and packet data in real time with no persistent storage requirement during the capture. For more information, see xref:../../observability/network_observability/netobserv_cli/netobserv-cli-install.adoc#network-observability-netoberv-cli-about_netobserv-cli-install[Network Observability CLI] and link:https://access.redhat.com/errata/RHEA-2024:3869[Network Observability CLI 1.6.0].
[id="network-observability-operator-1.6.0-bug-fixes_{context}"]
=== Bug fixes
* Previously, a dead link to the OpenShift Container Platform documentation was displayed in the Operator Lifecycle Manager (OLM) form for the `FlowMetric` API creation. Now the link has been updated to point to a valid page. (link:https://issues.redhat.com/browse/NETOBSERV-1607[*NETOBSERV-1607*])
* Previously, the Network Observability Operator description in the Operator Hub displayed a broken link to the documentation. With this fix, this link is restored. (link:https://issues.redhat.com/browse/NETOBSERV-1544[*NETOBSERV-1544*])
* Previously, if Loki was disabled and the Loki `Mode` was set to `LokiStack`, or if Loki manual TLS configuration was configured, the Network Observability Operator still tried to read the Loki CA certificates. With this fix, when Loki is disabled, the Loki certificates are not read, even if there are settings in the Loki configuration. (link:https://issues.redhat.com/browse/NETOBSERV-1647[*NETOBSERV-1647*])
* Previously, the `oc` `must-gather` plugin for the Network Observability Operator worked only on the `amd64` architecture and failed on all other architectures because the plugin was using the `amd64` version of the `oc` binary. Now, the Network Observability Operator `oc` `must-gather` plugin collects logs on any architecture platform.
* Previously, when filtering on IP addresses using `not equal to`, the Network Observability Operator would return a request error.
Now, the IP filtering works in both `equal` and `not equal to` cases for IP addresses and ranges. (link:https://issues.redhat.com/browse/NETOBSERV-1630[*NETOBSERV-1630*])
* Previously, when a user was not an admin, the error messages were not consistent with the selected tab of the *Network Traffic* view in the web console. Now, the `user not admin` error displays on any tab with improved display. (link:https://issues.redhat.com/browse/NETOBSERV-1621[*NETOBSERV-1621*])
[id="network-observability-operator-1.6.0-known-issues_{context}"]
=== Known issues
* When the eBPF agent `PacketDrop` feature is enabled, and sampling is configured to a value greater than `1`, reported dropped bytes and dropped packets ignore the sampling configuration. While this is done on purpose to not miss any drops, a side effect is that the reported proportion of drops versus non-drops becomes biased. For example, at a very high sampling rate, such as `1:1000`, it is likely that almost all the traffic appears to be dropped when observed from the console plugin. (link:https://issues.redhat.com/browse/NETOBSERV-1676[*NETOBSERV-1676*])
* In the *Manage panels* pop-up window in the *Overview* tab, filtering on *total*, *bar*, *donut*, or *line* does not show any result. (link:https://issues.redhat.com/browse/NETOBSERV-1540[*NETOBSERV-1540*])
* The SR-IOV secondary interface is not detected if the interface was created first and the eBPF agent was deployed afterward. It is only detected if the agent is deployed first and the SR-IOV interface is created afterward. (link:https://issues.redhat.com/browse/NETOBSERV-1697[*NETOBSERV-1697*])
* When Loki is disabled, the *Topology* view in the OpenShift web console always shows the *Cluster* and *Zone* aggregation options in the slider beside the network topology diagram, even when the related features are not enabled. There is no specific workaround, besides ignoring these slider options. (link:https://issues.redhat.com/browse/NETOBSERV-1705[*NETOBSERV-1705*])
* When Loki is disabled, and the OpenShift web console first loads, it might display an error: `Request failed with status code 400 Loki is disabled`. As a workaround, you can continue switching content on the *Network Traffic* page, such as clicking between the *Topology* and the *Overview* tabs. The error should disappear. (link:https://issues.redhat.com/browse/NETOBSERV-1706[*NETOBSERV-1706*])
[id="network-observability-operator-release-notes-1-5"]
== Network Observability Operator 1.5.0
The following advisory is available for the Network Observability Operator 1.5.0:
@@ -90,7 +157,7 @@ With the `FlowCollector` `v1beta2` API update, you can configure the `spec.loki.
* Previously, the Operator bundle did not display some of the supported features by CSV annotations as expected, such as `features.operators.openshift.io/...`
With this fix, these annotations are set in the CSV as expected. (link:https://issues.redhat.com/browse/NETOBSERV-1305[*NETOBSERV-1305*])
* Previously, the `FlowCollector` status sometimes oscillated between `DeploymentInProgress` and `Ready` states during reconciliation.
With this fix, the status only becomes `Ready` when all the underlying components are fully ready.(link:https://issues.redhat.com/browse/NETOBSERV-1293[NETOBSERV-1293])
With this fix, the status only becomes `Ready` when all of the underlying components are fully ready. (link:https://issues.redhat.com/browse/NETOBSERV-1293[*NETOBSERV-1293*])
[id="network-observability-operator-1.5.0-known-issue"]
=== Known issues
@@ -307,7 +374,7 @@ The subscription of an installed Operator specifies an update channel that track
[id="health-alerts-feature-1.2"]
==== Network Observability health alerts
* The Network Observability Operator now creates automatic alerts if the `flowlogs-pipeline` is dropping flows because of errors at the write stage or if the Loki ingestion rate limit has been reached. For more information, see xref:../../observability/network_observability/network-observability-operator-monitoring.adoc#network-observability-alert-dashboard_network_observability[Viewing health information].
* The Network Observability Operator now creates automatic alerts if the `flowlogs-pipeline` is dropping flows because of errors at the write stage or if the Loki ingestion rate limit has been reached. For more information, see xref:../../observability/network_observability/network-observability-operator-monitoring.adoc#network-observability-health-dashboard-overview_network_observability[Health dashboards].
[id="network-observability-operator-1.2.0-bug-fixes"]
=== Bug fixes

View File

@@ -6,19 +6,18 @@ include::_attributes/common-attributes.adoc[]
toc::[]
Red Hat offers cluster administrators the Network Observability Operator to observe the network traffic for {product-title} clusters. The Network Observability Operator uses the eBPF technology to create network flows. The network flows are then enriched with {product-title} information and stored in Loki. You can view and analyze the stored network flows information in the {product-title} console for further insight and troubleshooting.
Red Hat offers cluster administrators the Network Observability Operator to observe the network traffic for {product-title} clusters. The Network Observability Operator uses the eBPF technology to create network flows. The network flows are then enriched with {product-title} information. They are available as Prometheus metrics or as logs in Loki. You can view and analyze the stored network flows information in the {product-title} console for further insight and troubleshooting.
[id="dependency-network-observability"]
== Optional dependencies of the Network Observability Operator
* {loki-op}: Loki is the backend that is used to store all collected flows. It is recommended to install Loki to use with the Network Observability Operator. You can choose to use xref:../network_observability/installing-operators.adoc#network-observability-without-loki_network_observability[Network Observability without Loki], but there are some considerations for doing this, as described in the linked section. If you choose to install Loki, it is recommended to use the {loki-op}, as it is supported by Red Hat.
* Grafana Operator: You can install Grafana for creating custom dashboards and querying capabilities, by using an open source product, such as the Grafana Operator. Red Hat does not support the Grafana Operator.
* {loki-op}: Loki is the backend that can be used to store all collected flows with a maximal level of detail. You can choose to use xref:../network_observability/installing-operators.adoc#network-observability-without-loki_network_observability[Network Observability without Loki], but there are some considerations for doing this, as described in the linked section. If you choose to install Loki, it is recommended to use the {loki-op}, which is supported by Red Hat.
* AMQ Streams Operator: Kafka provides scalability, resiliency, and high availability in the {product-title} cluster for large-scale deployments. If you choose to use Kafka, it is recommended to use the AMQ Streams Operator, because it is supported by Red Hat.
[id="network-observability-operator"]
== Network Observability Operator
The Network Observability Operator provides the Flow Collector API custom resource definition. A Flow Collector instance is created during installation and enables configuration of network flow collection. The Flow Collector instance deploys pods and services that form a monitoring pipeline where network flows are then collected and enriched with the Kubernetes metadata before storing in Loki. The eBPF agent, which is deployed as a `daemonset` object, creates the network flows.
The Network Observability Operator provides the Flow Collector API custom resource definition. A Flow Collector instance is a cluster-scoped resource that enables configuration of network flow collection. The Flow Collector instance deploys pods and services that form a monitoring pipeline where network flows are collected and enriched with Kubernetes metadata before they are stored in Loki or used to generate Prometheus metrics. The eBPF agent, which is deployed as a `daemonset` object, creates the network flows.
[id="no-console-integration"]
== {product-title} console integration
@@ -28,17 +27,21 @@ The Network Observability Operator provides the Flow Collector API custom resour
[id="network-observability-dashboards"]
=== Network Observability metrics dashboards
On the *Overview* tab in the {product-title} console, you can view the overall aggregated metrics of the network traffic flow on the cluster. You can choose to display the information by node, namespace, owner, pod, zone, and service. Filters and display options can further refine the metrics. For more information, see xref:../network_observability/observing-network-traffic.adoc#network-observability-overview_nw-observe-network-traffic[Observing the network traffic from the Overview view].
On the *Overview* tab in the {product-title} console, you can view the overall aggregated metrics of the network traffic flow on the cluster. You can choose to display the information by zone, node, namespace, owner, pod, and service. Filters and display options can further refine the metrics. For more information, see xref:../network_observability/observing-network-traffic.adoc#network-observability-overview_nw-observe-network-traffic[Observing the network traffic from the Overview view].
In *Observe* -> *Dashboards*, the *Netobserv* dashboard provides a quick overview of the network flows in your {product-title} cluster. The *Netobserv/Health* dashboard provides metrics about the health of the Operator. For more information, see xref:../network_observability/metrics-alerts-dashboards.adoc#network-observability-metrics_metrics-dashboards-alerts[Network Observability Metrics] and xref:../network_observability/network-observability-operator-monitoring.adoc#network-observability-alert-dashboard_network_observability[Viewing health information].
In *Observe* -> *Dashboards*, the *Netobserv* dashboards provide a quick overview of the network flows in your {product-title} cluster. The *Netobserv/Health* dashboard provides metrics about the health of the Operator. For more information, see xref:../network_observability/metrics-alerts-dashboards.adoc#network-observability-metrics_metrics-dashboards-alerts[Network Observability Metrics] and xref:../network_observability/network-observability-operator-monitoring.adoc#network-observability-health-dashboard-overview_network_observability[Viewing health information].
[id="network-observability-topology-views"]
=== Network Observability topology views
The {product-title} console offers the *Topology* tab which displays a graphical representation of the network flows and the amount of traffic. The topology view represents traffic between the {product-title} components as a network graph. You can refine the graph by using the filters and display options. You can access the information for node, namespace, owner, pod, and service.
The {product-title} console offers the *Topology* tab which displays a graphical representation of the network flows and the amount of traffic. The topology view represents traffic between the {product-title} components as a network graph. You can refine the graph by using the filters and display options. You can access the information for zone, node, namespace, owner, pod, and service.
[id="traffic-flow-tables"]
=== Traffic flow tables
The traffic flow table view provides a view for raw flows, non-aggregated filtering options, and configurable columns. The {product-title} console offers the *Traffic flows* tab, which displays the data of the network flows and the amount of traffic.
[id="network-observability-cli"]
== Network Observability CLI
You can quickly debug and troubleshoot networking issues with Network Observability by using the Network Observability CLI (`oc netobserv`). The Network Observability CLI is a flow and packet visualization tool that relies on eBPF agents to stream collected data to an ephemeral collector pod. It requires no persistent storage during the capture. After the run, the output is transferred to your local machine. This enables quick, live insight into packets and flow data without installing the Network Observability Operator.

View File

@@ -0,0 +1,19 @@
:_mod-docs-content-type: ASSEMBLY
[id="network-observability-scheduling-resources"]
= Scheduling resources
include::_attributes/common-attributes.adoc[]
:context: network_observability_scheduling
Taints and tolerations allow a node to control which pods should (or should not) be scheduled on it.
A node selector specifies a map of key-value pairs that are defined by using custom labels on nodes and selectors specified in pods.
For a pod to be eligible to run on a node, the pod must have the same key-value node selector as the label on the node.
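For example, the following is a minimal sketch that schedules the `flowlogs-pipeline` processor pods on infrastructure nodes, assuming the nodes are labeled and tainted with the standard `node-role.kubernetes.io/infra` key:

[source,yaml]
----
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    advanced:
      scheduling:
        nodeSelector:
          node-role.kubernetes.io/infra: "" # run only on infra nodes
        tolerations:
        - key: node-role.kubernetes.io/infra # tolerate the infra taint
          operator: Exists
          effect: NoSchedule
----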
include::modules/network-observability-nodes-taints-tolerations.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* xref:../../nodes/scheduling/nodes-scheduler-taints-tolerations.adoc#nodes-scheduler-taints-tolerations-about_nodes-scheduler-taints-tolerations[Understanding taints and tolerations]
* link:https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/[Assign Pods to Nodes] (Kubernetes documentation)
* link:https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass[Pod Priority and Preemption] (Kubernetes documentation)

View File

@@ -32,6 +32,15 @@ include::modules/network-observability-RTT-overview.adoc[leveloffset=+2]
.Additional resources
* xref:../../observability/network_observability/observing-network-traffic.adoc#network-observability-RTT_nw-observe-network-traffic[Working with RTT tracing]
include::modules/network-observability-ebpf-rule-flow-filter.adoc[leveloffset=+2]
include::modules/network-observability-flow-filter-parameters.adoc[leveloffset=+3]
[role="_additional-resources"]
.Additional resources
* xref:../../observability/network_observability/observing-network-traffic.adoc#network-observability-filtering-ebpf-rule_nw-observe-network-traffic[Filtering eBPF flow data with rules]
* xref:../../observability/network_observability/metrics-alerts-dashboards.adoc#network-observability-metrics_metrics-dashboards-alerts[Network Observability metrics]
* xref:../../observability/network_observability/network-observability-operator-monitoring.adoc#network-observability-health-dashboard-overview_network_observability[Health dashboards]
//Traffic flows
include::modules/network-observability-trafficflow.adoc[leveloffset=+1]
include::modules/network-observability-working-with-trafficflow.adoc[leveloffset=+2]
@@ -42,6 +51,7 @@ include::modules/network-observability-dns-tracking.adoc[leveloffset=+2]
include::modules/network-observability-RTT.adoc[leveloffset=+2]
include::modules/network-observability-histogram-trafficflow.adoc[leveloffset=+2]
include::modules/network-observability-working-with-zones.adoc[leveloffset=+2]
include::modules/network-observability-filtering-ebpf-rule.adoc[leveloffset=+2]
//Topology
include::modules/network-observability-topology.adoc[leveloffset=+1]