
TELCODOCS-395: restore file

This commit is contained in:
Tony Mulqueen
2022-05-31 14:36:06 +01:00
parent 662155beaf
commit c7a660ef93
10 changed files with 371 additions and 26 deletions

BIN: new image file, 59 KiB (binary file not shown)
BIN: images/dpdk_line_rate.png, new image file, 47 KiB (binary file not shown)


@@ -0,0 +1,31 @@
// Module included in the following assemblies:
//
// * networking/hardware_networks/using-dpdk-and-rdma.adoc
:_content-type: CONCEPT
[id="nw-sriov-example-dpdk-line-rate_{context}"]
= Overview of achieving a specific DPDK line rate
To achieve a specific Data Plane Development Kit (DPDK) line rate, deploy the Node Tuning Operator and configure Single Root I/O Virtualization (SR-IOV). You must also tune the DPDK settings for the following resources:
- Isolated CPUs
- Hugepages
- The topology scheduler
[NOTE]
====
In previous versions of {product-title}, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for {product-title} applications. In {product-title} 4.11 and later, this functionality is part of the Node Tuning Operator.
====
.DPDK test environment
The following diagram shows the components of a traffic-testing environment:
image::261_OpenShift_DPDK_0722.png[DPDK test environment]
- **Traffic generator**: An application that can generate high-volume packet traffic.
- **SR-IOV-supporting NIC**: A network interface card compatible with SR-IOV. The card runs a number of virtual functions on a physical interface.
- **Physical Function (PF)**: A PCI Express (PCIe) function of a network adapter that supports the SR-IOV interface.
- **Virtual Function (VF)**: A lightweight PCIe function on a network adapter that supports SR-IOV. The VF is associated with the PCIe PF on the network adapter. The VF represents a virtualized instance of the network adapter.
- **Switch**: A network switch. Nodes can also be connected back-to-back.
- **`testpmd`**: An example application included with DPDK. You can use the `testpmd` application to test DPDK in packet-forwarding mode. The `testpmd` application is also an example of how to build a fully-fledged application by using the DPDK Software Development Kit (SDK).
- **worker 0** and **worker 1**: {product-title} nodes.
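Before tuning, it can help to confirm what a worker node advertises. The following is a minimal sketch, assuming a node named `worker-0`; the `dpdk_nic` resources appear only after the SR-IOV node policies shown later in this section are applied:

[source,terminal]
----
$ oc describe node worker-0 | grep -E 'cpu:|hugepages-1Gi:|dpdk_nic'
----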


@@ -0,0 +1,40 @@
// Module included in the following assemblies:
//
// * networking/hardware_networks/using-dpdk-and-rdma.adoc
:_content-type: REFERENCE
[id="nw-sriov-create-object_{context}"]
= Example SR-IOV network operator
The following is an example definition of an `sriovNetwork` object. In this case, Intel and Mellanox configurations are identical:
[source,yaml]
----
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: dpdk-network-1
  namespace: openshift-sriov-network-operator
spec:
  ipam: '{"type": "host-local","ranges": [[{"subnet": "10.0.1.0/24"}]],"dataDir": "/run/my-orchestrator/container-ipam-state-1"}' <1>
  networkNamespace: dpdk-test <2>
  spoofChk: "off"
  trust: "on"
  resourceName: dpdk_nic_1 <3>
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: dpdk-network-2
  namespace: openshift-sriov-network-operator
spec:
  ipam: '{"type": "host-local","ranges": [[{"subnet": "10.0.2.0/24"}]],"dataDir": "/run/my-orchestrator/container-ipam-state-1"}'
  networkNamespace: dpdk-test
  spoofChk: "off"
  trust: "on"
  resourceName: dpdk_nic_2
----
<1> You can use a different IP Address Management (IPAM) implementation, such as Whereabouts. For more information, see _Dynamic IP address assignment configuration with Whereabouts_.
<2> Request the `networkNamespace` where the network attachment definition is created. You must create the `sriovNetwork` CR in the `openshift-sriov-network-operator` namespace.
<3> The `resourceName` value must match the `resourceName` that is defined in the corresponding `sriovNetworkNodePolicy` object.
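After the SR-IOV Network Operator reconciles these objects, it creates a network attachment definition in the requested `networkNamespace`. A hedged way to verify this, assuming the `dpdk-test` namespace from the example:

[source,terminal]
----
$ oc get network-attachment-definitions -n dpdk-test
----

The output should list `dpdk-network-1` and `dpdk-network-2`.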


@@ -0,0 +1,82 @@
// Module included in the following assemblies:
//
// * networking/hardware_networks/using-dpdk-and-rdma.adoc
:_content-type: REFERENCE
[id="nw-sriov-dpdk-base-workload_{context}"]
= Example DPDK base workload
The following is an example of a Data Plane Development Kit (DPDK) container:
[source,yaml]
----
apiVersion: v1
kind: Namespace
metadata:
  name: dpdk-test
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: '[ <1>
     {
      "name": "dpdk-network-1",
      "namespace": "dpdk-test"
     },
     {
      "name": "dpdk-network-2",
      "namespace": "dpdk-test"
     }
    ]'
    irq-load-balancing.crio.io: "disable" <2>
    cpu-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
  labels:
    app: dpdk
  name: testpmd
  namespace: dpdk-test
spec:
  runtimeClassName: performance-performance <3>
  containers:
  - command:
    - /bin/bash
    - -c
    - sleep INF
    image: registry.redhat.io/openshift4/dpdk-base-rhel8
    imagePullPolicy: Always
    name: dpdk
    resources: <4>
      limits:
        cpu: "16"
        hugepages-1Gi: 8Gi
        memory: 2Gi
      requests:
        cpu: "16"
        hugepages-1Gi: 8Gi
        memory: 2Gi
    securityContext:
      capabilities:
        add:
        - IPC_LOCK
        - SYS_RESOURCE
        - NET_RAW
        - NET_ADMIN
      runAsUser: 0
    volumeMounts:
    - mountPath: /mnt/huge
      name: hugepages
  terminationGracePeriodSeconds: 5
  volumes:
  - emptyDir:
      medium: HugePages
    name: hugepages
----
<1> Request the SR-IOV networks you need. Resources for the devices will be injected automatically.
<2> Disable CPU load balancing and IRQ load balancing for the pod. See _Disabling interrupt processing for individual pods_ for more information.
<3> Set the `runtimeClassName` to `performance-performance`. Do not configure the pod to use host networking or run it as `privileged`.
<4> Request an equal number of resources for requests and limits to start the pod with `Guaranteed` Quality of Service (QoS).
[NOTE]
====
Do not start the pod with `sleep` and then exec into the pod to start `testpmd` or the DPDK workload. This can introduce additional interrupts because the `exec` process is not pinned to any CPU.
====
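After you create the pod, two quick checks confirm the setup. The following is a sketch, assuming the `testpmd` pod above is running in the `dpdk-test` namespace; the first command verifies `Guaranteed` QoS, and the second shows the VF PCI address variables that the resource injector populated:

[source,terminal]
----
$ oc get pod testpmd -n dpdk-test -o jsonpath='{.status.qosClass}'
$ oc exec -n dpdk-test testpmd -- env | grep PCIDEVICE
----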


@@ -6,15 +6,17 @@
[id="example-vf-use-in-dpdk-mode-mellanox_{context}"]
= Using a virtual function in DPDK mode with a Mellanox NIC
You can create a network node policy and create a Data Plane Development Kit (DPDK) pod using a virtual function in DPDK mode with a Mellanox NIC.
.Prerequisites
* You have installed the OpenShift CLI (`oc`).
* You have installed the Single Root I/O Virtualization (SR-IOV) Network Operator.
* You have logged in as a user with `cluster-admin` privileges.
.Procedure
. Save the following `SriovNetworkNodePolicy` YAML configuration to an `mlx-dpdk-node-policy.yaml` file:
+
[source,yaml]
----
@@ -37,16 +39,16 @@ spec:
  deviceType: netdevice <2>
  isRdma: true <3>
----
<1> Specify the device hex code of the SR-IOV network device. The only allowed values for Mellanox cards are `1015` and `1017`.
<2> Set the driver type for the virtual functions to `netdevice`. A Mellanox SR-IOV Virtual Function (VF) can work in DPDK mode without using the `vfio-pci` device type. The VF device appears as a kernel network interface inside a container.
<3> Enable Remote Direct Memory Access (RDMA) mode. This is required for Mellanox cards to work in DPDK mode.
+
[NOTE]
=====
See _Configuring an SR-IOV network device_ for a detailed explanation of each option in the `SriovNetworkNodePolicy` object.
When applying the configuration specified in an `SriovNetworkNodePolicy` object, the SR-IOV Operator might drain the nodes, and in some cases, reboot nodes.
It might take several minutes for a configuration change to apply.
Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.
After the configuration update is applied, all the pods in the `openshift-sriov-network-operator` namespace will change to a `Running` status.
@@ -59,7 +61,7 @@ After the configuration update is applied, all the pods in the `openshift-sriov-
$ oc create -f mlx-dpdk-node-policy.yaml
----
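+
To watch the rollout that the preceding note describes, a hedged check, assuming the default Operator namespace, is to list the Operator pods until they all report `Running`:
+
[source,terminal]
----
$ oc get pods -n openshift-sriov-network-operator
----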
. Save the following `SriovNetwork` YAML configuration to an `mlx-dpdk-network.yaml` file:
+
[source,yaml]
----
@@ -71,27 +73,27 @@ metadata:
spec:
  networkNamespace: <target_namespace>
  ipam: |- <1>
    # ...
  vlan: <vlan>
  resourceName: mlxnics
----
<1> Specify a configuration object for the IP Address Management (IPAM) Container Network Interface (CNI) plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition.
+
[NOTE]
=====
See _Configuring an SR-IOV network device_ for a detailed explanation of each option in the `SriovNetwork` object.
=====
+
The optional `app-netutil` library provides several API methods for gathering network information about the parent pod of a container.
. Create the `SriovNetwork` object by running the following command:
+
[source,terminal]
----
$ oc create -f mlx-dpdk-network.yaml
----
. Save the following `Pod` YAML configuration to an `mlx-dpdk-pod.yaml` file:
+
[source,yaml]
----
@@ -130,13 +132,13 @@ spec:
  - emptyDir:
      medium: HugePages
----
<1> Specify the same `target_namespace` where the `SriovNetwork` object `mlx-dpdk-network` is created. To create the pod in a different namespace, change `target_namespace` in both the `Pod` spec and the `SriovNetwork` object.
<2> Specify the DPDK image that includes your application and the DPDK library used by the application.
<3> Specify additional capabilities required by the application inside the container for hugepage allocation, system resource allocation, and network interface access.
<4> Mount the hugepage volume to the DPDK pod under `/dev/hugepages`. The hugepage volume is backed by the `emptyDir` volume type with the medium being `Hugepages`.
<5> Optional: Specify the number of DPDK devices allocated for the DPDK pod. If not explicitly specified, this resource request and limit is automatically added by the SR-IOV network resource injector. The SR-IOV network resource injector is an admission controller component managed by the SR-IOV Operator. It is enabled by default and can be disabled by setting the `enableInjector` option to `false` in the default `SriovOperatorConfig` CR.
<6> Specify the number of CPUs. The DPDK pod usually requires that exclusive CPUs be allocated from the kubelet. To do this, set the CPU Manager policy to `static` and create a pod with `Guaranteed` Quality of Service (QoS).
<7> Specify the hugepage size `hugepages-1Gi` or `hugepages-2Mi` and the quantity of hugepages that will be allocated to the DPDK pod. Configure `2Mi` and `1Gi` hugepages separately. Configuring `1Gi` hugepages requires adding kernel arguments to nodes.
. Create the DPDK pod by running the following command:
+


@@ -0,0 +1,21 @@
// Module included in the following assemblies:
//
// * networking/hardware_networks/using-dpdk-and-rdma.adoc
:_content-type: REFERENCE
[id="nw-sriov-dpdk-running-testpmd_{context}"]
= Example testpmd script
The following is an example script for running `testpmd`:
[source,terminal]
----
#!/bin/bash
set -ex
export CPU=$(cat /sys/fs/cgroup/cpuset/cpuset.cpus)
echo ${CPU}
dpdk-testpmd -l ${CPU} -a ${PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_1} -a ${PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_2} -n 4 -- -i --nb-cores=15 --rxd=4096 --txd=4096 --rxq=7 --txq=7 --forward-mode=mac --eth-peer=0,50:00:00:00:00:01 --eth-peer=1,50:00:00:00:00:02
----
This example uses two different `sriovNetwork` CRs. Each environment variable contains the Virtual Function (VF) PCI address that was allocated for the pod. If you use the same network in the pod definition, you must split the `pciAddress` value, as shown in the sketch that follows.
It is important to configure the correct MAC addresses of the traffic generator. This example uses custom MAC addresses.
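If you attach the same network twice instead, the device plugin exposes both VF PCI addresses in a single comma-separated environment variable that the script must split itself. The following is a minimal sketch, assuming the `dpdk_nic_1` resource name from the earlier examples:

[source,terminal]
----
#!/bin/bash
set -ex
export CPU=$(cat /sys/fs/cgroup/cpuset/cpuset.cpus)
# PCIDEVICE_* holds a comma-separated list when several VFs come from the
# same resource; split it into an array before passing the addresses on.
IFS=',' read -ra PCI <<< "${PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_1}"
dpdk-testpmd -l ${CPU} -a ${PCI[0]} -a ${PCI[1]} -n 4 -- -i --nb-cores=15 --rxd=4096 --txd=4096 --rxq=7 --txq=7 --forward-mode=mac --eth-peer=0,50:00:00:00:00:01 --eth-peer=1,50:00:00:00:00:02
----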


@@ -0,0 +1,58 @@
// Module included in the following assemblies:
//
// * networking/hardware_networks/using-dpdk-and-rdma.adoc
:_content-type: PROCEDURE
[id="nw-example-dpdk-line-rate_{context}"]
= Using SR-IOV and the Node Tuning Operator to achieve a DPDK line rate
You can use the Node Tuning Operator to configure isolated CPUs, hugepages, and a topology scheduler.
You can then use the Node Tuning Operator with Single Root I/O Virtualization (SR-IOV) to achieve a specific Data Plane Development Kit (DPDK) line rate.
.Prerequisites
* You have installed the OpenShift CLI (`oc`).
* You have installed the SR-IOV Network Operator.
* You have logged in as a user with `cluster-admin` privileges.
* You have deployed a standalone Node Tuning Operator.
+
[NOTE]
====
In previous versions of {product-title}, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for {product-title} applications. In {product-title} 4.11 and later, this functionality is part of the Node Tuning Operator.
====
.Procedure
. Create a `PerformanceProfile` object based on the following example:
+
[source,yaml]
----
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance
spec:
  globallyDisableIrqLoadBalancing: true
  cpu:
    isolated: 21-51,73-103 <1>
    reserved: 0-20,52-72 <2>
  hugepages:
    defaultHugepagesSize: 1G <3>
    pages:
    - count: 32
      size: 1G
  net:
    userLevelNetworking: true
  numa:
    topologyPolicy: "single-numa-node"
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
----
<1> If hyperthreading is enabled on the system, allocate the relevant sibling threads to the `isolated` and `reserved` CPU groups. If the system contains multiple non-uniform memory access (NUMA) nodes, allocate CPUs from both NUMA nodes to both groups. You can also use the Performance Profile Creator for this task. For more information, see _Creating a performance profile_.
<2> You can also specify a list of devices that will have their queues set to the reserved CPU count. For more information, see _Reducing NIC queues using the Node Tuning Operator_.
<3> Allocate the number and size of hugepages needed. You can specify the NUMA configuration for the hugepages. By default, the system allocates an even number of hugepages to every NUMA node on the system. If needed, you can request the use of a realtime kernel for the nodes. See _Provisioning a worker with real-time capabilities_ for more information.
. Save the YAML file as `mlx-dpdk-perfprofile-policy.yaml`.
. Apply the performance profile by running the following command:
+
[source,terminal]
----
$ oc create -f mlx-dpdk-perfprofile-policy.yaml
----
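Applying the profile causes the Machine Config Operator to roll the change out to the selected nodes, rebooting them in turn. The following sketch, assuming a machine config pool named `worker-cnf` that matches the node selector in the example (`<node>` is a placeholder), shows one way to follow progress and then confirm the 1G hugepage kernel arguments:

[source,terminal]
----
$ oc get mcp worker-cnf
$ oc debug node/<node> -- chroot /host cat /proc/cmdline
----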


@@ -0,0 +1,93 @@
// Module included in the following assemblies:
//
// * networking/hardware_networks/using-dpdk-and-rdma.adoc
:_content-type: REFERENCE
[id="nw-sriov-network-operator_{context}"]
= Example SR-IOV Network Operator for virtual functions
You can use the Single Root I/O Virtualization (SR-IOV) Network Operator to allocate and configure Virtual Functions (VFs) from SR-IOV-supporting Physical Function NICs on the nodes.
For more information on deploying the Operator, see _Installing the SR-IOV Network Operator_.
For more information on configuring an SR-IOV network device, see _Configuring an SR-IOV network device_.
There are some differences between running Data Plane Development Kit (DPDK) workloads on Intel VFs and Mellanox VFs. This section provides object configuration examples for both VF types.
The following is an example of an `sriovNetworkNodePolicy` object used to run DPDK applications on Intel NICs:
[source,yaml]
----
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: dpdk-nic-1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci <1>
  needVhostNet: true <2>
  nicSelector:
    pfNames: ["ens3f0"]
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numVfs: 10
  priority: 99
  resourceName: dpdk_nic_1
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: dpdk-nic-2
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  needVhostNet: true
  nicSelector:
    pfNames: ["ens3f1"]
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numVfs: 10
  priority: 99
  resourceName: dpdk_nic_2
----
<1> For Intel NICs, `deviceType` must be `vfio-pci`.
<2> If kernel communication with DPDK workloads is required, add `needVhostNet: true`. This mounts the `/dev/net/tun` and `/dev/vhost-net` devices into the container so the application can create a tap device and connect the tap device to the DPDK workload.
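If you set `needVhostNet: true`, a quick hedged check, assuming the `testpmd` pod from the base workload example, is to confirm that the devices were mounted into the container:

[source,terminal]
----
$ oc exec -n dpdk-test testpmd -- ls -l /dev/net/tun /dev/vhost-net
----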
The following is an example of an `sriovNetworkNodePolicy` object for Mellanox NICs:
[source,yaml]
----
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: dpdk-nic-1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice <1>
  isRdma: true <2>
  nicSelector:
    rootDevices:
    - "0000:5e:00.1"
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numVfs: 5
  priority: 99
  resourceName: dpdk_nic_1
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: dpdk-nic-2
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  isRdma: true
  nicSelector:
    rootDevices:
    - "0000:5e:00.0"
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numVfs: 5
  priority: 99
  resourceName: dpdk_nic_2
----
<1> For Mellanox devices, the `deviceType` must be `netdevice`.
<2> For Mellanox devices, `isRdma` must be `true`.
Mellanox cards are connected to DPDK applications by using flow bifurcation. This mechanism splits traffic between Linux user space and kernel space and can enhance line rate processing capability.
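After the policies are applied, the Operator records per-node VF state in `SriovNetworkNodeState` objects. A hedged way to confirm that configuration finished, assuming the default Operator namespace:

[source,terminal]
----
$ oc get sriovnetworknodestates -n openshift-sriov-network-operator -o jsonpath='{.items[*].status.syncStatus}'
----

Each node should report `Succeeded`.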


@@ -6,7 +6,6 @@ include::_attributes/common-attributes.adoc[]
toc::[]
The containerized Data Plane Development Kit (DPDK) application is supported on {product-title}. You can use Single Root I/O Virtualization (SR-IOV) network hardware with DPDK and with remote direct memory access (RDMA).
For information on supported devices, refer to xref:../../networking/hardware_networks/about-sriov.adoc#supported-devices_about-sriov[Supported devices].
@@ -15,6 +14,18 @@ include::modules/nw-sriov-dpdk-example-intel.adoc[leveloffset=+1]
include::modules/nw-sriov-dpdk-example-mellanox.adoc[leveloffset=+1]
include::modules/nw-sriov-concept-dpdk-line-rate.adoc[leveloffset=+1]
include::modules/nw-sriov-example-dpdk-line-rate.adoc[leveloffset=+1]
include::modules/nw-sriov-network-operator.adoc[leveloffset=+2]
include::modules/nw-sriov-create-object.adoc[leveloffset=+2]
include::modules/nw-sriov-dpdk-base-workload.adoc[leveloffset=+2]
include::modules/nw-sriov-dpdk-running-testpmd.adoc[leveloffset=+2]
[id="example-vf-use-in-rdma-mode-mellanox_{context}"]
== Using a virtual function in RDMA mode with a Mellanox NIC
@@ -28,7 +39,14 @@ include::modules/nw-sriov-rdma-example-mellanox.adoc[tag=content]
[id="additional-resources_using-dpdk-and-rdma"]
== Additional resources
* xref:../../scalability_and_performance/cnf-create-performance-profiles.adoc#cnf-about-the-profile-creator-tool_cnf-create-performance-profiles[Creating a performance profile]
* xref:../../scalability_and_performance/cnf-low-latency-tuning.adoc#adjusting-nic-queues-with-the-performance-profile_cnf-master[Reducing NIC queues using the Node Tuning Operator]
* xref:../../scalability_and_performance/cnf-low-latency-tuning.adoc#performance-addon-operator-provisioning-worker-with-real-time-capabilities_cnf-master[Provisioning a worker with real-time capabilities]
* xref:../../networking/hardware_networks/installing-sriov-operator.adoc#installing-sr-iov-operator_installing-sriov-operator[Installing the SR-IOV Network Operator]
* xref:../../networking/hardware_networks/configuring-sriov-device.adoc#nw-sriov-networknodepolicy-object_configuring-sriov-device[Configuring an SR-IOV network device]
* xref:../../networking/multiple_networks/configuring-additional-network.adoc#nw-multus-whereabouts_configuring-additional-network[Dynamic IP address assignment configuration with Whereabouts]
* xref:../../scalability_and_performance/cnf-low-latency-tuning.adoc#disabling_interrupt_processing_for_individual_pods_cnf-master[Disabling interrupt processing for individual pods]
* xref:../../networking/hardware_networks/configuring-sriov-net-attach.adoc#configuring-sriov-net-attach[Configuring an SR-IOV Ethernet network attachment]
* The xref:../../networking/hardware_networks/about-sriov.adoc#nw-sriov-app-netutil_about-sriov[app-netutil library] provides several API methods for gathering network information about a container's parent pod.
:!FeatureName: