// Module included in the following assemblies:
//
// * ai_workloads/leader_worker_set/index.adoc

:_mod-docs-content-type: CONCEPT
[id="lws-about_{context}"]
= About the {lws-operator}

The {lws-operator} is based on the link:https://lws.sigs.k8s.io/[LeaderWorkerSet] open source project. `LeaderWorkerSet` is a custom Kubernetes API that can be used to deploy a group of pods as a unit. This is useful for artificial intelligence (AI) and machine learning (ML) inference workloads, where large language models (LLMs) are sharded across multiple nodes.

With the `LeaderWorkerSet` API, pods are grouped into units consisting of one leader and multiple workers, all managed together as a single entity. Each pod in a group has a unique pod identity. Pods within a group are created in parallel and share identical lifecycle stages. Rollouts, rolling updates, and pod failure restarts are performed as a group.

In the `LeaderWorkerSet` configuration, you define the size of the groups and the number of group replicas. If necessary, you can define separate templates for leader and worker pods, allowing for role-specific customization. You can also configure topology-aware placement, so that pods in the same group are co-located in the same topology.

[IMPORTANT]
====
Before you install the {lws-operator}, you must install the {cert-manager-operator} because it is required to configure services and manage metrics collection.
====

Monitoring for the {lws-operator} is provided by default with {product-title} through Prometheus.
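
The configuration options described in this module, namely the group size, the number of group replicas, and separate leader and worker templates, can be illustrated with a `LeaderWorkerSet` custom resource. The following is a minimal sketch, not a definitive configuration. The resource name `my-lws`, the namespace `my-namespace`, the container image, and the values chosen for `replicas` and `size` are assumptions made for illustration only.

[source,yaml]
----
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: my-lws                # hypothetical name
  namespace: my-namespace     # hypothetical namespace
spec:
  replicas: 2                 # number of group replicas
  leaderWorkerTemplate:
    size: 3                   # pods per group: one leader and two workers
    restartPolicy: RecreateGroupOnPodRestart  # restart the whole group when a pod in it restarts
    leaderTemplate:           # optional template used only for the leader pod
      spec:
        containers:
        - name: leader
          image: nginxinc/nginx-unprivileged:1.27   # placeholder image
    workerTemplate:           # template used for the worker pods
      spec:
        containers:
        - name: worker
          image: nginxinc/nginx-unprivileged:1.27   # placeholder image
          ports:
          - containerPort: 8080
----

With these example values, two groups of three pods each would be created, for six pods in total. If `leaderTemplate` is omitted, the leader pod uses the `workerTemplate` specification.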