// Module included in the following assemblies:
//
// * observability/otel/otel-collector/otel-collector-processors.adoc

:_mod-docs-content-type: REFERENCE
[id="otel-processors-probabilistic-sampling-processor_{context}"]
= Probabilistic Sampling Processor

[role="_abstract"]
If you handle high volumes of telemetry data and want to reduce costs by processing less data, you can use the Probabilistic Sampling Processor as an alternative to the Tail Sampling Processor.

:FeatureName: Probabilistic Sampling Processor
include::snippets/technology-preview.adoc[]

The processor samples a specified percentage of trace spans or log records statelessly and per request.

The processor records the effective sampling probability that it used in the telemetry data:

* In trace spans, the processor encodes the threshold and optional randomness information in the W3C Trace Context `tracestate` fields, as illustrated in the sketch after this list.

* In log records, the processor encodes the threshold and randomness information as attributes.

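
The following is an illustrative sketch of what the encoded information might look like for a sampled span, assuming the OpenTelemetry `ot` key in the `tracestate` header with its `th` (threshold) and `rv` (explicit randomness value) sub-fields. The values are placeholders; a threshold of `c`, for example, corresponds to a sampling probability of 25%.

[source,text]
----
tracestate: ot=rv:7d3af4c1a2b3c4;th:c
----
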
The following is an example `OpenTelemetryCollector` custom resource configuration for the Probabilistic Sampling Processor for sampling trace spans:

[source,yaml]
----
# ...
  config:
    processors:
      probabilistic_sampler: # <1>
        sampling_percentage: 15.3 # <2>
        mode: "proportional" # <3>
        hash_seed: 22 # <4>
        sampling_precision: 14 # <5>
        fail_closed: true # <6>
# ...
    service:
      pipelines:
        traces:
          processors: [probabilistic_sampler]
# ...
----
<1> For trace pipelines, the source of randomness is the hashed value of the span trace ID.
<2> Required. The percentage at which spans are sampled. Accepts a 32-bit floating-point value.
<3> Optional. Accepts a supported string value for a sampling logic mode: the default `hash_seed`, `proportional`, or `equalizing`. The `hash_seed` mode applies the Fowler–Noll–Vo (FNV) hash function to the trace ID and weighs the hashed value against the sampling percentage value. You can also use the `hash_seed` mode with units of telemetry other than the trace ID. The `proportional` mode samples a strict, probability-based ratio of the total span quantity and is based on the OpenTelemetry and World Wide Web Consortium specifications. The `equalizing` mode is useful for lowering the sampling probability to a minimum value across a whole pipeline or for applying a uniform sampling probability in Collector deployments where client SDKs have mixed sampling configurations. For an illustrative `equalizing` example, see the sketch that follows this list.
<4> Optional. Accepts a 32-bit unsigned integer that is used as the seed for the hash computation. When this field is not configured, the default seed value is `0`. If you use multiple tiers of Collector instances, you must configure all Collectors of the same tier with the same seed value.
<5> Optional. Determines the number of hexadecimal digits used to encode the sampling threshold. Accepts an integer value in the range `1`-`14`. The default value `4` causes the threshold to be rounded if it contains more than 16 significant bits, as with the `proportional` mode, which uses 56 bits. If you select the `proportional` mode, use a greater value to preserve the precision applied by preceding samplers.
<6> Optional. Rejects spans with sampling errors. Accepts a boolean value. The default value is `true`.

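
For example, you could use the `equalizing` mode in a downstream, gateway-tier Collector to lower the effective sampling probability for the whole pipeline, regardless of what client SDKs or upstream Collectors have already sampled. The following is an illustrative sketch rather than a complete configuration; the `probabilistic_sampler/gateway` processor name and the 10% value are placeholders for your environment.

[source,yaml]
----
# ...
  config:
    processors:
      # Illustrative only: lower the effective sampling probability
      # to at most 10% for every trace in this pipeline.
      probabilistic_sampler/gateway:
        sampling_percentage: 10
        mode: "equalizing"
    service:
      pipelines:
        traces:
          processors: [probabilistic_sampler/gateway]
# ...
----
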
The following is an example `OpenTelemetryCollector` custom resource configuration for the Probabilistic Sampling Processor for sampling log records:

[source,yaml]
----
# ...
  config:
    processors:
      probabilistic_sampler/logs:
        sampling_percentage: 15.3 # <1>
        mode: "hash_seed" # <2>
        hash_seed: 22 # <3>
        sampling_precision: 4 # <4>
        attribute_source: "record" # <5>
        from_attribute: "<log_record_attribute_name>" # <6>
        fail_closed: true # <7>
# ...
    service:
      pipelines:
        logs:
          processors: [ probabilistic_sampler/logs ]
# ...
----
<1> Required. The percentage at which log records are sampled. Accepts a 32-bit floating-point value.
<2> Optional. Accepts a supported string value for a sampling logic mode: the default `hash_seed`, `equalizing`, or `proportional`. The `hash_seed` mode applies the Fowler–Noll–Vo (FNV) hash function to the trace ID or a specified log record attribute and then weighs the hashed value against the sampling percentage value. You can also use the `hash_seed` mode with units of telemetry other than the trace ID, for example the `service.instance.id` resource attribute, to collect log records from a percentage of pods. The `equalizing` mode is useful for lowering the sampling probability to a minimum value across a whole pipeline or for applying a uniform sampling probability in Collector deployments where client SDKs have mixed sampling configurations. The `proportional` mode samples a strict, probability-based ratio of the total log record quantity and is based on the OpenTelemetry and World Wide Web Consortium specifications.
<3> Optional. Accepts a 32-bit unsigned integer that is used as the seed for the hash computation. When this field is not configured, the default seed value is `0`. If you use multiple tiers of Collector instances, you must configure all Collectors of the same tier with the same seed value, as shown in the sketch after this list.
<4> Optional. Determines the number of hexadecimal digits used to encode the sampling threshold. Accepts an integer value in the range `1`-`14`. The default value `4` causes the threshold to be rounded if it contains more than 16 significant bits, as with the `proportional` mode, which uses 56 bits. If you select the `proportional` mode, use a greater value to preserve the precision applied by preceding samplers.
<5> Optional. Defines the source of randomness for sampling log records. Accepts the default `traceID` value, which uses the trace ID, or the `record` value, which uses the log record attribute that is specified in the `from_attribute` field.
<6> Optional. The name of a log record attribute that is used to compute the sampling hash, such as a unique log record ID. Accepts a string value. The default value is `""`. Use this field only if you need to specify a log record attribute as the source of randomness, for example when the trace ID is absent, trace ID sampling is disabled, or the `attribute_source` field is set to the `record` value.
<7> Optional. Rejects log records with sampling errors. Accepts a boolean value. The default value is `true`.

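
The callouts above note that all Collectors in the same tier must share the same `hash_seed` value. The following is an illustrative sketch of that requirement, reusing the placeholder values from the preceding examples; you would set the identical `hash_seed` value in every Collector instance of the tier.

[source,yaml]
----
# ...
  config:
    processors:
      probabilistic_sampler/logs:
        mode: "hash_seed"
        # Illustrative only: use the same seed on every Collector in this tier
        # so that all instances make the same sampling decision for a given record.
        hash_seed: 22
        sampling_percentage: 15.3
# ...
----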