feat: span processor api refactor #2962

paullegranddc · 2025-05-13T16:00:39Z

Design discussion issue (if applicable)

Changes

This PR refactors the SpanProcessor API.

The API currently has the following issues:

It's not possible to read a span's fields in a processor without calling export_data, which completely copies the data
The SpanProcessor::on_end API means owned span data has to be passed to each span processor, even if it does nothing with it.

One way that was explored to fix issue 2 was to pass a &mut Span to on_end so processors can use std::mem::take to consume data without copying, but this is not ideal for at least two reasons:

The otel sdk spec specifically forbids mutation of the span in on_end, and modifications done in one processor should not impact data passed to other processors
If there are multiple processors, and any processor but the last one grabs the data and leaves the span empty, this will cause bugs dependent on span processor ordering...
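
The ordering hazard can be sketched with simplified stand-in types. The `Span` struct and both processor functions below are hypothetical and are not the SDK's types; the sketch only shows how a reader's result would depend on whether it runs before or after a processor that takes the data.

```rust
/// Hypothetical, simplified stand-in for a span; not the SDK type.
#[derive(Debug, Default)]
struct Span {
    attributes: Vec<(String, String)>,
}

/// A processor that takes the data out of the span to avoid a clone.
fn exporting_on_end(span: &mut Span) -> Span {
    // `take` leaves `Span::default()` (an empty span) behind.
    std::mem::take(span)
}

/// A processor that only reads the span.
fn counting_on_end(span: &mut Span) -> usize {
    span.attributes.len()
}

/// Runs both processors in the given order and returns what the reader saw.
fn run(exporter_first: bool) -> usize {
    let mut span = Span {
        attributes: vec![("http.method".to_string(), "GET".to_string())],
    };
    if exporter_first {
        let _exported = exporting_on_end(&mut span);
        counting_on_end(&mut span)
    } else {
        let seen = counting_on_end(&mut span);
        let _exported = exporting_on_end(&mut span);
        seen
    }
}
```

With the reader first, `run(false)` sees the one attribute; with the exporter first, `run(true)` sees an empty span.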

This PR thus proposes the following changes to the API:

Introduce a `ReadableSpan` trait, implemented for SdkSpan.

This trait defines getters for span fields so we can read the span without copying or consuming data. This is inspired by other OTel libraries such as Java, which has a similar interface.
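
As a rough illustration of the idea, a read-only view could look like the sketch below. The trimmed-down `SdkSpan` struct, the `name` and `attributes` getters, and the `has_attribute` helper are assumptions made for this example and may not match the trait in the PR.

```rust
use std::borrow::Cow;

/// Hypothetical, trimmed-down span storage; the real span holds more fields.
struct SdkSpan {
    name: Cow<'static, str>,
    attributes: Vec<(String, String)>,
}

/// Sketch of a read-only view: getters return borrows, so nothing is copied.
/// The method names are illustrative and may not match the PR.
trait ReadableSpan {
    fn name(&self) -> &str;
    fn attributes(&self) -> &[(String, String)];
}

impl ReadableSpan for SdkSpan {
    fn name(&self) -> &str {
        &self.name
    }

    fn attributes(&self) -> &[(String, String)] {
        &self.attributes
    }
}

/// A processor-style helper that inspects a span through the trait only.
fn has_attribute(span: &dyn ReadableSpan, key: &str) -> bool {
    span.attributes().iter().any(|(k, _)| k == key)
}
```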

Modify `on_end` to pass it a `FinishedSpan`.

FinishedSpan is a new abstraction around a span that has been closed. It implements the ReadableSpan trait so we can read from it without consuming the data.
If a span processor needs an owned SpanData, it can call consume on it. This will do one of two things:

if this is the last span processor that will be invoked, it gives ownership of the span data without copying
otherwise it copies the span data

This design

allows not copying in span processors that don't need owned data during on_end
doesn't expose mutable spans in on_end
avoids the span data copy if the exporter is the last span processor (which is usually the case).
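
The move-or-clone behaviour can be sketched as follows. This is a minimal illustration under stated assumptions: the `is_last_processor` field and the `moves_for` driver are hypothetical, `SpanData` is trimmed down, and `consume` returns an `Option` here instead of the `SpanData` returned by the PR's version.

```rust
/// Hypothetical, trimmed-down owned span data.
#[derive(Clone, Debug, PartialEq)]
struct SpanData {
    name: String,
}

/// Sketch of a finished span handed to each processor's `on_end`.
/// The `is_last_processor` field is an assumption made for illustration.
struct FinishedSpan {
    data: Option<SpanData>,
    is_last_processor: bool,
}

impl FinishedSpan {
    /// Read access without copying.
    fn name(&self) -> Option<&str> {
        self.data.as_ref().map(|d| d.name.as_str())
    }

    /// Returns owned data: moved out for the last processor, cloned otherwise.
    /// Returns `None` if the data was already moved out.
    fn consume(&mut self) -> Option<SpanData> {
        if self.is_last_processor {
            self.data.take()
        } else {
            self.data.clone()
        }
    }
}

/// Calls `consume` once per processor and counts how many calls moved the
/// data instead of cloning it.
fn moves_for(processor_count: usize) -> usize {
    let mut span = FinishedSpan {
        data: Some(SpanData { name: "GET /".to_string() }),
        is_last_processor: false,
    };
    let mut moves = 0;
    for i in 0..processor_count {
        span.is_last_processor = i + 1 == processor_count;
        let owned = span.consume();
        if owned.is_some() && span.data.is_none() {
            moves += 1;
        }
    }
    moves
}
```

In this sketch only the final call moves the data, so a pipeline whose exporter comes last pays for no copy on the exporter's behalf.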

Add an `on_ending` method to the `SpanProcessor` trait. [Spec link](https://github.com/open-telemetry/opentelemetry-specification/blob/e5bc8e18e647a47b25d264b0c67ce6c0d0e1ec93/specification/trace/sdk.md#onending)

This method is called during span ending, before on_end. It is given a mutable span and is very useful for implementing obfuscation, for instance.
It makes possible the use cases that motivated passing mutable data to on_end, while respecting the spec.
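
A minimal sketch of the obfuscation use case, assuming simplified stand-in types: the `Span` struct, the two-method `SpanProcessor` trait, the `RedactPasswords` processor, and the `end_span` driver are all hypothetical. The sketch also assumes every `on_ending` hook runs before any `on_end` hook.

```rust
/// Hypothetical mutable span, standing in for the span passed to `on_ending`.
struct Span {
    attributes: Vec<(String, String)>,
}

/// Sketch of a processor with the two hooks discussed above.
trait SpanProcessor {
    /// Called while the span is ending; mutation is still allowed.
    fn on_ending(&self, span: &mut Span);
    /// Called after the span has ended; read-only access.
    fn on_end(&self, span: &Span);
}

/// Example processor that masks a sensitive attribute value.
struct RedactPasswords;

impl SpanProcessor for RedactPasswords {
    fn on_ending(&self, span: &mut Span) {
        for (key, value) in span.attributes.iter_mut() {
            if key.as_str() == "db.password" {
                *value = "***".to_string();
            }
        }
    }

    fn on_end(&self, _span: &Span) {}
}

/// Ends a span: in this sketch all `on_ending` hooks run before any `on_end`.
fn end_span(mut span: Span, processors: &[&dyn SpanProcessor]) -> Span {
    for p in processors {
        p.on_ending(&mut span);
    }
    for p in processors {
        p.on_end(&span);
    }
    span
}
```

Because redaction happens in `on_ending`, every `on_end` hook in this sketch already sees the masked value, whatever the processor order.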

Benchmarking

A benchmark has been added that creates a span and drops it, invoking the on_start, on_ending and on_end methods for a varying number of span processors.

Baseline - main branch:

SpanProcessorApi/0_processors
    time:   [339.66 ns 340.56 ns 341.47 ns]
SpanProcessorApi/1_processors
    time:   [373.10 ns 374.36 ns 375.60 ns]
SpanProcessorApi/2_processors
    time:   [803.10 ns 804.99 ns 807.03 ns]
SpanProcessorApi/4_processors
    time:   [1.2096 µs 1.2137 µs 1.2179 µs]

Candidate - paullegranddc:paullgdc/sdk/span_processor_api_refactor:

SpanProcessorApi/0_processors
    time:   [385.15 ns 386.14 ns 387.25 ns]
SpanProcessorApi/1_processors
    time:   [385.73 ns 387.17 ns 388.85 ns]
SpanProcessorApi/2_processors
    time:   [384.84 ns 385.66 ns 386.50 ns]
SpanProcessorApi/4_processors
    time:   [386.78 ns 388.17 ns 389.58 ns]

On main, the cost of a span lifecycle is proportional to the number of span processors, because the Clone operation on span data is as expensive as creating the span.

After this PR, since span processors don't clone data unless they need it, the cost is constant regardless of the number of span processors.

Merge requirement checklist

CONTRIBUTING guidelines followed
Unit tests added/updated (if applicable)
Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
Changes in public API reviewed (if applicable)

Currently there is no way to read the span data without cloning it. This causes performance issues for span processors that need to read the span data. This commit introduces a new trait `ReadableSpan` in the SDK, implemented by SDK spans, that allows reading the span data without cloning it.

This API allows mutation of the span when it is ending. It's marked as in development in the spec, but it is useful for span obfuscation, for example, which needs to be done once attributes can no longer be added to the span.

codecov · 2025-05-13T20:13:15Z

Codecov Report

Attention: Patch coverage is 44.76190% with 116 lines in your changes missing coverage. Please review.

Project coverage is 81.1%. Comparing base (8fe3dcc) to head (1d3e6d8).
Report is 8 commits behind head on main.

Files with missing lines	Patch %	Lines
opentelemetry-sdk/src/trace/span.rs	35.1%	116 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #2962     +/-   ##
=======================================
- Coverage   81.4%   81.1%   -0.4%     
=======================================
  Files        126     126             
  Lines      24305   24465    +160     
=======================================
+ Hits       19808   19860     +52     
- Misses      4497    4605    +108


scottgerring

Hey @paullegranddc thanks for opening the PR!
I've had a quick zoom through now and have some comments and queries.

This feels like a sensible way of addressing the problems you mentioned (and highlighted in the examples 👍 ). Of course it will break the existing on_end, but as we're not stable here yet, and there's a clear upshot, from my perspective that seems reasonable. I think @cijothomas will also have helpful opinions here - Cijo?

Adding on_ending seems like a no brainer.

It would be great to look at some big downstream projects (tracing-opentelemetry comes to mind), and check if they are impacted. It would also be great to have some perf numbers from benchmarks touching the path through ensure_ended_and_exported which you may already have handy?

scottgerring · 2025-05-20T10:54:02Z

opentelemetry-sdk/src/trace/span_processor.rs

@@ -79,11 +79,24 @@ pub trait SpanProcessor: Send + Sync + std::fmt::Debug {
    /// synchronously on the thread that started the span, therefore it should
    /// not block or throw exceptions.
    fn on_start(&self, span: &mut Span, cx: &Context);
+    #[cfg(feature = "experimental_span_processor_on_ending")]
+    /// `on_ending` is called when a `Span` is ending. The en timestampe has already


Suggested change

/// `on_ending` is called when a `Span` is ending. The en timestampe has already

/// `on_ending` is called when a `Span` is ending. The end timestamp has already

We should also clarify that these are called before on_end - just to make the ordering super explicit

I reformulated the comment, it should be a bit clearer

scottgerring · 2025-05-20T10:56:56Z

opentelemetry-sdk/src/trace/span_processor.rs

    /// `on_end` is called after a `Span` is ended (i.e., the end timestamp is
    /// already set). This method is called synchronously within the `Span::end`
    /// API, therefore it should not block or throw an exception.
-    /// TODO - This method should take reference to `SpanData`
-    fn on_end(&self, span: SpanData);
+    fn on_end(&self, span: &mut FinishedSpan);


Noting this is a breaking change

scottgerring · 2025-05-20T11:16:17Z

opentelemetry-sdk/src/trace/span.rs

+    /// * if it called twice in the same SpanProcessor::on_end
+    pub fn consume(&mut self) -> crate::trace::SpanData {
+        if self.is_consummed {
+            panic!("Span data has already been consumed");


I reckon this shouldn't panic; we can use the internal logging macros to emit an error

scottgerring · 2025-05-20T11:18:48Z

examples/tracing-http-propagator/src/server.rs

+    }
+
+    fn on_end(&self, span: &mut FinishedSpan) {
+        if !matches!(span.span_kind(), SpanKind::Server) {


Nice motivating example for the on_end changes!

scottgerring · 2025-05-20T11:39:47Z

opentelemetry-sdk/benches/batch_span_processor.rs

@@ -4,10 +4,10 @@ use opentelemetry::trace::{
    SpanContext, SpanId, SpanKind, Status, TraceFlags, TraceId, TraceState,


Have you run the benchmark suite to check for performance regressions against main? Would be great to include.

Yes, I've run the existing BatchSpanProcessor benchmarks.
With experimental_span_processor_on_ending not enabled, the benchmarks show no detectable change in performance.

If the experimental_span_processor_on_ending feature is enabled, there is a slight cost (5%), which I believe comes from having to clone the tracer field: we have to iterate over the span processor list through a shared reference while passing the span mutably. Thus we have to clone the tracer associated with the span so we can split the ownership (it's in an Arc, so this is not that expensive).
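
The borrow split described in this reply can be sketched as below. The `Tracer` and `Span` structs, the `on_ending_hooks` field, and the `mark` and `demo` helpers are hypothetical stand-ins rather than the SDK's types; the sketch only shows why cloning the `Arc` is needed before iterating.

```rust
use std::sync::Arc;

/// Hypothetical tracer holding the processor callbacks; not the SDK type.
struct Tracer {
    on_ending_hooks: Vec<fn(&mut Span)>,
}

/// Hypothetical span that keeps a shared handle to its tracer.
struct Span {
    tracer: Arc<Tracer>,
    name: String,
}

/// Runs every hook with mutable access to the span.
fn run_on_ending(span: &mut Span) {
    // Iterating `&span.tracer.on_ending_hooks` directly would keep `span`
    // borrowed immutably, so `hook(span)` would not compile. Cloning the Arc
    // only bumps a reference count and gives an independent handle.
    let tracer = Arc::clone(&span.tracer);
    for hook in &tracer.on_ending_hooks {
        hook(span);
    }
}

/// Example hook that mutates the span.
fn mark(span: &mut Span) {
    span.name.push('!');
}

/// Builds a span with two hooks, runs them, and returns the resulting name.
fn demo() -> String {
    let tracer = Arc::new(Tracer {
        on_ending_hooks: vec![mark as fn(&mut Span), mark as fn(&mut Span)],
    });
    let mut span = Span {
        tracer,
        name: "GET /".to_string(),
    };
    run_on_ending(&mut span);
    span.name
}
```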

scottgerring · 2025-05-20T12:03:31Z

opentelemetry-sdk/src/trace/span.rs

+            Some(data) => data,
+            None => return,
+        };
+        let span_context: SpanContext =


It would be helpful to comment around this bit to make it easier to re-grok -- "take ownership of the span_context leaving an empty context in its place; this saves us a clone" or so, right?

scottgerring · 2025-05-20T12:09:42Z

opentelemetry-sdk/src/trace/span.rs

+/// If `consume`` is never called, the on_ending method will not perform any copy of
+/// the span data.
+///
+/// Calling any `ReadableSpan` method on the `FinishedSpan` will panic if the span data


We should probably error out through the internal logging macros here and not panic. We try to avoid panic-ing wherever we can.

paullegranddc added 2 commits May 13, 2025 17:26

feat: add an experimental on_ending api

a0a8c23

This API allows mutation of the span when it is ending. It's marked as in development in the spec, but it is useful for span obfuscation, for example, which needs to be done once attributes can no longer be added to the span.

paullegranddc requested a review from a team as a code owner May 13, 2025 16:00

paullegranddc added 3 commits May 13, 2025 18:09

fix: formatting and clippy

54fa6f2

Merge branch 'main' into paullgdc/sdk/span_processor_api_refactor

445aa66

fix: some SpanProcessor calls were not finished

b72b031

paullegranddc added 3 commits May 14, 2025 11:58

fix: lint

afb4197

fix: lint clippy

4954b0d

Merge branch 'main' into paullgdc/sdk/span_processor_api_refactor

1d3e6d8

scottgerring added the performance label May 20, 2025

scottgerring closed this May 20, 2025

scottgerring reopened this May 20, 2025

scottgerring reviewed May 20, 2025

scottgerring closed this May 20, 2025

scottgerring reopened this May 20, 2025

feat: add span processor benchmarks

224a7ed

Adds a benchmark creating a span and dropping it, without an exporter. The goal is to estimate the cost of the SDK running without the exporter

paullegranddc mentioned this pull request May 20, 2025

feat: add span processor benchmarks #2980

Closed

4 tasks

paullegranddc added 5 commits May 20, 2025 17:18

fix: remove unised import

a62a9bd

fix: typos, so many typos...

5ce1ba4

fix: insert count as number instead of string in example

5dae8a7

fix: make on_ending contract more explicit

a28a461

fix: on_ending comment

edb290a
