-
Notifications
You must be signed in to change notification settings - Fork 522
feat: span processor api refactor #2962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: span processor api refactor #2962
Conversation
Currently there is no way to read the span data without cloning it. This causes performance issues when the span for span processors that need to read the span data. This commit introduces a new trait `ReadableSpan` in the sdk implemented by SDK Spans that allows to read the span data without cloning it.
This API allows to mutations of the span when it is ending. It's marked as on development in the spec, but it is useful for span obfuscation for example, which needs to done after attributes can added to the span anymore.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2962 +/- ##
=======================================
- Coverage 81.4% 81.1% -0.4%
=======================================
Files 126 126
Lines 24305 24465 +160
=======================================
+ Hits 19808 19860 +52
- Misses 4497 4605 +108 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @paullegranddc thanks for opening the PR!
I've had a quick zoom through now and have some comments and queries.
This feels like a sensible way of addressing the problems you mentioned (and highlighted in the examples 👍 ). Of course it will break the existing on_end
, but as we're not stable here, yet and there's a clear upshot, from my perspective that seems reasonable. I think @cijothomas will also have helpful opinions here - Cijo?
Adding on_ending
seems like a no brainer.
It would be great to look at some big downstream projects (tracing-opentelemetry
comes to mind), and check if they are impacted. It would also be great to have some perf numbers from benchmarks touching the path through ensure_ended_and_exported
which you may already have handy?
@@ -79,11 +79,24 @@ pub trait SpanProcessor: Send + Sync + std::fmt::Debug { | |||
/// synchronously on the thread that started the span, therefore it should | |||
/// not block or throw exceptions. | |||
fn on_start(&self, span: &mut Span, cx: &Context); | |||
#[cfg(feature = "experimental_span_processor_on_ending")] | |||
/// `on_ending` is called when a `Span` is ending. The en timestampe has already |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// `on_ending` is called when a `Span` is ending. The en timestampe has already | |
/// `on_ending` is called when a `Span` is ending. The end timestamp has already |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also clarify that these are called before on_end
- just to make the ordering super explicit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reformulated the comment, it should be a bit clearer
/// `on_end` is called after a `Span` is ended (i.e., the end timestamp is | ||
/// already set). This method is called synchronously within the `Span::end` | ||
/// API, therefore it should not block or throw an exception. | ||
/// TODO - This method should take reference to `SpanData` | ||
fn on_end(&self, span: SpanData); | ||
fn on_end(&self, span: &mut FinishedSpan); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting this is a breaking change
/// * if it called twice in the same SpanProcessor::on_end | ||
pub fn consume(&mut self) -> crate::trace::SpanData { | ||
if self.is_consummed { | ||
panic!("Span data has already been consumed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reckon this shouldn't panic; we can use the internal logging macros to emit an error
} | ||
|
||
fn on_end(&self, span: &mut FinishedSpan) { | ||
if !matches!(span.span_kind(), SpanKind::Server) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice motivating example for the on_end
changes!
@@ -4,10 +4,10 @@ use opentelemetry::trace::{ | |||
SpanContext, SpanId, SpanKind, Status, TraceFlags, TraceId, TraceState, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you run the benchmark suite to do a performance regression against main
? Would be great to include.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I've run the existing BatchSpanProcessor
benchmarks.
With experimental_span_processor_on_ending
not enabled, the benchmark show no detectable changes in performance.
If the experimental_span_processor_on_ending
feature is enabled, there is a slight cost (5%) which I believe comes from the fact we have to clone the tracer field, because we have to iterate over the span processor list through a shared reference, while passing the span mutably. Thus we have to clone the tracer associated with the span so we can split the ownership (it's in an Arc
cell so this is not that expensive)
Some(data) => data, | ||
None => return, | ||
}; | ||
let span_context: SpanContext = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful to comment around this bit to make it easier to re-grok -- "take ownership of the span_context leaving an empty context in its place; this saves us a clone" or so, right?
/// If `consume`` is never called, the on_ending method will not perform any copy of | ||
/// the span data. | ||
/// | ||
/// Calling any `ReadableSpan` method on the `FinishedSpan` will panic if the span data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably error out through the internal logging macros here and not panic. We try to avoid panic-ing wherever we can.
Adds a benchmark creating a span and dropping it, without an exporter. The goal is to estimate the cost of the SDK running without the exporter
Design discussion issue (if applicable)
#2940
Changes
This PR refactors the SpanProcessor API.
The API currently has the following issues:
export_data
, which completely copies the dataSpanProcessor::on_end
API means owned span data has to be passed to each span processor, even if it does nothing with it.One way that was explored to fix number 2. was to pass a
&mut Span
toon_end
do processor can use std::mem::take to consume data without copying, but this is not ideal for at least two reasons:on_end
, and modifications done in one processor should not impact data passed to other processorsThis PR thus proposes the following changes changes to the API:
Introduce a
ReadableSpan
trait, implemented for SdkSpan.This trait define getters for span fields so we can read the span without copying or consuming data. This is inspired by other Otel libraries such as Java which has a similar interface.
Modify
on_end
to pass it aFinishedSpan
.FinishedSpan
is a new abstraction around a span that has been closed. It implements theReadableSpan
trait so we can read from it without consuming the data.if a span processor needs to get an owned SpanData on it. This will do one of two things:
This design
on_end
on_end
Add
on_ending
method on thrSpanProcessor
trait. [Spec link](https://github.com/open-telemetry/opentelemetry-specification/blob/e5bc8e18e647a47b25d264b0c67ce6c0d0e1ec93/specification/trace/sdk.md#onending)
This method is called during span ending, before
on_end
. it is given a mutable span and is very useful to implement obfuscation for instance.It makes possible the use cases that were the motivation for passing mutable data to
on_end
while respecting the spec.Benchmarking
A benchmark running creating a span and dropping it, invoking the
on_start
,on_ending
andon_end
methods for a varying number of span processors has been added.On main, the cost of a span lifecycle is proportional to the number of span processors, because the
Clone
operation on span data is as expensive as creating the span.After this PR, since span processors don't clone data unless they need it, the cost is constant regardless of the number of span processors.
Merge requirement checklist
CHANGELOG.md
files updated for non-trivial, user-facing changes