Feature Request: Optional OpenTelemetry Integration for Observability and Performance Tuning #2958
Comments
Probably, especially the distributed traces. I'm guessing the most common usage of OTel would be in combination with a Cloud object store. Those libraries should already have OTel available, which will provide traces for the actual HTTP calls made to Blob Storage. The bit of context we'd be able to layer on top is that a particular I don't think that I'm not sure what metrics we'd want to export, if any. I would need to think about that a bit more. One note on the implementation: there's a split between whether libraries implement OTel natively or whether it's done through an "instrumentation library" https://opentelemetry.io/docs/languages/python/libraries/. Instrumentation libraries, like
I'm a big fan of structlog. In this case, though, I feel like logs (structured or otherwise) will be much more valuable if they can be correlated with logs from the storage provider. I have a decent amount of experience with OpenTelemetry and would be happy to provide reviews.
Thanks @TomAugspurger for the thoughtful reply! I'm still thinking on it but a monkey-patching approach could be a good fit here.
I did some basic experiments with a store wrapper last week and found even the store-level traces to be quite interesting -- illuminating the async behavior in some stores and the blocking behavior in others. In a perfect world, we could also instrument calling libraries (like Xarray). That would allow us to really understand behavior and usage all the way through the stack.
Me too! I've used ASGI correlation ID approaches with structlog elsewhere. The tricks for us would be a) defining the context to correlate under and b) passing that on to the storage provider (which will only be possible in certain store types).
Ah, that's a good point. I was assuming that any store-level
Yeah, this is where integrating with opentelemetry probably makes it the right choice. Then we don't have to worry about different ways of setting / propagating the trace ID. Working backwards from questions we'd like to answer, to spots that ought to be traced:
Summary
This feature request proposes adding an optional integration of OpenTelemetry to the Zarr-Python codebase. OpenTelemetry is a widely adopted, vendor-neutral standard for generating, collecting, and exporting telemetry data (traces, metrics, and logs) used by many modern observability platforms. The goal is to improve observability, facilitate performance tuning, and enable integration with full-stack monitoring systems — all while preserving a lightweight default behavior.
📌 Motivation
Zarr is widely used in performance-critical and production environments such as:
Currently, Zarr provides limited visibility into internal operations like:
By integrating OpenTelemetry (OTel), Zarr users and developers would benefit from:
☝ Each of these is particularly important following Zarr's recent adoption of asyncio, where the execution of concurrent operations is increasingly hard to track explicitly.
🧩 Proposal
✅ Benefits
🛠️ Implementation Notes
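As a rough illustration of the notes in this section, here is a minimal sketch of what a `tracing.py`-style module might look like. Everything in it is an assumption made for illustration, not existing Zarr-Python code: the helper names `span` and `traced`, the tracer name `"zarr"`, and the toy `get_chunk` function are all hypothetical. The only OpenTelemetry calls used are `trace.get_tracer()` and `Tracer.start_as_current_span()`. When the `opentelemetry` package is not installed, the helpers fall back to doing nothing, which keeps the default behavior lightweight.

```python
"""Hypothetical tracing.py: optional OpenTelemetry with a no-op fallback."""
import asyncio
import functools
import inspect
from contextlib import contextmanager

try:
    # OpenTelemetry is optional; only the API package is needed here.
    from opentelemetry import trace

    _tracer = trace.get_tracer("zarr")  # tracer name is an assumption
except ImportError:
    _tracer = None


@contextmanager
def span(name, **attributes):
    """Open a span if OpenTelemetry is importable, otherwise do nothing."""
    if _tracer is None:
        yield None
        return
    with _tracer.start_as_current_span(name, attributes=attributes) as current:
        yield current


def traced(name):
    """Decorator that runs a sync or async function inside a span."""

    def decorator(func):
        if inspect.iscoroutinefunction(func):

            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                with span(name):
                    return await func(*args, **kwargs)

            return async_wrapper

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with span(name):
                return func(*args, **kwargs)

        return wrapper

    return decorator


# Usage: a toy coroutine standing in for a store read (hypothetical).
@traced("example.get_chunk")
async def get_chunk(key):
    await asyncio.sleep(0)  # placeholder for real I/O
    return key.encode()


result = asyncio.run(get_chunk("c/0/0"))
```

Keeping the `try`/`except ImportError` in one module means the rest of the codebase never imports OpenTelemetry directly, so the dependency stays optional. Whether spans should be created this way or through a separate instrumentation library is still an open question in the discussion above.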
tracing.py module (or similar) to encapsulate OpenTelemetry usage
@contextmanager or Tracer.start_as_current_span() decorators in key areas
🙋‍♂️ Call for Feedback
We would love to hear from maintainers and the community: