Device support in zarr-python (especially for GPU) #2658
Comments
CuPy/kvikio relies on NVIDIA's GPUDirect Storage (GDS) driver and goes through PCIe. Metal GPUs use unified memory, so CPU-to-GPU transfer can in theory be almost zero-cost (passing an address). If there were a way to pass ownership of an array from the CPU to the GPU, nothing would need to be done in zarr unless GPU-accelerated decompression is needed. In practice, though, at least torch implements the …
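To illustrate the ownership point, here is a small sketch outside of zarr, using only standard torch/NumPy calls:

```python
import numpy as np
import torch

# torch.from_numpy shares the underlying NumPy buffer, so this step
# involves no copy at all:
cpu_tensor = torch.from_numpy(np.zeros((1024, 1024), dtype=np.float32))

# Moving to the Metal backend still copies today, because torch's MPS
# allocator manages its own buffers even though the physical memory on
# Apple silicon is unified:
gpu_tensor = cpu_tensor.to("mps")
```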
Thanks for the pointers, @ziw-liu - your comment sent me down a rabbit hole, and I think I now have a more concrete proposal worth floating. TL;DR:
1. Background – why MLX is special
2. What I'm proposing – a new …
3. Questions
4. Final Notes

There are many more subtleties to implementing this than I have outlined in this issue, but I wanted to start an initial discussion and get feedback before proceeding to a PoC. I know there's a lot I don't know about Zarr internals. Any pointers, pitfalls, or "please don't do it this way" comments are very welcome!
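For concreteness, here is a minimal sketch of what such a proposal could look like, assuming zarr-python's v3 buffer API (`zarr.core.buffer`). The `MLXBuffer`/`MLXNDBuffer` classes and method bodies are hypothetical and heavily elided; real subclasses would need to implement the full `Buffer`/`NDBuffer` interface:

```python
# Hypothetical sketch only: the class names and method bodies are
# illustrative, and exact import paths may differ between zarr-python
# 3.x versions.
import mlx.core as mx

from zarr.core.buffer import Buffer, BufferPrototype, NDBuffer


class MLXBuffer(Buffer):
    """Flat byte buffer backed by MLX unified memory (hypothetical)."""

    def __init__(self, array_like) -> None:
        # mx.array allocates in unified memory, visible to both CPU and
        # GPU, so no host-to-device copy is needed later.
        self._data = mx.array(array_like)


class MLXNDBuffer(NDBuffer):
    """N-dimensional buffer backed by mx.array (hypothetical)."""

    def __init__(self, array) -> None:
        self._data = array

    # ... the rest of the NDBuffer interface (create, as_numpy_array,
    # reshape, etc.) is elided for brevity ...


# A BufferPrototype bundles the two classes; read paths that accept a
# `prototype=` argument could then allocate MLX memory directly.
mlx_prototype = BufferPrototype(buffer=MLXBuffer, nd_buffer=MLXNDBuffer)
```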
I think the idea behind the buffer API design was to support exactly this strategy, so it looks like the right direction to me!
One other thing to think through is the config system (docs: https://zarr.readthedocs.io/en/stable/user-guide/gpu.html). We currently have a high-level …
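For reference, a sketch of how the current high-level switch works, assuming the config keys behind the linked GPU guide (exact module paths may vary across zarr-python 3.x releases):

```python
import zarr

# High-level switch documented in the GPU guide:
zarr.config.enable_gpu()

# As I understand it, this amounts to pointing the config at the
# CUDA-backed buffer classes; a third-party backend could in principle
# register its own Buffer/NDBuffer classes the same way:
zarr.config.set({
    "buffer": "zarr.core.buffer.gpu.Buffer",
    "ndbuffer": "zarr.core.buffer.gpu.NDBuffer",
})
```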
I'm not an expert, but I'm starting to dig into it as part of #2904. It's pretty challenging... At least for NVIDIA GPUs, I think we might need finer-grained controls over what the input and output buffers are for each stage of the pipeline. Maybe that's not an issue with the unified memory model, though.
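To make that concrete, here is a purely hypothetical sketch of per-stage buffer configuration; none of these keys exist in zarr-python today:

```python
import zarr

# Hypothetical config keys (not real zarr-python options), just to
# illustrate choosing a buffer per codec-pipeline stage:
zarr.config.set({
    "codec_pipeline.read_buffer": "host",      # raw bytes fetched from the store
    "codec_pipeline.decode_buffer": "device",  # output of decompression
    "codec_pipeline.out_buffer": "device",     # final array handed to the user
})
```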
Problem
I would like to load `zarr` data directly onto non-CPU devices (especially GPUs). The current approach appears to rely on using `cupy` to load onto `cupy`-supported devices, e.g. https://github.com/rapidsai/kvikio/blob/branch-25.02/notebooks/zarr.ipynb.

Unfortunately, a number of devices are not supported by `cupy`; for example, I don't believe my Apple Metal GPU is. This means that I must load from `zarr` via the CPU to use these devices, e.g. `zarr` on disk -> `numpy` -> `torch` (which has Metal support), as sketched below. This is slower than it needs to be, and I don't believe the `zarr` specification alone requires it (?).
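A minimal sketch of that round-trip, assuming a hypothetical on-disk store at `data.zarr`:

```python
import torch
import zarr

# Today's workaround on Apple silicon: decompress into host memory with
# NumPy, then copy the result to the Metal (MPS) device.
z = zarr.open("data.zarr", mode="r")  # hypothetical store path
cpu_array = z[:]                      # zarr -> numpy on the CPU
t = torch.from_numpy(cpu_array)       # zero-copy view of the NumPy buffer
t = t.to("mps")                       # extra host-to-device copy to avoid
```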
Background
Multi-device support is a very important requirement in the AI/ML community. I would like to use `zarr` (and specifically the Python implementation) to run models such as LLMs on multiple devices. The quicker a model can be loaded onto a device (and with lower memory usage, etc.), the better the user and developer experience.

Questions
- Is `cupy` the correct/only way to load directly to GPU with `zarr-python`?
- Are there plans for broader device support in `zarr-python`?
- What would be the best way to add support for other devices to `zarr-python`? Is it `cupy` and then using something like dlpack for zero-copy exchange (see the sketch below)? Are there alternatives?
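To sketch the dlpack route mentioned in the last question, assuming the data is already on a CUDA device as a CuPy array (e.g. read via kvikio/GDS):

```python
import cupy as cp
import torch

cp_arr = cp.zeros((1024, 1024), dtype=cp.float32)  # stand-in for a zarr read

# DLPack lets frameworks exchange device arrays without copying:
t = torch.from_dlpack(cp_arr)  # shares the same CUDA allocation
back = cp.from_dlpack(t)       # round-trip, still zero-copy
```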
Related issues
#1967
#2574
cc @jhamman (as suggested by @TomNicholas)