Skip to content

[SYCL][Bindless][UR][L0][E2E] Fix linear interop memory and L0 V1 adapter leaks. #18353

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
May 9, 2025

Conversation

przemektmalon
Copy link
Contributor

@przemektmalon przemektmalon commented May 7, 2025

This patch fixes the mapping of interop memory to a linear range represented by a void*. The SYCL Bindless Images spec is clarified with wording to indicate which free APIs should be used on mapped imported memory.

Using sycl::free for freeing mapped imported linear memory regions does not work for the LevelZero adapter, as the backend needs to call zeMemFree directly - it cannot go through the regular sycl::free code as that will use UMF free functions by default. This is invalid for memory regions mapped using map_external_linear_memory.

The unmap_external_linear_memory SYCL API has been added to the Bindless Images specification to remedy this problem. An unmap_external_image_memory has also been added.

This patch also introduces an E2E test which verifies that a Vulkan buffer's memory can be correctly imported into SYCL and its data retrieved from and written to within a SYCL kernel by using purely USM pointers (i.e. without using bindless images). This is a step towards extending the test coverage of interop functionality currently in the Bindless Images extension, with the goal being at some point splitting the interop functionality into one or more separate extensions.

Two other fixes have been made in the LevelZero UR adapter.

A leaked ze_image_handle_t object has been fixed in the LevelZero V1 adapter. The previous implementation incorrectly reinterpreted bindless memory handles as ur_mem_handle_t types. The implementation of urBindlessImagesImageFreeExp is now unified between V1 and V2, removing the leak. As such the level_zero/v2/image.cpp file has been removed.

The urMemoryHandle member of ur_ze_external_memory_data has been removed. Aside from being incorrectly reinterpreted from ur_bindless_mem_handle_t, the LevelZero adapter does not need to track the memory handle for external memory resources.

…L0 V1 adapter leaks.

This patch fixes the mapping of interop memory to a linear range
represented by a void*. The SYCL Bindless Images spec is clarified with
wording to indicate which free APIs should be used on mapped imported
memory.

Using `sycl::free` for freeing mapped imported linear memory regions
does not work for the LevelZero adapter, as the backend needs to call
`zeMemFree` directly - it cannot go through the regular `sycl::free`
code as that will use UMF free functions by default. This is invalid for
memory regions mapped using `map_external_linear_memory`.

The `free_mappep_linear_memory` SYCL API has been added to the Bindless
Images specification to remedy this problem.

This patch also introduces an E2E test which verifies that a Vulkan test
buffer's memory can be correctly imported into SYCL and its data
retrieved from and written to within a SYCL kernel by using purely
USM pointers (i.e. without using bindless images). This is a step
towards extending the test coverage of interop functionality currently
in the Bindless Images extension, with the goal being at some point
splitting the interop functionality into one or more separate
extensions.

Two other fixes have been made in the LevelZero UR adapter.

A leaked `ze_image_handle_t` object has been fixed in the LevelZero V1
adapter. The previous implementation incorrectly reinterpreted bindless
memory handles as `ur_mem_handle_t` types. The implementation of
`urBindlessImagesImageFreeExp` is now unified between V1 and V2,
removing the leak. As such the `level_zero/v2/image.cpp` file has been
removed.

The `urMemoryHandle` member of `ur_ze_external_memory_data` has been
removed. Aside from being incorrectly reinterpreted from
`ur_bindless_mem_handle_t`, the LevelZero adapter does not need to
track the memory handle for external memory resources.
`free_mapped_linear_memory`.

Imported external memory handle should be released using
`release_external_memory`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From these, it is unclear to me whether a user needs to call both free_mapped_linear_memory/free_image_mem and release_external_memory or just one. I.e., say I have:

external_mem EM = import_external_memory(...);
void *MEIM = map_external_image_memory(EM, ...);

...

free_image_mem(MEIM, ...);
release_external_memory(EM, ...);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've clarified the wording in the spec. It should hopefully be clear now that the user should first unmap_ the mapped memory, and then call release_external_memory.

I've also added a paragraph stating that an external_mem handle can only be mapped to a single image_mem_handle or void * at any one time. While this use case should be possible, at least for mapping multiple linear memory regions, it is currently untested, so I am reluctant to state that it is possible (and I do not want to leave this underspecified in the spec). We will need to consider this use case in a future PR, ensure that all backends support what we need (or introduce device queries if necessary), and amend the spec with an accompanying test case.

Comment on lines 2171 to 2175
Memory mapped using `map_external_image_memory` should be freed using
`free_image_mem`.

Memory mapped using `map_external_linear_memory` should be freed using
`free_mapped_linear_memory`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It irks me that free_image_mem uses mem while all these other APIs use memory. I suppose it's a little late to address that though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is unfortunate. alloc_image_mem is in the same boat. We should address this in the future, perhaps by deprecating and phasing out the _mem function naming first. Not something for this PR though.

Comment on lines 2171 to 2175
Memory mapped using `map_external_image_memory` should be freed using
`free_image_mem`.

Memory mapped using `map_external_linear_memory` should be freed using
`free_mapped_linear_memory`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the intention is that the user is supposed to call release_external_memory after calling one of these, then I would prefer the following names instead: unmap_external_image_memory and unmap_external_linear_memory, where the former would be a new API. It is less overloading of the term "free" and makes a much stronger association with the map_* APIs they are associated with.

If we want to go that way, I think these deserve a separate section, instead of being bundled together with release_external_memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've amended the naming to follow your suggestion. The unmap_ and release_ functions now also have their separate code sections.

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@przemektmalon przemektmalon force-pushed the przemek/vk-buffer-usm-interop branch from 8562d66 to bdfc206 Compare May 8, 2025 17:31
@przemektmalon przemektmalon force-pushed the przemek/vk-buffer-usm-interop branch from bdfc206 to b9e9be7 Compare May 8, 2025 17:41
@przemektmalon
Copy link
Contributor Author

Ping @intel/dpcpp-nativecpu-reviewers @intel/unified-runtime-reviewers @intel/unified-runtime-reviewers-level-zero @intel/unified-runtime-reviewers-level-zero

A quick review and approval for this PR would be very much appreciated, as getting these changes in to the upcoming DPC++ release will be very beneficial.

CI has previously been failing fill_any_size.cpp on OpenCL, however, this test should not be affected by this PR. I expect it is just being flaky.

@chrismile
Copy link

I can confirm that my testing code (with unmap_external_linear_memory added) runs fine now with these patch fixes.

Copy link
Contributor

@aarongreig aarongreig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UR + CL adapter LGTM

@martygrant
Copy link
Contributor

fill_any_size.cpp is flaky and I've opened a PR to disable it #18381

@przemektmalon
Copy link
Contributor Author

Ping @intel/dpcpp-nativecpu-reviewers @intel/unified-runtime-reviewers-level-zero

Copy link
Contributor

@uwedolinsky uwedolinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NativeCPU part looks fine to me

@sarnex sarnex merged commit 4a912ed into intel:sycl May 9, 2025
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.