Skip to content

[SYCL][Bindless][UR][L0][E2E] Fix linear interop memory and L0 V1 adapter leaks. #18353

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
May 9, 2025
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -2150,6 +2150,10 @@ the same way as memory allocated through `alloc_image_mem`,
operations also work with imported memory mapped to `image_mem_handle` and
`void *` types.

An `external_mem` handle can only be mapped to a single `image_mem_handle` or
`void *` at any one time. Attempting to map a single `external_mem` handle to
more than one `image_mem_handle` or `void *` is considered undefined behaviour.

When calling `create_image` with an `image_mem_handle` or `void *` mapped from
an external memory object, the user must ensure that the image descriptor they
pass to `create_image` has members that match or map to those of the external
Expand All @@ -2165,11 +2169,41 @@ the external API. The current supported importable image types are `standard`
and `mipmap`. Attempting to import other image types will result in undefined
behaviour.

Once a user has finished operating on imported memory, they must ensure that
they destroy the imported memory handle through `release_external_memory`.
Once a user has finished operating on mapped memory, they must ensure that they
unmap that memory.

Memory mapped using `map_external_image_memory` should be unmapped using
`unmap_external_image_memory`. The `image_type` parameter passed to this
function must reflect the `image_type` of the image descriptor used when the
memory was originally mapped. Passing an `image_type` value different to that of
the value used in the image descriptor when the memory was originally mapped
will result in undefined behaviour.

`release_external_memory` can only accept `external_mem` objects that were
created through `import_external_memory`.
Memory mapped using `map_external_linear_memory` should be unmapped using
`unmap_external_linear_memory`.

```cpp
namespace sycl::ext::oneapi::experimental {

void unmap_external_image_memory(image_mem_handle mappedImageMem,
image_type imageType,
const sycl::device &syclDevice,
const sycl::context &syclContext);
void unmap_external_image_memory(image_mem_handle mappedImageMem,
image_type imageType,
const sycl::queue &syclQueue);

void unmap_external_linear_memory(void *mappedLinearMem,
const sycl::device &syclDevice,
const sycl::context &syclContext);
void unmap_external_linear_memory(void *mappedLinearMem,
const sycl::queue &syclQueue);
}
```

Once all memory mapped from a given `external_mem` handle has been unmapped,
and the user has finished operating on the external memory, they should then
release the `externa_mem` handle using `release_external_memory`.

```cpp
namespace sycl::ext::oneapi::experimental {
Expand All @@ -2182,9 +2216,6 @@ void release_external_memory(external_mem externalMem,
}
```

Destroying or freeing any imported memory through `image_mem_free` or
`sycl::free` will result in undefined behavior.

=== Importing external semaphores [[importing_external_semaphores]]

In addition to proposing importation of external memory resources, we also
Expand Down
42 changes: 42 additions & 0 deletions sycl/include/sycl/ext/oneapi/bindless_images.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -285,6 +285,48 @@ __SYCL_EXPORT void release_external_memory(external_mem externalMem,
__SYCL_EXPORT void release_external_memory(external_mem externalMem,
const sycl::queue &syclQueue);

/**
* @brief Unmap external linear memory region
*
* @param mappedLinearMem Pointer to the mapped linear memory region to unmap
* @param syclDevice The device in which the external memory was created
* @param syclContext The context in which the external memory was created
*/
__SYCL_EXPORT void
unmap_external_linear_memory(void *mappedLinearMem,
const sycl::device &syclDevice,
const sycl::context &syclContext);

/**
* @brief Unmap external linear memory region
*
* @param mappedLinearMem Pointer to the mapped linear memory region to unmap
* @param syclQueue The queue in which the external memory was created
*/
__SYCL_EXPORT void unmap_external_linear_memory(void *mappedLinearMem,
const sycl::queue &syclQueue);

/**
* @brief Unmap external image memory
*
* @param mappedImageMem Handle to the mapped image memory to unmap
* @param syclDevice The device in which the external memory was created
* @param syclContext The context in which the external memory was created
*/
__SYCL_EXPORT void unmap_external_image_memory(
image_mem_handle mappedImageMem, image_type imageType,
const sycl::device &syclDevice, const sycl::context &syclContext);

/**
* @brief Unmap external image memory
*
* @param mappedImageMem Handle to the mapped image memory to unmap
* @param syclQueue The queue in which the external memory was created
*/
__SYCL_EXPORT void unmap_external_image_memory(image_mem_handle mappedImageMem,
image_type imageType,
const sycl::queue &syclQueue);

/**
* @brief Create an image and return the device image handle
*
Expand Down
31 changes: 31 additions & 0 deletions sycl/source/detail/bindless_images.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -546,6 +546,37 @@ __SYCL_EXPORT void release_external_memory(external_mem extMem,
syclQueue.get_context());
}

__SYCL_EXPORT void
unmap_external_linear_memory(void *mappedLinearRegion,
const sycl::device &syclDevice,
const sycl::context &syclContext) {
auto [urDevice, urCtx, Adapter] = get_ur_handles(syclDevice, syclContext);

Adapter->call<
sycl::errc::invalid,
sycl::detail::UrApiKind::urBindlessImagesFreeMappedLinearMemoryExp>(
urCtx, urDevice, mappedLinearRegion);
}

__SYCL_EXPORT void unmap_external_linear_memory(void *mappedLinearRegion,
const sycl::queue &syclQueue) {
unmap_external_linear_memory(mappedLinearRegion, syclQueue.get_device(),
syclQueue.get_context());
}

__SYCL_EXPORT void unmap_external_image_memory(
image_mem_handle mappedImageMem, image_type imageType,
const sycl::device &syclDevice, const sycl::context &syclContext) {
free_image_mem(mappedImageMem, imageType, syclDevice, syclContext);
}

__SYCL_EXPORT void unmap_external_image_memory(image_mem_handle mappedImageMem,
image_type imageType,
const sycl::queue &syclQueue) {
unmap_external_image_memory(mappedImageMem, imageType, syclQueue.get_device(),
syclQueue.get_context());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this needs to be exported and not inlined on the user side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. The functions taking a sycl::queue can be marked as inline in the bindless_images.hpp header and just call the variants taking sycl::device and sycl::context - instead of being marked as __SYCL_EXPORT.

I've amended the functions introduced in this PR which take a sycl::queue to be inline in the bindless_images.hpp header, and added definitions in that header that point to the variants taking the SYCL device and context which contain the actual definitions.

This should also apply to all of our other APIs, but I think this would be a refactoring change that should belong in a separate PR. We will track this issue internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I can have arguments for either approach, that's why I didn't ask for the change :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that having the sycl::queue variants being inline should be an improvement. I'm not sure if it will have a large impact on compilation time, but at least the number of exported symbols will be reduced.

If you have a better intuition of the impact this might have on compilation time let me know, then we could expedite the change to the other Bindless Images API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it in headers means we have to create new device/context objects and pay std::shared_ptr atomic price. If we do that inside the library we can get *_impl directly without increasing ref counts and also benefit from inlining.

I don't know how much is that an issue (if at all). Also, a longer term fix might be to completely avoid ref counting (at least for some types like device) so that it would become a non-issue in future.


template <>
__SYCL_EXPORT external_semaphore import_external_semaphore(
external_semaphore_descriptor<resource_fd> externalSemaphoreDesc,
Expand Down
Loading
Loading