Using non-default streams in CUDA #2


Closed
czgdp1807 opened this issue Feb 1, 2020 · 0 comments · Fixed by #32

@czgdp1807
Member

czgdp1807 commented Feb 1, 2020

Description of the problem

Currently, in #6, all operations are issued to the default stream. However, I was thinking that we could issue the kernels for different operations to different non-default streams so that they can execute in parallel.
One example is filling n vectors in parallel, with fill_vector_kernel launched in n separate streams. Another example is filling an n x m matrix with n (or m) kernels, each launched in its own stream.
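A minimal sketch of the first example in CUDA C++, assuming float elements. The issue does not show the real signature of fill_vector_kernel, so the kernel below is a hypothetical stand-in (a grid-stride fill), and fill_vectors_in_streams is a hypothetical helper used only for illustration, not a proposed API. The one stream-specific detail is the fourth launch parameter, which sends each kernel to its own non-default stream.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for fill_vector_kernel: writes `value` into every
// element of data[0, size) using a grid-stride loop.
__global__ void fill_vector_kernel(float* data, size_t size, float value) {
    const size_t stride = static_cast<size_t>(blockDim.x) * gridDim.x;
    for (size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
         i < size; i += stride) {
        data[i] = value;
    }
}

// Hypothetical helper: fills vectors[k] (a device pointer to `size` floats)
// with values[k], issuing each kernel to its own non-default stream so the
// n launches are free to run concurrently. Returns the first error seen.
cudaError_t fill_vectors_in_streams(const std::vector<float*>& vectors,
                                    size_t size,
                                    const std::vector<float>& values) {
    const size_t n = vectors.size();
    if (values.size() != n) return cudaErrorInvalidValue;

    std::vector<cudaStream_t> streams(n);
    cudaError_t err = cudaSuccess;
    size_t created = 0;  // number of streams successfully created
    for (; created < n; ++created) {
        err = cudaStreamCreate(&streams[created]);
        if (err != cudaSuccess) break;
    }

    const int threads = 256;
    const int blocks = 32;  // modest grid so concurrent kernels can share the GPU
    for (size_t k = 0; k < n && err == cudaSuccess; ++k) {
        // Launch config: grid, block, dynamic shared memory bytes, stream.
        fill_vector_kernel<<<blocks, threads, 0, streams[k]>>>(vectors[k], size, values[k]);
        err = cudaGetLastError();
    }

    // Wait for every stream that was created, then release it.
    for (size_t k = 0; k < created; ++k) {
        const cudaError_t sync_err = cudaStreamSynchronize(streams[k]);
        if (err == cudaSuccess) err = sync_err;
        cudaStreamDestroy(streams[k]);
    }
    return err;
}
```

Whether the n kernels actually overlap depends on the device having idle resources; a kernel whose grid already saturates the GPU gains little from extra streams. The n x m matrix case would follow the same loop, with each stream filling one row (m contiguous elements of a row-major buffer).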
Before moving on to the implementation, we can discuss the API for these use cases.
Please comment below if you have any ideas. I will come up with a design soon.
Another advantage of non-default streams, as described in https://devblogs.nvidia.com/how-overlap-data-transfers-cuda-cc/, is that data transfers can overlap with kernel execution. However, IMO this isn't very useful for this library: a user may want to copy only a small Vector back to the host, and spending time creating streams for that isn't a good idea.
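For reference, the overlap pattern from the linked post can be sketched roughly as follows. This is an illustration under stated assumptions, not library code: the host buffers are assumed to be pinned (allocated with cudaMallocHost), since cudaMemcpyAsync on pageable memory does not overlap with kernels, and scale_kernel and scale_overlapped are hypothetical names. The data is split into chunks, and each chunk's copy-in, kernel, and copy-out are queued on one stream, so the copy for one chunk can run while another chunk's kernel executes.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical element-wise kernel, used only to have work to overlap.
__global__ void scale_kernel(float* data, size_t size, float factor) {
    const size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < size) data[i] *= factor;
}

// Hypothetical helper: host_out[i] = host_in[i] * factor, processed in
// num_streams chunks. Assumes host_in and host_out are pinned host memory.
cudaError_t scale_overlapped(const float* host_in, float* host_out, size_t size,
                             float factor, int num_streams) {
    if (num_streams <= 0) return cudaErrorInvalidValue;
    float* dev = nullptr;
    cudaError_t err = cudaMalloc(&dev, size * sizeof(float));
    if (err != cudaSuccess) return err;

    std::vector<cudaStream_t> streams(num_streams);
    int created = 0;  // number of streams successfully created
    for (; created < num_streams; ++created) {
        err = cudaStreamCreate(&streams[created]);
        if (err != cudaSuccess) break;
    }

    const size_t chunk = (size + num_streams - 1) / num_streams;
    for (int s = 0; s < created && err == cudaSuccess; ++s) {
        const size_t offset = s * chunk;
        if (offset >= size) break;
        const size_t count = (size - offset < chunk) ? size - offset : chunk;
        const size_t bytes = count * sizeof(float);
        // Copy-in, kernel, copy-out for this chunk, all ordered on streams[s].
        cudaMemcpyAsync(dev + offset, host_in + offset, bytes, cudaMemcpyHostToDevice, streams[s]);
        const int threads = 256;
        const int blocks = static_cast<int>((count + threads - 1) / threads);
        scale_kernel<<<blocks, threads, 0, streams[s]>>>(dev + offset, count, factor);
        cudaMemcpyAsync(host_out + offset, dev + offset, bytes, cudaMemcpyDeviceToHost, streams[s]);
        err = cudaGetLastError();
    }

    // Wait for all queued work before freeing the device buffer.
    for (int s = 0; s < created; ++s) {
        const cudaError_t sync_err = cudaStreamSynchronize(streams[s]);
        if (err == cudaSuccess) err = sync_err;
        cudaStreamDestroy(streams[s]);
    }
    cudaFree(dev);
    return err;
}
```

This also illustrates the concern above: for a small Vector, the cost of creating and destroying the streams (plus pinning host memory) can easily outweigh any overlap gained, so a plain cudaMemcpy on the default stream is likely the better choice there.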

Example of the problem
