Using non-default streams in CUDA #2

czgdp1807 · 2020-02-01T18:12:12Z

Description of the problem

Currently, in #6 all operations are issued to the default stream. However, I think we could issue the kernels of different operations to separate non-default streams so that they can execute in parallel.
One example is filling n vectors in parallel, with fill_vector_kernel launched in n separate streams. Another is filling an n*m matrix with n or m kernels, each launched in a separate stream.
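A minimal sketch of the first use case, for discussion only. The issue names fill_vector_kernel but not its signature, so the `(data, size, value)` signature below is an assumption, and `fill_vectors_in_streams` is a hypothetical host helper, not an existing API of this library. Each vector gets its own stream created with `cudaStreamCreate`, and the stream is passed as the fourth kernel launch parameter.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical kernel: the issue names fill_vector_kernel but not its
// signature, so (data, size, value) is an assumption for illustration.
__global__ void fill_vector_kernel(float* data, std::size_t size, float value)
{
    std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < size) {
        data[i] = value;
    }
}

// Hypothetical helper: fills every device vector in `vectors` (each holding
// `size` floats) with `value`, launching one fill_vector_kernel per vector
// in its own non-default stream. Returns true if all CUDA calls succeeded.
bool fill_vectors_in_streams(const std::vector<float*>& vectors,
                             std::size_t size, float value)
{
    if (size == 0) {
        return true;  // nothing to fill; a zero-block launch would be invalid
    }
    const std::size_t n = vectors.size();
    std::vector<cudaStream_t> streams(n);

    // Create one stream per vector; on failure, release the ones created so far.
    for (std::size_t k = 0; k < n; ++k) {
        if (cudaStreamCreate(&streams[k]) != cudaSuccess) {
            for (std::size_t j = 0; j < k; ++j) {
                cudaStreamDestroy(streams[j]);
            }
            return false;
        }
    }

    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((size + threads - 1) / threads);
    bool ok = true;

    // The 3rd launch parameter is dynamic shared memory, the 4th is the stream.
    for (std::size_t k = 0; k < n; ++k) {
        fill_vector_kernel<<<blocks, threads, 0, streams[k]>>>(vectors[k], size, value);
        if (cudaGetLastError() != cudaSuccess) ok = false;
    }

    // Wait for each stream's work to finish, then destroy the stream.
    for (std::size_t k = 0; k < n; ++k) {
        if (cudaStreamSynchronize(streams[k]) != cudaSuccess) ok = false;
        if (cudaStreamDestroy(streams[k]) != cudaSuccess) ok = false;
    }
    return ok;
}
```

Separate streams only allow the kernels to run concurrently; whether they actually overlap depends on how much of the GPU each launch occupies, so small vectors are where this is most likely to help.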
Before moving on to the implementation, we can discuss the API for these use cases.
Please comment below if you have any ideas. I will come up with a design soon.
Another advantage of non-default streams, as described in https://devblogs.nvidia.com/how-overlap-data-transfers-cuda-cc/, is that data transfers can overlap with kernel execution. However, IMO this isn't very useful for this library: a user may want to copy only a small Vector back to the host, and spending time creating streams for that isn't a good idea.
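For reference, the overlap pattern from the linked post can be sketched roughly as below. This is a minimal sketch under stated assumptions: `scale_kernel` and `scale_with_overlap` are hypothetical names, not part of this library, and the host buffer is assumed to be pinned with `cudaMallocHost`, since copies from pageable memory do not overlap with kernel execution. The data is split into one chunk per stream, and each chunk's copy-in, kernel, and copy-out are queued in the same stream.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical kernel for illustration: multiplies each element by `factor`.
__global__ void scale_kernel(float* data, std::size_t size, float factor)
{
    std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < size) {
        data[i] *= factor;
    }
}

// Hypothetical helper: scales `size` floats in `host` (assumed pinned) by
// `factor` on the GPU, one chunk per stream. Work is ordered within a stream,
// so the copies for one chunk may overlap with the kernel of another.
bool scale_with_overlap(float* host, std::size_t size, float factor,
                        std::size_t num_streams)
{
    if (size == 0) return true;
    if (num_streams == 0) return false;

    float* device = nullptr;
    if (cudaMalloc(&device, size * sizeof(float)) != cudaSuccess) return false;

    std::vector<cudaStream_t> streams(num_streams);
    std::size_t created = 0;
    bool ok = true;
    for (; created < num_streams; ++created) {
        if (cudaStreamCreate(&streams[created]) != cudaSuccess) { ok = false; break; }
    }

    const std::size_t chunk = (size + num_streams - 1) / num_streams;
    const unsigned threads = 256;
    for (std::size_t k = 0; ok && k < num_streams; ++k) {
        const std::size_t offset = k * chunk;
        if (offset >= size) break;  // fewer chunks than streams
        const std::size_t count = std::min(chunk, size - offset);
        const std::size_t bytes = count * sizeof(float);
        const unsigned blocks = static_cast<unsigned>((count + threads - 1) / threads);

        // Copy in, compute, copy out, all queued in stream k.
        if (cudaMemcpyAsync(device + offset, host + offset, bytes,
                            cudaMemcpyHostToDevice, streams[k]) != cudaSuccess) ok = false;
        scale_kernel<<<blocks, threads, 0, streams[k]>>>(device + offset, count, factor);
        if (cudaGetLastError() != cudaSuccess) ok = false;
        if (cudaMemcpyAsync(host + offset, device + offset, bytes,
                            cudaMemcpyDeviceToHost, streams[k]) != cudaSuccess) ok = false;
    }

    // Wait for all queued work, then release resources.
    if (cudaDeviceSynchronize() != cudaSuccess) ok = false;
    for (std::size_t k = 0; k < created; ++k) {
        cudaStreamDestroy(streams[k]);
    }
    cudaFree(device);
    return ok;
}
```

The setup cost is visible here: the streams and the pinned host buffer have to exist before any overlap happens, which is the overhead that may not pay off when only a small Vector is copied back.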

Example of the problem


czgdp1807 added the cuda label Feb 1, 2020

czgdp1807 mentioned this issue Mar 12, 2020: Refactoring Codebase #10 (Open, 3 tasks)

Tanvi141 mentioned this issue Jul 22, 2020: Non-default streams for filling matrix #32 (Merged)

czgdp1807 closed this as completed in #32 Jul 29, 2020
