[BUG]-ocl_af_app.rs crashes everytime #282

Closed
nodeSpace opened this issue Feb 13, 2021 · 10 comments

@nodeSpace
Contributor

Reproducible Code and/or Steps

Running the example at https://github.com/arrayfire/arrayfire-rust/blob/master/opencl-interop/examples/ocl_af_app.rs crashes at the line let ptr = af_buffer.device_ptr(); with the error:

(exit code: 0xc000041d)

Process finished with exit code -1073740771 (0xC000041D)

Weirdly, if I spawn a new thread and run it there, or run it directly from 'fn main()' (rather than embedded in the UI of my application), it gives this error instead:

(exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)

Process finished with exit code -1073741819 (0xC0000005)

This has happened every time I've run the full example code from ocl_af_app.rs.
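
The two signed exit codes and the two hex codes above are the same values read differently: the signed number is the 32-bit NTSTATUS bit pattern interpreted as an i32. A minimal sketch of the conversion follows; the helper name ntstatus_hex is hypothetical and not part of any crate used in this thread. For context, 0xC000041D is commonly listed as STATUS_FATAL_USER_CALLBACK_EXCEPTION, which would fit a crash raised while running inside a UI callback, though that reading is an assumption rather than something confirmed here.

```rust
/// Reinterpret a signed Windows process exit code as an unsigned NTSTATUS value.
/// (Hypothetical helper, for illustration only.)
fn ntstatus_hex(exit_code: i32) -> String {
    // `as u32` keeps the bit pattern and only changes how it is interpreted.
    format!("0x{:08X}", exit_code as u32)
}

fn main() {
    // The two exit codes reported in this issue.
    for code in [-1073740771_i32, -1073741819_i32] {
        println!("{} -> {}", code, ntstatus_hex(code));
    }
}
```

Running this prints 0xC000041D for the first code and 0xC0000005 for the second, matching the values the IDE showed in parentheses.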

System Information

The af::info() prints out:

ArrayFire v3.8.0 (OpenCL, 64-bit Windows, build d99887a)
-0- NVIDIA: GeForce GTX 1060 3GB, 3072 MB
[1] NVIDIA: GeForce GTX 1060 3GB, 3072 MB

Driver version is: 27.21.14.5671
Any idea what could be causing this issue?

Checklist

  • [x] Using the latest available ArrayFire release

  • [x] GPU drivers are up to date

@nodeSpace nodeSpace added the Bug label Feb 13, 2021
@9prady9 9prady9 self-assigned this Feb 15, 2021
@9prady9
Member

9prady9 commented Feb 15, 2021

I was able to reproduce a problem, though I don't think it is the exact one you are seeing. I am looking into it. I believe this has something to do with a recent fix in v3.8.

@nodeSpace
Contributor Author

Looking at the docs, at the end here: https://arrayfire.org/docs/unifiedbackend.htm

Don't: Do not use custom kernels (CUDA/OpenCL) with the Unified backend
This is another area that is a no go when using the Unified backend. It is not recommended that you use custom kernels with the Unified backend. This is mainly because the Unified backend is meant to be ultra portable and should use only ArrayFire and native CPU code.

Do you think this might be causing the issue, since the set backend line was added?

af::set_backend(af::Backend::OPENCL);

Although if so, I don't know of another way to force ArrayFire to use the OpenCL backend.

@9prady9
Member

9prady9 commented Feb 15, 2021

I don't think that is the reason: when I tested these examples from the interop crate, I always used the unified API, and they worked fine earlier. In fact, the unified API is the main way we provide other language wrappers. Let me look into it. I will try to get back to you as soon as I can.

@9prady9
Member

9prady9 commented Feb 15, 2021

I found the problem: a retain was missing in the example code itself. You need the following additional lines before passing the buffer down to ArrayFire. It was a silly bug I introduced; sorry about the inconvenience it caused.

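    // Retain once so that ArrayFire's later release does not free the buffer
    // while the Rust-side handle is still alive.
    unsafe {
        retain_mem_object(&buffer).unwrap();
    }
    let mut af_buffer = af::Array::new_from_device_ptr(
        buffer.as_ptr() as *mut f32,
        af::Dim4::new(&[dims[0] as u64, 1, 1, 1]),
    );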
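
Why the retain matters can be illustrated with a toy model of reference counting. This is only a sketch: MemObject and hand_off are hypothetical names, nothing here calls OpenCL, ocl_core, or ArrayFire, and it assumes that both the Rust-side handle and the adopting library release the object exactly once when they are done with it.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Toy stand-in for a cl_mem object: only the reference count is modelled.
#[derive(Clone)]
struct MemObject {
    refs: Rc<Cell<u32>>,
}

impl MemObject {
    /// A freshly created object starts with one reference (its creator's).
    fn create() -> Self {
        MemObject { refs: Rc::new(Cell::new(1)) }
    }

    fn retain(&self) {
        self.refs.set(self.refs.get() + 1);
    }

    /// Returns the new count, or an error if the object was already freed.
    fn release(&self) -> Result<u32, &'static str> {
        match self.refs.get() {
            0 => Err("release on an already freed object"),
            n => {
                self.refs.set(n - 1);
                Ok(n - 1)
            }
        }
    }
}

/// Hand the buffer to a second owner, then let both owners clean up.
fn hand_off(retain_first: bool) -> Result<u32, &'static str> {
    let buffer = MemObject::create(); // Rust-side owner, count = 1
    if retain_first {
        buffer.retain(); // count = 2, one reference per owner
    }
    let adopted = buffer.clone(); // second owner adopts the same object without retaining
    adopted.release()?; // second owner releases when it is done
    buffer.release() // Rust-side owner releases last
}

fn main() {
    println!("without retain: {:?}", hand_off(false));
    println!("with retain: {:?}", hand_off(true));
}
```

In this model the hand-off without a retain ends with a release on an object whose count is already zero, while the hand-off with one retain ends cleanly at a count of zero.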

There seems to be another, larger issue here, and it could be something in ArrayFire itself. I am not sure yet where the double release is happening. Theoretically, a retain before passing the cl_mem to ArrayFire should handle the releases on that object just fine. Somehow, there is an additional release call happening even with just the code below, where we only create an Array using the cl_mem and exit the program.

    af::set_backend(af::Backend::OPENCL);

    let platform_id = ocl_core::default_platform().unwrap();
    let device_ids = ocl_core::get_device_ids(&platform_id, None, None).unwrap();
    let device_id = device_ids[0];
    let context_properties = ContextProperties::new().platform(platform_id);
    let context =
        ocl_core::create_context(Some(&context_properties), &[device_id], None, None).unwrap();
    let queue = ocl_core::create_command_queue(&context, &device_id, None).unwrap();
    let dims = [8, 1, 1];

    let mut vec = vec![1.0f32; dims[0]];
    let buffer = unsafe {
        ocl_core::create_buffer(
            &context,
            ocl_core::MEM_READ_WRITE | ocl_core::MEM_COPY_HOST_PTR,
            dims[0],
            Some(&vec),
        )
        .unwrap()
    };
    ocl_core::finish(&queue).unwrap(); //sync up before switching to arrayfire

    afcl::add_device_context(device_id.as_raw(), context.as_ptr(), queue.as_ptr());
    afcl::set_device_context(device_id.as_raw(), context.as_ptr());
    af::info();

    println!("Ref Count: {}",
        ocl_core::get_mem_object_info(&buffer, ocl_core::MemInfo::ReferenceCount).unwrap());
    let mut af_buffer = {
        unsafe { retain_mem_object(&buffer).unwrap(); };
        af::Array::new_from_device_ptr(
            buffer.as_ptr() as *mut f32,
            af::Dim4::new(&[dims[0] as u64, 1, 1, 1])
        )
    };
    println!("Ref Count: {}",
        ocl_core::get_mem_object_info(&buffer, ocl_core::MemInfo::ReferenceCount).unwrap());

    af::af_print!("GPU Buffer before modification:", af_buffer);

    af::set_device(0); // Cannot pop when in Use, hence switch to another device
    afcl::delete_device_context(device_id.as_raw(), context.as_ptr());

I shall post an update once I have a fix for this, hopefully very soon.

@nodeSpace
Contributor Author

nodeSpace commented Feb 16, 2021

Something funny is definitely going on...I couldn't get your fix to work for me until I did this instead:

unsafe {
    ocl_core::retain_mem_object(&buffer).unwrap();
    ocl_core::retain_mem_object(&buffer).unwrap();
}

So the file that ran without an error for me is https://github.com/arrayfire/arrayfire-rust/blob/master/opencl-interop/examples/ocl_af_app.rs plus the two retain calls above.

@9prady9
Member

9prady9 commented Feb 16, 2021

@nodeSpace That is the double release I was referring to, which is why the extra retain lets the program exit cleanly. Only one retain should be required before passing the buffer to ArrayFire, so that the release call made by ArrayFire doesn't invalidate the buffer object in Rust and vice versa. Somehow a third release call on the cl_mem is happening inside ArrayFire upstream, and I am trying to track it down.
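
The arithmetic behind the one-retain versus two-retain observation can be written out as a small ledger. This is a sketch under the assumption described above (one reference from buffer creation, and three releases in total: the Rust wrapper's, ArrayFire's, and the stray one); the function name balance is hypothetical.

```rust
/// Start from one reference (the creator's), add `retains`, then apply `releases`.
/// Returns the final count, or the 1-based index of the first release that
/// finds the count already at zero.
fn balance(retains: u32, releases: u32) -> Result<u32, u32> {
    let mut count = 1 + retains;
    for n in 1..=releases {
        if count == 0 {
            return Err(n); // this release would hit an already freed object
        }
        count -= 1;
    }
    Ok(count)
}

fn main() {
    // Three releases in total, with zero, one, or two extra retains.
    for retains in 0..=2 {
        println!("{} retain(s), 3 releases -> {:?}", retains, balance(retains, 3));
    }
}
```

With three releases, one retain still fails on the third release, and two retains balance out to zero, which is consistent with the workaround reported above. With only the two expected releases, a single retain is enough.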

@9prady9
Member

9prady9 commented Feb 16, 2021

Found the main issue. It is a regression introduced in v3.7.2. Interestingly it wasn't encountered until this particular use case.

I will soon send a PR for it upstream, and the fix will be available in the next fix release.

Sorry about the inconvenience. As far as this example is concerned, I missed adding the required retain before passing the cl_mem to ArrayFire; that is the only fix needed at the Rust wrapper level. I will fix this soon too.

Thanks for reporting this!

@9prady9
Member

9prady9 commented Feb 16, 2021

Here's the fix for the example - 87a331e

Even though it looks like an example-only fix, I believe it shows users a key aspect of how to use the crate itself. I will do a quick new release as soon as I can.

@9prady9 9prady9 added this to the 3.8.1 milestone Feb 16, 2021
@9prady9
Member

9prady9 commented Feb 16, 2021

arrayfire/arrayfire#3091 is the upstream counterpart. Closing since the example has been fixed.

@9prady9 9prady9 closed this as completed Feb 16, 2021
@nodeSpace
Contributor Author

nodeSpace commented Feb 16, 2021

Thanks for reporting this!

No problem! Thanks for maintaining/creating arrayfire-rust!
