
Should NVPTX kernel functions be callable? #121655

workingjubilee opened this issue Jan 4, 2025 · 1 comment
Labels: backend:NVPTX, question (A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!)


@workingjubilee
Contributor

If we attempt to call an amdgpu_kernel function from a device function, the LLVM backend will reject this, as entry points for host calls are not meant to be entered again by the device functions. If we attempt to call a ptx_kernel function from a device function using LLVMIR, however, it seems to compile fine. Is this an intentional difference due to a runtime distinction, or is this just erroneous behavior that the backend nonetheless accepts because LLVM prefers to comply with requests to generate code, no matter how completely nonsensical they might be?

```llvm
source_filename = "example.9817a48348e8a2e6-cgu.0"
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

define ptx_kernel void @global_function() unnamed_addr #0 !dbg !6 {
start:
  br label %bb1, !dbg !11

bb1: ; preds = %bb1, %start
  br label %bb1, !dbg !11
}

define void @_ZN7example15device_function17hba176ca620cc4fa0E() unnamed_addr #0 !dbg !12 {
  call ptx_kernel void @global_function() #1, !dbg !13
  ret void, !dbg !14
}

attributes #0 = { nounwind "target-cpu"="sm_86" }
attributes #1 = { nounwind }

!llvm.module.flags = !{!0, !1, !2}
!llvm.ident = !{!3}
!llvm.dbg.cu = !{!4}

!0 = !{i32 8, !"PIC Level", i32 2}
!1 = !{i32 2, !"Dwarf Version", i32 4}
!2 = !{i32 2, !"Debug Info Version", i32 3}
!3 = !{!"rustc version 1.85.0-nightly (4363f9b6f 2025-01-02)"}
!4 = distinct !DICompileUnit(language: DW_LANG_Rust, file: !5, producer: "clang LLVM (rustc version 1.85.0-nightly (4363f9b6f 2025-01-02))", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
!5 = !DIFile(filename: "/app/example.rs/@/example.9817a48348e8a2e6-cgu.0", directory: "/app")
!6 = distinct !DISubprogram(name: "global_function", scope: !8, file: !7, line: 5, type: !9, scopeLine: 5, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !4, templateParams: !10)
!7 = !DIFile(filename: "example.rs", directory: "/app", checksumkind: CSK_MD5, checksum: "630ff75de1299520699c6090d4e43a8e")
!8 = !DINamespace(name: "example", scope: null)
!9 = !DISubroutineType(types: !10)
!10 = !{}
!11 = !DILocation(line: 6, column: 5, scope: !6)
!12 = distinct !DISubprogram(name: "device_function", linkageName: "_ZN7example15device_function17hba176ca620cc4fa0E", scope: !8, file: !7, line: 9, type: !9, scopeLine: 9, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !4, templateParams: !10)
!13 = !DILocation(line: 10, column: 14, scope: !12)
!14 = !DILocation(line: 11, column: 2, scope: !12)
```
@Artem-B
Member

Artem-B commented Jan 7, 2025

Simple answer: kernels are not callable. Kernels do have a distinctly different calling convention (e.g., parameters are assumed to be identical in all threads, which would not be the case for a kernel called directly).

> If we attempt to call a ptx_kernel function from a device function using LLVMIR, however, it seems to compile fine.

That's an implementation detail and, most likely, it "compiles fine" only until it gets to ptxas or whatever compiles the LLVM-generated PTX assembly.

To LLVM, an NVPTX kernel and a regular function differ only by the associated metadata. The use of metadata to distinguish kernels is largely historical, and we're considering switching to a calling convention, but there are no specific plans yet.

If you generate a kernel call, LLVM will be happy to comply (it mostly ignores metadata it does not know about) and will generate PTX for it.
In your case, the IR above has none of the kernel metadata, so `global_function` is not a kernel; it's just a regular function.

It should have had something like `!8 = !{ptr @global_function, !"kernel", i32 1}` associated with it (listed in the `!nvvm.annotations` named metadata). That would produce PTX assembly with the `.entry` directive, which would make it a kernel.
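For illustration, here is a minimal sketch of a complete module carrying that annotation. It assumes an LLVM version that uses opaque pointers (`ptr`) and still reads the legacy `!nvvm.annotations` metadata; the empty function body is a placeholder, and the metadata is numbered `!0` only because this standalone module has no other metadata.

```llvm
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

; An ordinary function definition: nothing here marks it as a kernel.
define void @global_function() {
  ret void
}

; Legacy NVVM kernel annotation: this entry is what tells the NVPTX
; backend to emit @global_function with .entry instead of .func.
!nvvm.annotations = !{!0}
!0 = !{ptr @global_function, !"kernel", i32 1}
```

Compiling it with something like `llc -march=nvptx64 kernel.ll -o kernel.ptx` should emit `global_function` as an `.entry`; dropping the two metadata lines should turn it back into an ordinary `.func`.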

That said, ptxas does seem to accept calling kernels, at least on newer GPUs: https://godbolt.org/z/r9efjesYr
This is surprising to me, and I'm not sure whether the generated code will work. While the assembler is able to figure out the kernel address and jump there, that does not mean the call will actually work. The bottom line: kernels look like functions to LLVM, but are "special" to the GPU runtime and hardware.

@dtcxzyw added the question label and removed the new issue label on Feb 3, 2025.