fp128 causes compilation failures when compiling for nvptx64-nvidia-cuda #95471

Closed
beetrees opened this issue Jun 13, 2024 · 6 comments · Fixed by #136006
Labels
backend:NVPTX crash Prefer [crash-on-valid] or [crash-on-invalid]

Comments

@beetrees
Contributor

When compiling the following IR for nvptx64-nvidia-cuda (compiler explorer):

define fp128 @identity(fp128 %x) {
  ret fp128 %x
}

LLVM will crash with the following assertion failure:

llc: /root/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:11476: void llvm::SelectionDAGISel::LowerArguments(const llvm::Function&): Assertion `InVals.size() == Ins.size() && "LowerFormalArguments didn't emit the correct number of values!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /opt/compiler-explorer/clang-assertions-trunk/bin/llc -o /app/output.s -O3 <source>
1.	Running pass 'Function Pass Manager' on module '<source>'.
2.	Running pass 'NVPTX DAG->DAG Pattern Instruction Selection' on function '@id2'
 #0 0x00000000039ffc08 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x39ffc08)
 #1 0x00000000039fd35c SignalHandler(int) Signals.cpp:0:0
 #2 0x00007616a2e42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #3 0x00007616a2e969fc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
 #4 0x00007616a2e42476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)
 #5 0x00007616a2e287f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)
 #6 0x00007616a2e2871b (/lib/x86_64-linux-gnu/libc.so.6+0x2871b)
 #7 0x00007616a2e39e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
 #8 0x00000000036fd90b llvm::SelectionDAGISel::LowerArguments(llvm::Function const&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x36fd90b)
 #9 0x00000000037c2335 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x37c2335)
#10 0x00000000037c2f88 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x37c2f88)
#11 0x00000000037b417f llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x37b417f)
#12 0x00000000029e2a29 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) MachineFunctionPass.cpp:0:0
#13 0x0000000002fbc7d3 llvm::FPPassManager::runOnFunction(llvm::Function&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x2fbc7d3)
#14 0x0000000002fbca11 llvm::FPPassManager::runOnModule(llvm::Module&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x2fbca11)
#15 0x0000000002fbd275 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x2fbd275)
#16 0x000000000083a37c compileModule(char**, llvm::LLVMContext&) llc.cpp:0:0
#17 0x000000000073005e main (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x73005e)
#18 0x00007616a2e29d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
#19 0x00007616a2e29e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)
#20 0x0000000000830e9e _start (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x830e9e)
Program terminated with signal: SIGSEGV
Compiler returned: 139
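
A self-contained variant of the reproducer, as a sketch: it assumes the target is selected by a `target triple` line in the module rather than by the `-mtriple` option of `llc`. The function itself is unchanged from the report.

```llvm
; Select the NVPTX 64-bit CUDA target inside the module itself.
target triple = "nvptx64-nvidia-cuda"

; Same function as in the report: returns its fp128 argument unchanged.
define fp128 @identity(fp128 %x) {
  ret fp128 %x
}
```

Running `llc -O3` on this file should exercise the same argument-lowering path on affected builds, though that has not been re-verified here.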
@EugeneZelenko EugeneZelenko added crash Prefer [crash-on-valid] or [crash-on-invalid] backend:NVPTX llvm:SelectionDAG SelectionDAGISel as well and removed new issue labels Jun 13, 2024
@pvimal816

I can take a look at this. Please assign this to me.

@oscarbg

oscarbg commented Apr 4, 2025

CUDA 12.8 and Blackwell GPUs support fp128 in device code (kernels).

@Artem-B
Member

Artem-B commented Apr 4, 2025

Cuda 12.8 and blackwell gpus support fp128 in device code (kernels)

I do not see fp128 mentioned anywhere in the PTX docs: https://docs.nvidia.com/cuda/parallel-thread-execution/
But the CUDA docs do have a few mentions of fp128 builtins:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#:~:text=Quad%2DPrecision%20Floating%2DPoint%20Functions

__float128 type is supported for devices with compute capability 10.0 and later, when compiled in conjunction with a host compiler that supports the type. A constant expression of __float128 type may be processed by the compiler in a floating point representation with lower precision.

It's not clear to me what exactly they mean by "processed by the compiler in a floating point representation with lower precision" -- just downcast to double and back (e.g. fp128 as a storage type only) or actually emulate fp128 using double?

It's also not clear how/why it's limited to sm_100+ only, given that it does not seem to have any special ops for fp128 and any emulation via double is doable on older GPUs, too. So, we have more questions than answers here.

@Artem-B
Member

Artem-B commented Apr 4, 2025

@AlexMaclean ^^^ Do you know what's up with fp128 support on the recent GPUs?

@AlexMaclean
Member

This isn't an area I've personally worked much on but here is my understanding:

It's not clear to me what exactly they mean by "processed by the compiler in a floating point representation with lower precision" -- just downcast to double and back (e.g. fp128 as a storage type only) or actually emulate fp128 using double?

This sentence has been in the docs for a while (https://docs.nvidia.com/cuda/archive/12.4.0/cuda-c-programming-guide/index.html#host-compiler-extensions). I think this wording refers to how host compilers may handle fp128 in host-side code, not to the recently added fp128 support. That support goes beyond just a storage type: fp128 operations are emulated in software via many operations on smaller types.

It's also not clear how/why it's limited to sm_100+ only, given that it does not seem to have any special ops for fp128 and any emulation via double is doable on older GPUs, too. So, we have more questions than answers here.

As your observation of the PTX spec shows, there isn't anything special introduced in sm_100 which is needed to support fp128. The constraint is imposed by how we're rolling out support in the compiler.

@Artem-B
Member

Artem-B commented Apr 8, 2025

OK.

  • GPUs do not support fp128. It's not part of the PTX spec, so the support is intended to be provided by the compiler.
  • AFAICT, LLVM does not provide emulation of fp128. So, the choice is between soft-FP via libcalls (which we can't use, as it would rely on external libraries that we don't have on the GPU) and native h/w support (which we don't have).
  • The CUDA docs only mention fp128 in the context of compiler builtins. So NVCC appears to provide them via some sort of built-in soft-float. I can kind of see that it may be implemented in NVCC's back-end for the newer GPUs only.
  • The bug is about fp128 in LLVM IR. We do not have that on NVPTX at the moment, not even as a storage type.
  • We could implement storage-only support for f128, similar to what we do for i128, but that's about it.
  • IMO, storage-only support for f128 has fairly limited usefulness. I'd say implementing it would be a very low priority task.

Overall, my conclusion is that the failure is more or less expected -- not everything that can be expressed in IR is expected to be correctly lowered on a particular target, and there's no pressing need for implementing storage-only support. If someone sends a patch, it would be welcome. Otherwise, I'd suggest casting to i128 for f128 loads/stores as a workaround.
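
The suggested workaround can be sketched as follows. This is an illustration, not a tested fix: the function names are hypothetical, and it assumes that producer and consumer agree to treat the 128 bits as an opaque payload, so that no fp128-typed value appears in the NVPTX module.

```llvm
target triple = "nvptx64-nvidia-cuda"

; Copy one quad-precision value from %src to %dst by moving its bits as i128.
define void @copy_f128_bits(ptr %dst, ptr %src) {
  %bits = load i128, ptr %src, align 16
  store i128 %bits, ptr %dst, align 16
  ret void
}

; The identity function from the report, with i128 standing in for fp128.
define i128 @identity_bits(i128 %x) {
  ret i128 %x
}
```

Any arithmetic on the value would still need fp128 operations, which this workaround does not provide.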

AlexMaclean added a commit that referenced this issue Apr 17, 2025
While fp128 operations are not natively supported in hardware, emulation
for them is supported by nvcc. This change adds basic support for
fp128 as a storage type allowing for lowering of IR containing these
types.

Fixes: #95471
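
As a hedged illustration of what "storage type" means here, the module below only moves fp128 values (argument, load, store, return) and performs no arithmetic. The function name is hypothetical, and whether a given LLVM build lowers it depends on whether that build contains this change.

```llvm
target triple = "nvptx64-nvidia-cuda"

; Store the incoming value to %slot and return the value previously held there.
define fp128 @swap_slot(ptr %slot, fp128 %new) {
  %old = load fp128, ptr %slot, align 16
  store fp128 %new, ptr %slot, align 16
  ret fp128 %old
}
```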
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this issue Apr 17, 2025
While fp128 operations are not natively supported in hardware, emulation
for them is supported by nvcc. This change adds basic support for
fp128 as a storage type allowing for lowering of IR containing these
types.

Fixes: llvm/llvm-project#95471
@EugeneZelenko EugeneZelenko removed the llvm:SelectionDAG SelectionDAGISel as well label Apr 17, 2025