fp128 causes compilation failures when compiling for nvptx64-nvidia-cuda #95471
Comments
I can take a look at this. Please assign this to me.
CUDA 12.8 and Blackwell GPUs support fp128 in device code (kernels).
I do not see fp128 mentioned anywhere in the PTX docs: https://docs.nvidia.com/cuda/parallel-thread-execution/
It's not clear to me what exactly they mean by "processed by the compiler in a floating point representation with lower precision" -- just downcast to double and back (e.g. fp128 as a storage type only) or actually emulate fp128 using double? It's also not clear how/why it's limited to sm_100+ only, given that it does not seem to have any special ops for fp128 and any emulation via double is doable on older GPUs, too. So, we have more questions than answers here.
@AlexMaclean ^^^ Do you know what's up with fp128 support on the recent GPUs?
This isn't an area I've personally worked much on but here is my understanding:
This sentence has been in the docs for a while (https://docs.nvidia.com/cuda/archive/12.4.0/cuda-c-programming-guide/index.html#host-compiler-extensions). I think this wording is referring to how host compilers may handle fp128 in host side code. It is not referring to recently added fp128 support. fp128 support goes beyond just a storage type and emulates these instructions in software via lots of operations on smaller types.
As your observation of the PTX spec shows, there isn't anything special introduced in sm_100 which is needed to support fp128. The constraint is imposed by how we're rolling out support in the compiler.
OK.
Overall, my conclusion is that the failure is more or less expected -- not everything that can be expressed in IR is expected to be correctly lowered on a particular target, and there's no pressing need for implementing storage-only support. If someone sends a patch, it would be welcome. Otherwise, I'd suggest casting to i128 for f128 loads/stores as a workaround. |
While fp128 operations are not natively supported in hardware, emulation for them is supported by nvcc. This change adds basic support for fp128 as a storage type allowing for lowering of IR containing these types. Fixes: #95471
When compiling the following IR for nvptx64-nvidia-cuda (compiler explorer), LLVM will crash with the following assertion failure: