fp128 causes compilation failures when compiling for nvptx64-nvidia-cuda #95471
Comments
I can take a look at this. Please assign this to me.
CUDA 12.8 and Blackwell GPUs support fp128 in device code (kernels).
I do not see fp128 mentioned anywhere in the PTX docs: https://docs.nvidia.com/cuda/parallel-thread-execution/
It's not clear to me what exactly they mean by "processed by the compiler in a floating point representation with lower precision" -- just downcast to double and back (e.g. fp128 as a storage type only) or actually emulate fp128 using double? It's also not clear how/why it's limited to sm_100+ only, given that it does not seem to have any special ops for fp128 and any emulation via double is doable on older GPUs, too. So, we have more questions than answers here.
@AlexMaclean ^^^ Do you know what's up with fp128 support on the recent GPUs?
This isn't an area I've personally worked much on but here is my understanding:
This sentence has been in the docs for a while (https://docs.nvidia.com/cuda/archive/12.4.0/cuda-c-programming-guide/index.html#host-compiler-extensions). I think this wording is referring to how host compilers may handle fp128 in host side code. It is not referring to recently added fp128 support. fp128 support goes beyond just a storage type and emulates these instructions in software via lots of operations on smaller types.
As your observation of the PTX spec shows, there isn't anything special introduced in sm_100 which is needed to support fp128. The constraint is imposed by how we're rolling out support in the compiler.
OK.
Overall, my conclusion is that the failure is more or less expected -- not everything that can be expressed in IR is expected to be correctly lowered on a particular target, and there's no pressing need for implementing storage-only support. If someone sends a patch, it would be welcome. Otherwise, I'd suggest casting to i128 for f128 loads/stores as a workaround. |
While fp128 operations are not natively supported in hardware, emulation for them is supported by nvcc. This change adds basic support for fp128 as a storage type allowing for lowering of IR containing these types. Fixes: #95471
When compiling the following IR for nvptx64-nvidia-cuda (compiler explorer), LLVM will crash with the following assertion failure: