Commit 37e3625

merge main into amd-staging
merges up to commit before 87b4108 [Libomptarget][NFC] Remove concept of optional plugin functions (llvm#82681) Change-Id: I1fc8cf975c675a254b29eb1beaf8e61bf3bef450
2 parents 1b2985c + bc5aba9 commit 37e3625

98 files changed

+2234
-1434
lines changed

+110
@@ -0,0 +1,110 @@
1+
2+
Expected Differences vs DXC and FXC
3+
===================================
4+
5+
.. contents::
6+
:local:
7+
8+
Introduction
9+
============
10+
11+
HLSL currently has two reference compilers, the `DirectX Shader Compiler (DXC)
12+
<https://github.com/microsoft/DirectXShaderCompiler/>`_ and the
13+
`Effect-Compiler (FXC) <https://learn.microsoft.com/en-us/windows/win32/direct3dtools/fxc>`_.
14+
The two reference compilers do not fully agree. Some known disagreements in the
15+
references are tracked on
16+
`DXC's GitHub
17+
<https://github.com/microsoft/DirectXShaderCompiler/issues?q=is%3Aopen+is%3Aissue+label%3Afxc-disagrees>`_,
18+
but many more are known to exist.
19+
20+
HLSL as implemented by Clang will also not fully match either of the reference
21+
implementations; it is instead being written to match the `draft language
22+
specification <https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf>`_.
23+
24+
This document is a non-exhaustive collection of the known differences between
25+
Clang's implementation of HLSL and the existing reference compilers.
26+
27+
General Principles
28+
------------------
29+
30+
Most of the intended differences between Clang and the earlier reference
31+
compilers are focused on increased consistency and correctness. Neither reference
32+
compiler applies language rules consistently across all contexts.
33+
34+
Clang also deviates from the reference compilers by providing different
35+
diagnostics, both in terms of the textual messages and the contexts in which
36+
diagnostics are produced. While striving for a high level of source
37+
compatibility with conforming HLSL code, Clang may produce earlier and more
38+
robust diagnostics for incorrect code or reject code that a reference compiler
39+
incorrectly accepted.
40+
41+
Language Version
42+
================
43+
44+
Clang targets language compatibility for HLSL 2021 as implemented by DXC.
45+
Language features that were removed in earlier versions of HLSL may be added on
46+
a case-by-case basis, but are not planned for the initial implementation.
47+
48+
Overload Resolution
49+
===================
50+
51+
Clang's HLSL implementation adopts C++ overload resolution rules as proposed for
52+
HLSL 202x, based on proposals
53+
`0007 <https://github.com/microsoft/hlsl-specs/blob/main/proposals/0007-const-instance-methods.md>`_
54+
and
55+
`0008 <https://github.com/microsoft/hlsl-specs/blob/main/proposals/0008-non-member-operator-overloading.md>`_.
56+
57+
Clang's implementation extends standard overload resolution rules to HLSL
58+
library functionality. This causes subtle changes in overload resolution
59+
behavior between Clang and DXC. Some examples include:
60+
61+
.. code-block:: c++
62+
63+
void halfOrInt16(half H);
64+
void halfOrInt16(uint16_t U);
65+
void halfOrInt16(int16_t I);
66+
67+
void takesDoubles(double, double, double);
68+
69+
cbuffer CB {
70+
uint U;
71+
int I;
72+
float X, Y, Z;
73+
double3 A, B;
74+
}
75+
76+
export void call() {
77+
halfOrInt16(U); // DXC: Fails with call ambiguous between int16_t and uint16_t overloads
78+
// Clang: Resolves to halfOrInt16(uint16_t).
79+
halfOrInt16(I); // All: Resolves to halfOrInt16(int16_t).
80+
half H;
81+
#ifndef IGNORE_ERRORS
82+
// asfloat16 is a builtin with overloads for half, int16_t, and uint16_t.
83+
H = asfloat16(I); // DXC: Fails to resolve overload for int.
84+
// Clang: Resolves to asfloat16(int16_t).
85+
H = asfloat16(U); // DXC: Fails to resolve overload for int.
86+
// Clang: Resolves to asfloat16(uint16_t).
87+
#endif
88+
H = asfloat16(0x01); // DXC: Resolves to asfloat16(half).
89+
// Clang: Resolves to asfloat16(uint16_t).
90+
91+
takesDoubles(X, Y, Z); // Works on all compilers
92+
#ifndef IGNORE_ERRORS
93+
fma(X, Y, Z); // DXC: Fails to resolve no known conversion from float to double.
94+
// Clang: Resolves to fma(double,double,double).
95+
#endif
96+
97+
double D = dot(A, B); // DXC: Resolves to dot(double3, double3), fails DXIL Validation.
98+
// FXC: Expands to compute double dot product with fmul/fadd
99+
// Clang: Resolves to dot(float3, float3), emits conversion warnings.
100+
101+
}
102+
103+
.. note::
104+
105+
In Clang, a conscious decision was made to exclude the ``dot(vector<double,N>, vector<double,N>)``
106+
overload and allow overload resolution to resolve the
107+
``vector<float,N>`` overload. This approach provides a ``-Wconversion``
108+
diagnostic notifying the user of the conversion rather than silently altering
109+
precision relative to the other overloads (as FXC does) or generating code
110+
that will fail validation (as DXC does).
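The note above can be illustrated with a plain C++ analogue. This is only a sketch, not HLSL and not Clang's implementation: `half_of` is a hypothetical stand-in for a library function that, like `dot` here, offers a `float` overload but no `double` overload. With only that overload visible, a `double` argument is implicitly narrowed, and compiling with `-Wconversion` reports the narrowing instead of leaving it silent.

```cpp
// Hypothetical stand-in for a library function that has a float overload
// but deliberately no double overload (an assumption for illustration).
float half_of(float x) { return x * 0.5f; }

// Calling with a double selects the float overload. The argument is
// implicitly narrowed to float, which -Wconversion diagnoses.
double call_with_double(double d) {
  return half_of(d);
}
```

Values that are exactly representable as `float` (such as `1.0`) survive the round trip, while a value like `0.1` does not, which is the precision change the diagnostic is meant to surface.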

clang/docs/HLSL/HLSLDocs.rst

+1
@@ -11,6 +11,7 @@ HLSL Design and Implementation
1111
.. toctree::
1212
:maxdepth: 1
1313

14+
ExpectedDifferences
1415
HLSLIRReference
1516
ResourceTypes
1617
EntryFunctions

clang/docs/ReleaseNotes.rst

+11
@@ -307,6 +307,17 @@ X86 Support
307307
Arm and AArch64 Support
308308
^^^^^^^^^^^^^^^^^^^^^^^
309309

310+
- ARMv7+ targets now default to allowing unaligned access, except Armv6-M, and
311+
Armv8-M without the Main Extension. Baremetal targets should check that the
312+
new default will work with their system configurations, since it requires
313+
that SCTLR.A is 0, SCTLR.U is 1, and that the memory in question is
314+
configured as "normal" memory. This brings Clang in line with the default
315+
settings for GCC and Arm Compiler. Aside from making Clang align with other
316+
compilers, changing the default brings major performance and code size
317+
improvements for most targets. We have not changed the default behavior for
318+
ARMv6, but may revisit that decision in the future. Users can restore the old
319+
behavior with -m[no-]unaligned-access.
320+
310321
Android Support
311322
^^^^^^^^^^^^^^^
312323

clang/lib/CodeGen/CodeGenPGO.cpp

+7-4
@@ -239,9 +239,12 @@ struct MapRegionCounters : public RecursiveASTVisitor<MapRegionCounters> {
239239
if (MCDCMaxCond == 0)
240240
return true;
241241

242-
/// At the top of the logical operator nest, reset the number of conditions.
243-
if (LogOpStack.empty())
242+
/// At the top of the logical operator nest, reset the number of conditions,
243+
/// also forget previously seen split nesting cases.
244+
if (LogOpStack.empty()) {
244245
NumCond = 0;
246+
SplitNestedLogicalOp = false;
247+
}
245248

246249
if (const Expr *E = dyn_cast<Expr>(S)) {
247250
const BinaryOperator *BinOp = dyn_cast<BinaryOperator>(E->IgnoreParens());
@@ -292,7 +295,7 @@ struct MapRegionCounters : public RecursiveASTVisitor<MapRegionCounters> {
292295
"contains an operation with a nested boolean expression. "
293296
"Expression will not be covered");
294297
Diag.Report(S->getBeginLoc(), DiagID);
295-
return false;
298+
return true;
296299
}
297300

298301
/// Was the maximum number of conditions encountered?
@@ -303,7 +306,7 @@ struct MapRegionCounters : public RecursiveASTVisitor<MapRegionCounters> {
303306
"number of conditions (%0) exceeds max (%1). "
304307
"Expression will not be covered");
305308
Diag.Report(S->getBeginLoc(), DiagID) << NumCond << MCDCMaxCond;
306-
return false;
309+
return true;
307310
}
308311

309312
// Otherwise, allocate the number of bytes required for the bitmap

clang/lib/Driver/ToolChains/Arch/ARM.cpp

+12-12
@@ -890,25 +890,25 @@ llvm::ARM::FPUKind arm::getARMTargetFeatures(const Driver &D,
890890
// SCTLR.U bit, which is architecture-specific. We assume ARMv6
891891
// Darwin and NetBSD targets support unaligned accesses, and others don't.
892892
//
893-
// ARMv7 always has SCTLR.U set to 1, but it has a new SCTLR.A bit
894-
// which raises an alignment fault on unaligned accesses. Linux
895-
// defaults this bit to 0 and handles it as a system-wide (not
896-
// per-process) setting. It is therefore safe to assume that ARMv7+
897-
// Linux targets support unaligned accesses. The same goes for NaCl
898-
// and Windows.
893+
// ARMv7 always has SCTLR.U set to 1, but it has a new SCTLR.A bit which
894+
// raises an alignment fault on unaligned accesses. Assume ARMv7+ supports
895+
// unaligned accesses, except ARMv6-M, and ARMv8-M without the Main
896+
// Extension. This aligns with the default behavior of ARM's downstream
897+
// versions of GCC and Clang.
899898
//
900-
// The above behavior is consistent with GCC.
899+
// Users can change the default behavior via -m[no-]unaligned-access.
901900
int VersionNum = getARMSubArchVersionNumber(Triple);
902901
if (Triple.isOSDarwin() || Triple.isOSNetBSD()) {
903902
if (VersionNum < 6 ||
904903
Triple.getSubArch() == llvm::Triple::SubArchType::ARMSubArch_v6m)
905904
Features.push_back("+strict-align");
906-
} else if (Triple.isOSLinux() || Triple.isOSNaCl() ||
907-
Triple.isOSWindows()) {
908-
if (VersionNum < 7)
909-
Features.push_back("+strict-align");
910-
} else
905+
} else if (VersionNum < 7 ||
906+
Triple.getSubArch() ==
907+
llvm::Triple::SubArchType::ARMSubArch_v6m ||
908+
Triple.getSubArch() ==
909+
llvm::Triple::SubArchType::ARMSubArch_v8m_baseline) {
911910
Features.push_back("+strict-align");
911+
}
912912
}
913913

914914
// llvm does not support reserving registers in general. There is support
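
The new default in the hunk above can be summarised as a small standalone predicate. This is a minimal sketch assuming no explicit `-m[no-]unaligned-access` flag was given; the `OS` and `SubArch` enums and the function name `defaultsToStrictAlign` are hypothetical simplifications of `llvm::Triple`, not part of the driver.

```cpp
// Hypothetical, simplified stand-ins for the llvm::Triple queries in the diff.
enum class OS { Darwin, NetBSD, Other };
enum class SubArch { None, V6M, V8MBaseline };

// Returns true when "+strict-align" would be added by default, following
// the branch structure shown in the hunk above.
bool defaultsToStrictAlign(OS os, int versionNum, SubArch sub) {
  if (os == OS::Darwin || os == OS::NetBSD)
    return versionNum < 6 || sub == SubArch::V6M;
  // All other operating systems, including bare metal, now share one rule.
  return versionNum < 7 || sub == SubArch::V6M ||
         sub == SubArch::V8MBaseline;
}
```

Under this reading, a generic ARMv7 target no longer gets `+strict-align`, while ARMv6, ARMv6-M, and ARMv8-M Baseline still do, which matches the cases added to `arm-alignment.c`.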

clang/lib/Driver/ToolChains/CommonArgs.cpp

+34-3
@@ -1157,10 +1157,41 @@ static void addOpenMPDeviceLibC(const ToolChain &TC, const ArgList &Args,
11571157
"llvm-libc-decls");
11581158
bool HasLibC = llvm::sys::fs::exists(LibCDecls) &&
11591159
llvm::sys::fs::is_directory(LibCDecls);
1160-
if (Args.hasFlag(options::OPT_gpulibc, options::OPT_nogpulibc, HasLibC)) {
1161-
CmdArgs.push_back("-lcgpu");
1162-
CmdArgs.push_back("-lmgpu");
1160+
if (!Args.hasFlag(options::OPT_gpulibc, options::OPT_nogpulibc, HasLibC))
1161+
return;
1162+
1163+
// We don't have access to the offloading toolchains here, so determine from
1164+
// the arguments if we have any active NVPTX or AMDGPU toolchains.
1165+
llvm::DenseSet<const char *> Libraries;
1166+
if (const Arg *Targets = Args.getLastArg(options::OPT_fopenmp_targets_EQ)) {
1167+
if (llvm::any_of(Targets->getValues(),
1168+
[](auto S) { return llvm::Triple(S).isAMDGPU(); })) {
1169+
Libraries.insert("-lcgpu-amdgpu");
1170+
Libraries.insert("-lmgpu-amdgpu");
1171+
}
1172+
if (llvm::any_of(Targets->getValues(),
1173+
[](auto S) { return llvm::Triple(S).isNVPTX(); })) {
1174+
Libraries.insert("-lcgpu-nvptx");
1175+
Libraries.insert("-lmgpu-nvptx");
1176+
}
11631177
}
1178+
1179+
for (StringRef Arch : Args.getAllArgValues(options::OPT_offload_arch_EQ)) {
1180+
if (llvm::any_of(llvm::split(Arch, ","), [](StringRef Str) {
1181+
return IsAMDGpuArch(StringToCudaArch(Str));
1182+
})) {
1183+
Libraries.insert("-lcgpu-amdgpu");
1184+
Libraries.insert("-lmgpu-amdgpu");
1185+
}
1186+
if (llvm::any_of(llvm::split(Arch, ","), [](StringRef Str) {
1187+
return IsNVIDIAGpuArch(StringToCudaArch(Str));
1188+
})) {
1189+
Libraries.insert("-lcgpu-nvptx");
1190+
Libraries.insert("-lmgpu-nvptx");
1191+
}
1192+
}
1193+
1194+
llvm::append_range(CmdArgs, Libraries);
11641195
}
11651196

11661197
void tools::addOpenMPRuntimeLibraryPath(const ToolChain &TC,
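
The per-vendor library selection in the hunk above can be sketched outside the driver as follows. This is an illustration under stated assumptions: the function name `gpuLibcFlags` is hypothetical, and the `gfx` / `sm_` prefix checks are a crude substitute for `IsAMDGpuArch` and `IsNVIDIAGpuArch`. It also uses a sorted `std::set<std::string>` rather than `llvm::DenseSet<const char *>`, so its output order is deterministic, unlike the original.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Collects the GPU libc link flags implied by a list of --offload-arch=
// values, each of which may itself be a comma-separated list.
std::set<std::string> gpuLibcFlags(const std::vector<std::string> &archValues) {
  std::set<std::string> libs;
  for (const std::string &value : archValues) {
    std::stringstream stream(value);
    std::string arch;
    while (std::getline(stream, arch, ',')) {
      // Assumption: prefix matching stands in for the real arch lookup.
      if (arch.rfind("gfx", 0) == 0) {
        libs.insert("-lcgpu-amdgpu");
        libs.insert("-lmgpu-amdgpu");
      } else if (arch.rfind("sm_", 0) == 0) {
        libs.insert("-lcgpu-nvptx");
        libs.insert("-lmgpu-nvptx");
      }
    }
  }
  return libs;
}
```

Because the flags are stored in a set, naming the same vendor several times still yields each library once, which is the deduplication the diff relies on.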

clang/lib/Driver/ToolChains/Cuda.cpp

+21-22
@@ -630,11 +630,6 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
630630
continue;
631631
}
632632

633-
// Currently, we only pass the input files to the linker, we do not pass
634-
// any libraries that may be valid only for the host.
635-
if (!II.isFilename())
636-
continue;
637-
638633
AddStaticDeviceLibsLinking(C, *this, JA, Inputs, Args, CmdArgs, "nvptx",
639634
GPUArch, /*isBitCodeSDL=*/false,
640635
/*postClangLink=*/false);
@@ -643,26 +638,30 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
643638
// file and device linking when given a '.cubin' file. We always want to
644639
// perform device linking, so just rename any '.o' files.
645640
// FIXME: This should hopefully be removed if NVIDIA updates their tooling.
646-
auto InputFile = getToolChain().getInputFilename(II);
647-
if (llvm::sys::path::extension(InputFile) != ".cubin") {
648-
// If there are no actions above this one then this is direct input and we
649-
// can copy it. Otherwise the input is internal so a `.cubin` file should
650-
// exist.
651-
if (II.getAction() && II.getAction()->getInputs().size() == 0) {
652-
const char *CubinF =
653-
Args.MakeArgString(getToolChain().getDriver().GetTemporaryPath(
654-
llvm::sys::path::stem(InputFile), "cubin"));
655-
if (llvm::sys::fs::copy_file(InputFile, C.addTempFile(CubinF)))
656-
continue;
641+
if (II.isFilename()) {
642+
auto InputFile = getToolChain().getInputFilename(II);
643+
if (llvm::sys::path::extension(InputFile) != ".cubin") {
644+
// If there are no actions above this one then this is direct input and
645+
// we can copy it. Otherwise the input is internal so a `.cubin` file
646+
// should exist.
647+
if (II.getAction() && II.getAction()->getInputs().size() == 0) {
648+
const char *CubinF =
649+
Args.MakeArgString(getToolChain().getDriver().GetTemporaryPath(
650+
llvm::sys::path::stem(InputFile), "cubin"));
651+
if (llvm::sys::fs::copy_file(InputFile, C.addTempFile(CubinF)))
652+
continue;
657653

658-
CmdArgs.push_back(CubinF);
654+
CmdArgs.push_back(CubinF);
655+
} else {
656+
SmallString<256> Filename(InputFile);
657+
llvm::sys::path::replace_extension(Filename, "cubin");
658+
CmdArgs.push_back(Args.MakeArgString(Filename));
659+
}
659660
} else {
660-
SmallString<256> Filename(InputFile);
661-
llvm::sys::path::replace_extension(Filename, "cubin");
662-
CmdArgs.push_back(Args.MakeArgString(Filename));
661+
CmdArgs.push_back(Args.MakeArgString(InputFile));
663662
}
664-
} else {
665-
CmdArgs.push_back(Args.MakeArgString(InputFile));
663+
} else if (!II.isNothing()) {
664+
II.getInputArg().renderAsInput(Args, CmdArgs);
666665
}
667666
}
668667
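
The renaming step for internal inputs in the hunk above (pass `.cubin` files through, otherwise swap the extension) can be sketched with the standard library. This is only an approximation: `nvlinkInputName` is a hypothetical name, `std::filesystem` stands in for `llvm::sys::path`, and the sketch omits the direct-input branch that copies the file to a temporary `.cubin`.

```cpp
#include <filesystem>
#include <string>

// Returns the file name that would be handed to nvlink for an internal
// input: unchanged if it already ends in .cubin, renamed otherwise.
std::string nvlinkInputName(const std::string &inputFile) {
  std::filesystem::path path(inputFile);
  if (path.extension() != ".cubin")
    path.replace_extension("cubin");
  return path.string();
}
```

Non-file inputs such as `-Wl,` arguments never reach this step in the new code; they are rendered directly, which is what the `LINKER-ARGS` test checks.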

clang/test/Driver/arm-alignment.c

+15
@@ -22,6 +22,21 @@
2222
// RUN: %clang -target armv7-windows -### %s 2> %t
2323
// RUN: FileCheck --check-prefix=CHECK-UNALIGNED-ARM < %t %s
2424

25+
// RUN: %clang --target=armv6 -### %s 2> %t
26+
// RUN: FileCheck --check-prefix=CHECK-ALIGNED-ARM < %t %s
27+
28+
// RUN: %clang --target=armv7 -### %s 2> %t
29+
// RUN: FileCheck --check-prefix=CHECK-UNALIGNED-ARM < %t %s
30+
31+
// RUN: %clang -target thumbv6m-none-gnueabi -mcpu=cortex-m0 -### %s 2> %t
32+
// RUN: FileCheck --check-prefix CHECK-ALIGNED-ARM <%t %s
33+
34+
// RUN: %clang -target thumb-none-gnueabi -mcpu=cortex-m0 -### %s 2> %t
35+
// RUN: FileCheck --check-prefix CHECK-ALIGNED-ARM <%t %s
36+
37+
// RUN: %clang -target thumbv8m.base-none-gnueabi -### %s 2> %t
38+
// RUN: FileCheck --check-prefix CHECK-ALIGNED-ARM <%t %s
39+
2540
// RUN: %clang --target=aarch64 -munaligned-access -### %s 2> %t
2641
// RUN: FileCheck --check-prefix=CHECK-UNALIGNED-AARCH64 < %t %s
2742

clang/test/Driver/cuda-cross-compiling.c

+8-1
@@ -69,6 +69,13 @@
6969
// LOWERING: -cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}} "-mllvm" "--nvptx-lower-global-ctor-dtor"
7070

7171
//
72+
// Test passing arguments directly to nvlink.
73+
//
74+
// RUN: %clang -target nvptx64-nvidia-cuda -Wl,-v -Wl,a,b -### %s 2>&1 \
75+
// RUN: | FileCheck -check-prefix=LINKER-ARGS %s
76+
77+
// LINKER-ARGS: nvlink{{.*}}"-v"{{.*}}"a" "b"
78+
7279
// Tests for handling a missing architecture.
7380
//
7481
// RUN: not %clang -target nvptx64-nvidia-cuda %s -### 2>&1 \
@@ -80,4 +87,4 @@
8087
// RUN: %clang -target nvptx64-nvidia-cuda -flto -c %s -### 2>&1 \
8188
// RUN: | FileCheck -check-prefix=GENERIC %s
8289

83-
// GENERIC-NOT: -cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}} "-target-cpu"
90+
// GENERIC-NOT: -cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}} "-target-cpu"

clang/test/Driver/openmp-offload-gpu.c

+17-3
@@ -393,14 +393,28 @@
393393
//
394394
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp=libomp \
395395
// RUN: --libomptarget-nvptx-bc-path=%S/Inputs/libomptarget/libomptarget-nvptx-test.bc \
396+
// RUN: --libomptarget-amdgpu-bc-path=%S/Inputs/hip_dev_lib/libomptarget-amdgpu-gfx803.bc \
396397
// RUN: --cuda-path=%S/Inputs/CUDA_102/usr/local/cuda \
397-
// RUN: --offload-arch=sm_52 -gpulibc -nogpuinc %s 2>&1 \
398+
// RUN: --rocm-path=%S/Inputs/rocm \
399+
// RUN: --offload-arch=sm_52,gfx803 -gpulibc -nogpuinc %s 2>&1 \
398400
// RUN: | FileCheck --check-prefix=LIBC-GPU %s
399-
// LIBC-GPU: "-lcgpu"{{.*}}"-lmgpu"
401+
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp=libomp \
402+
// RUN: --libomptarget-nvptx-bc-path=%S/Inputs/libomptarget/libomptarget-nvptx-test.bc \
403+
// RUN: --libomptarget-amdgpu-bc-path=%S/Inputs/hip_dev_lib/libomptarget-amdgpu-gfx803.bc \
404+
// RUN: --cuda-path=%S/Inputs/CUDA_102/usr/local/cuda \
405+
// RUN: --rocm-path=%S/Inputs/rocm \
406+
// RUN: -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 \
407+
// RUN: -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 \
408+
// RUN: -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -gpulibc -nogpuinc %s 2>&1 \
409+
// RUN: | FileCheck --check-prefix=LIBC-GPU %s
410+
// LIBC-GPU-DAG: "-lcgpu-amdgpu"
411+
// LIBC-GPU-DAG: "-lmgpu-amdgpu"
412+
// LIBC-GPU-DAG: "-lcgpu-nvptx"
413+
// LIBC-GPU-DAG: "-lmgpu-nvptx"
400414

401415
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp=libomp \
402416
// RUN: --libomptarget-nvptx-bc-path=%S/Inputs/libomptarget/libomptarget-nvptx-test.bc \
403417
// RUN: --cuda-path=%S/Inputs/CUDA_102/usr/local/cuda \
404418
// RUN: --offload-arch=sm_52 -nogpulibc -nogpuinc %s 2>&1 \
405419
// RUN: | FileCheck --check-prefix=NO-LIBC-GPU %s
406-
// NO-LIBC-GPU-NOT: "-lcgpu"{{.*}}"-lmgpu"
420+
// NO-LIBC-GPU-NOT: -lmgpu{{.*}}-lcgpu
