GCC passes and returns complex numbers in registers #2

chbessonova · 2019-02-04T07:20:08Z

I noticed that GCC passes and returns complex numbers (complex.h) in registers while user-defined structures, even contain 2 floats are still passed and returned by reference.

#include <complex.h>

struct s { float a[2]; };

void f1(struct s S); 
void f2(float complex C); 

extern struct s SS; 
extern float complex CC; 

int main() {
  f1(SS);
  f2(CC);
}

main:
	MOV.W	#SS, R12
	CALL	#f1
	MOV.W	&CC, R12
	MOV.W	&CC+2, R13
	MOV.W	&CC+4, R14
	MOV.W	&CC+6, R15
	CALL	#f2
	MOV.B	#0, R12

TI CCS passes everything by reference:

main:
        MOV.W     #SS+0,r12
        CALL      #f1 
        MOV.W     #CC+0,r12  
        CALL      #f2  
        MOV.W     #0,r12 
        RET

The text was updated successfully, but these errors were encountered:

… Solaris/x86 A dozen 32-bit `AddressSanitizer` testcases FAIL on the latest beta of Solaris 11.4/x86, e.g. `AddressSanitizer-i386-sunos :: TestCases/null_deref.cpp` produces AddressSanitizer:DEADLYSIGNAL ================================================================= ==29274==ERROR: AddressSanitizer: stack-overflow on address 0x00000028 (pc 0x08135efd bp 0xfeffdfd8 sp 0x00000000 T0) #0 0x8135efd in NullDeref(int*) /vol/llvm/src/llvm-project/dist/compiler-rt/test/asan/TestCases/null_deref.cpp:15:10 #1 0x8135ea6 in main /vol/llvm/src/llvm-project/dist/compiler-rt/test/asan/TestCases/null_deref.cpp:21:3 #2 0x8084b85 in _start (null_deref.cpp.tmp+0x8084b85) SUMMARY: AddressSanitizer: stack-overflow /vol/llvm/src/llvm-project/dist/compiler-rt/test/asan/TestCases/null_deref.cpp:15:10 in NullDeref(int*) ==29274==ABORTING instead of the expected AddressSanitizer:DEADLYSIGNAL ================================================================= ==29276==ERROR: AddressSanitizer: SEGV on unknown address 0x00000028 (pc 0x08135f1f bp 0xfeffdf48 sp 0xfeffdf40 T0) ==29276==The signal is caused by a WRITE memory access. ==29276==Hint: address points to the zero page. #0 0x8135f1f in NullDeref(int*) /vol/llvm/src/llvm-project/local/compiler-rt/test/asan/TestCases/null_deref.cpp:15:10 #1 0x8135efa in main /vol/llvm/src/llvm-project/local/compiler-rt/test/asan/TestCases/null_deref.cpp:21:3 #2 0x8084be5 in _start (null_deref.cpp.tmp+0x8084be5) AddressSanitizer can not provide additional info. SUMMARY: AddressSanitizer: SEGV /vol/llvm/src/llvm-project/local/compiler-rt/test/asan/TestCases/null_deref.cpp:15:10 in NullDeref(int*) ==29276==ABORTING I managed to trace this to a change in `<sys/regset.h>`: previously the header would primarily define the short register indices (like `UESP`). While they are required by the i386 psABI, they are only required in `<ucontext.h>` and could previously leak into unsuspecting user code, polluting the namespace and requiring elaborate workarounds like that in `llvm/include/llvm/Support/Solaris/sys/regset.h`. The change fixed that by restricting the definition of the short forms appropriately, at the same time defining all `REG_` prefixed forms for compatiblity with other systems. This exposed a bug in `compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp`, however: Previously, the index for the user stack pointer would be hardcoded if `REG_ESP` wasn't defined. Now with that definition present, it turned out that `REG_ESP` was the wrong index to use: the previous value 17 (and `REG_SP`) corresponds to `REG_UESP` instead. With that change, the failures are all gone. Tested on `amd-pc-solaris2.11`. Differential Revision: https://reviews.llvm.org/D83664

atrosinenko · 2020-07-18T16:20:06Z

The 67422e4 commit is expected to make sure _Complex types are laid out identically by both gcc and clang.

Hmm, now for the test.c with -Os optimization level, clang generates (R1 == SP)

main: 
        sub     #8, r1
        mov     &SS+6, 6(r1)
        mov     &SS+4, 4(r1)
        mov     &SS+2, 2(r1)
        mov     &SS, 0(r1)
        call    #f1
        mov     &CC+6, r15
        mov     &CC+4, r14
        mov     &CC+2, r13
        mov     &CC, r12
        call    #f2
        clr     r12
        add     #8, r1
        ret

And gcc generates

main:
        MOV.W   #SS, R12
        CALL    #f1
        MOV.W   &CC, R12
        MOV.W   &CC+2, R13
        MOV.W   &CC+4, R14
        MOV.W   &CC+6, R15
        CALL    #f2
        MOV.B   #0, R12
        RET

So it seems there is no difference for f2 but something strange is generated for f1 and needs further investigation.

…RM` was undefined after definition. `PP->getMacroInfo()` returns nullptr for undefined macro, which leads to null-dereference at `MI->tockens().back()`. Stack dump: ``` #0 0x000000000217d15a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/llvm-project/build/bin/clang-tidy+0x217d15a) #1 0x000000000217b17c llvm::sys::RunSignalHandlers() (/llvm-project/build/bin/clang-tidy+0x217b17c) #2 0x000000000217b2e3 SignalHandler(int) (/llvm-project/build/bin/clang-tidy+0x217b2e3) #3 0x00007f39be5b1390 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x11390) #4 0x0000000000593532 clang::tidy::bugprone::BadSignalToKillThreadCheck::check(clang::ast_matchers::MatchFinder::MatchResult const&) (/llvm-project/build/bin/clang-tidy+0x593532) ``` Reviewed By: hokein Differential Revision: https://reviews.llvm.org/D85401

…RM` is not a literal. If `SIGTERM` is not a literal (e.g. `#define SIGTERM ((unsigned)15)`) bugprone-bad-signal-to-kill-thread check crashes. Stack dump: ``` #0 0x000000000217d15a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/llvm-project/build/bin/clang-tidy+0x217d15a) #1 0x000000000217b17c llvm::sys::RunSignalHandlers() (/llvm-project/build/bin/clang-tidy+0x217b17c) #2 0x000000000217b2e3 SignalHandler(int) (/llvm-project/build/bin/clang-tidy+0x217b2e3) #3 0x00007f6a7efb1390 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x11390) #4 0x000000000212ac9b llvm::StringRef::getAsInteger(unsigned int, llvm::APInt&) const (/llvm-project/build/bin/clang-tidy+0x212ac9b) #5 0x0000000000593501 clang::tidy::bugprone::BadSignalToKillThreadCheck::check(clang::ast_matchers::MatchFinder::MatchResult const&) (/llvm-project/build/bin/clang-tidy+0x593501) ``` Reviewed By: hokein Differential Revision: https://reviews.llvm.org/D85398

… when `__STDC_WANT_LIB_EXT1__` was undefined after definition. PP->getMacroInfo() returns nullptr for undefined macro, so we need to check this return value before dereference. Stack dump: ``` #0 0x0000000002185e6a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/llvm-project/build/bin/clang-tidy+0x2185e6a) #1 0x0000000002183e8c llvm::sys::RunSignalHandlers() (/llvm-project/build/bin/clang-tidy+0x2183e8c) #2 0x0000000002183ff3 SignalHandler(int) (/llvm-project/build/bin/clang-tidy+0x2183ff3) #3 0x00007f37df9b1390 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x11390) #4 0x000000000052054e clang::tidy::bugprone::NotNullTerminatedResultCheck::check(clang::ast_matchers::MatchFinder::MatchResult const&) (/llvm-project/build/bin/clang-tidy+0x52054e) ``` Reviewed By: hokein Differential Revision: https://reviews.llvm.org/D85523

… when `__STDC_WANT_LIB_EXT1__` is not a literal. If `__STDC_WANT_LIB_EXT1__` is not a literal (e.g. `#define __STDC_WANT_LIB_EXT1__ ((unsigned)1)`) bugprone-not-null-terminated-result check crashes. Stack dump: ``` #0 0x0000000002185e6a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/llvm-project/build/bin/clang-tidy+0x2185e6a) #1 0x0000000002183e8c llvm::sys::RunSignalHandlers() (/llvm-project/build/bin/clang-tidy+0x2183e8c) #2 0x0000000002183ff3 SignalHandler(int) (/llvm-project/build/bin/clang-tidy+0x2183ff3) #3 0x00007f08d91b1390 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x11390) #4 0x00000000021338bb llvm::StringRef::getAsInteger(unsigned int, llvm::APInt&) const (/llvm-project/build/bin/clang-tidy+0x21338bb) #5 0x000000000052051c clang::tidy::bugprone::NotNullTerminatedResultCheck::check(clang::ast_matchers::MatchFinder::MatchResult const&) (/llvm-project/build/bin/clang-tidy+0x52051c) ``` Reviewed By: hokein Differential Revision: https://reviews.llvm.org/D85525

When `Target::GetEntryPointAddress()` calls `exe_module->GetObjectFile()->GetEntryPointAddress()`, and the returned `entry_addr` is valid, it can immediately be returned. However, just before that, an `llvm::Error` value has been setup, but in this case it is not consumed before returning, like is done further below in the function. In https://bugs.freebsd.org/248745 we got a bug report for this, where a very simple test case aborts and dumps core: ``` * thread #1, name = 'testcase', stop reason = breakpoint 1.1 frame #0: 0x00000000002018d4 testcase`main(argc=1, argv=0x00007fffffffea18) at testcase.c:3:5 1 int main(int argc, char *argv[]) 2 { -> 3 return 0; 4 } (lldb) p argc Program aborted due to an unhandled Error: Error value was Success. (Note: Success values must still be checked prior to being destroyed). Thread 1 received signal SIGABRT, Aborted. thr_kill () at thr_kill.S:3 3 thr_kill.S: No such file or directory. (gdb) bt #0 thr_kill () at thr_kill.S:3 #1 0x00000008049a0004 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52 #2 0x0000000804916229 in abort () at /usr/src/lib/libc/stdlib/abort.c:67 #3 0x000000000451b5f5 in fatalUncheckedError () at /usr/src/contrib/llvm-project/llvm/lib/Support/Error.cpp:112 #4 0x00000000019cf008 in GetEntryPointAddress () at /usr/src/contrib/llvm-project/llvm/include/llvm/Support/Error.h:267 #5 0x0000000001bccbd8 in ConstructorSetup () at /usr/src/contrib/llvm-project/lldb/source/Target/ThreadPlanCallFunction.cpp:67 #6 0x0000000001bcd2c0 in ThreadPlanCallFunction () at /usr/src/contrib/llvm-project/lldb/source/Target/ThreadPlanCallFunction.cpp:114 #7 0x00000000020076d4 in InferiorCallMmap () at /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp:97 #8 0x0000000001f4be33 in DoAllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/FreeBSD/ProcessFreeBSD.cpp:604 #9 0x0000000001fe51b9 in AllocatePage () at /usr/src/contrib/llvm-project/lldb/source/Target/Memory.cpp:347 #10 0x0000000001fe5385 in AllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Target/Memory.cpp:383 #11 0x0000000001974da2 in AllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Target/Process.cpp:2301 #12 CanJIT () at /usr/src/contrib/llvm-project/lldb/source/Target/Process.cpp:2331 #13 0x0000000001a1bf3d in Evaluate () at /usr/src/contrib/llvm-project/lldb/source/Expression/UserExpression.cpp:190 #14 0x00000000019ce7a2 in EvaluateExpression () at /usr/src/contrib/llvm-project/lldb/source/Target/Target.cpp:2372 #15 0x0000000001ad784c in EvaluateExpression () at /usr/src/contrib/llvm-project/lldb/source/Commands/CommandObjectExpression.cpp:414 #16 0x0000000001ad86ae in DoExecute () at /usr/src/contrib/llvm-project/lldb/source/Commands/CommandObjectExpression.cpp:646 #17 0x0000000001a5e3ed in Execute () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandObject.cpp:1003 #18 0x0000000001a6c4a3 in HandleCommand () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:1762 #19 0x0000000001a6f98c in IOHandlerInputComplete () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:2760 #20 0x0000000001a90b08 in Run () at /usr/src/contrib/llvm-project/lldb/source/Core/IOHandler.cpp:548 #21 0x00000000019a6c6a in ExecuteIOHandlers () at /usr/src/contrib/llvm-project/lldb/source/Core/Debugger.cpp:903 #22 0x0000000001a70337 in RunCommandInterpreter () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:2946 #23 0x0000000001d9d812 in RunCommandInterpreter () at /usr/src/contrib/llvm-project/lldb/source/API/SBDebugger.cpp:1169 #24 0x0000000001918be8 in MainLoop () at /usr/src/contrib/llvm-project/lldb/tools/driver/Driver.cpp:675 #25 0x000000000191a114 in main () at /usr/src/contrib/llvm-project/lldb/tools/driver/Driver.cpp:890``` Fix the incorrect error catch by only instantiating an `Error` object if it is necessary. Reviewed By: JDevlieghere Differential Revision: https://reviews.llvm.org/D86355

Using the same strategy as c38e425. D69825 revealed (introduced?) a problem when building with ASan, and some memory leaks somewhere. More details are available in the original patch. Looks like we missed one failing tests, this patch adds the workaround to this test as well. (cherry picked from commit e174da4)

Somewhat surprisingly, signature help is emitted as a side-effect of computing the expected type of a function argument. The reason is that both actions require enumerating the possible function signatures and running partial overload resolution, and doing this twice would be wasteful and complicated. Change #1: document this, it's subtle :-) However, sometimes we need to compute the expected type without having reached the code completion cursor yet - in particular to allow completion of designators. eb4ab33 did this but introduced a regression - it emits signature help in the wrong location as a side-effect. Change #2: only emit signature help if the code completion cursor was reached. Currently there is PP.isCodeCompletionReached(), but we can't use it because it's set *after* running code completion. It'd be nice to set this implicitly when the completion token is lexed, but ConsumeCodeCompletionToken() makes this complicated. Change #3: call cutOffParsing() *first* when seeing a completion token. After this, the fact that the Sema::Produce*SignatureHelp() functions are even more confusing, as they only sometimes do that. I don't want to rename them in this patch as it's another large mechanical change, but we should soon. Change #4: prepare to rename ProduceSignatureHelp() to GuessArgumentType() etc. Differential Revision: https://reviews.llvm.org/D98488

@foo

…ocation functions Elimination of bitcasts with void pointer arguments results in GEPs with pure byte indexes. These GEPs do not preserve struct/array information and interrupt phi address translation in later pipeline stages. Here is the original motivation for this patch: ``` #include<stdio.h> #include<malloc.h> typedef struct __Node{ double f; struct __Node *next; } Node; void foo () { Node *a = (Node*) malloc (sizeof(Node)); a->next = NULL; a->f = 11.5f; Node *ptr = a; double sum = 0.0f; while (ptr) { sum += ptr->f; ptr = ptr->next; } printf("%f\n", sum); } ``` By explicit assignment `a->next = NULL`, we can infer the length of the link list is `1`. In this case we can eliminate while loop traversal entirely. This elimination is supposed to be performed by GVN/MemoryDependencyAnalysis/PhiTranslation . The final IR before this patch: ``` define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2 %next = getelementptr inbounds i8, i8* %call, i64 8 %0 = bitcast i8* %next to %struct.__Node** store %struct.__Node* null, %struct.__Node** %0, align 8, !tbaa !2 %f = bitcast i8* %call to double* store double 1.150000e+01, double* %f, align 8, !tbaa !8 %tobool12 = icmp eq i8* %call, null br i1 %tobool12, label %while.end, label %while.body.lr.ph while.body.lr.ph: ; preds = %entry %1 = bitcast i8* %call to %struct.__Node* br label %while.body while.body: ; preds = %while.body.lr.ph, %while.body %sum.014 = phi double [ 0.000000e+00, %while.body.lr.ph ], [ %add, %while.body ] %ptr.013 = phi %struct.__Node* [ %1, %while.body.lr.ph ], [ %3, %while.body ] %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0 %2 = load double, double* %f1, align 8, !tbaa !8 %add = fadd contract double %sum.014, %2 %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1 %3 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2 %tobool = icmp eq %struct.__Node* %3, null br i1 %tobool, label %while.end, label %while.body while.end: ; preds = %while.body, %entry %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %while.body ] %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa) ret void } ``` Final IR after this patch: ``` ; Function Attrs: nofree nounwind define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { while.end: %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 1.150000e+01) ret void } ``` IR before GVN before this patch: ``` define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2 %next = getelementptr inbounds i8, i8* %call, i64 8 %0 = bitcast i8* %next to %struct.__Node** store %struct.__Node* null, %struct.__Node** %0, align 8, !tbaa !2 %f = bitcast i8* %call to double* store double 1.150000e+01, double* %f, align 8, !tbaa !8 %tobool12 = icmp eq i8* %call, null br i1 %tobool12, label %while.end, label %while.body.lr.ph while.body.lr.ph: ; preds = %entry %1 = bitcast i8* %call to %struct.__Node* br label %while.body while.body: ; preds = %while.body.lr.ph, %while.body %sum.014 = phi double [ 0.000000e+00, %while.body.lr.ph ], [ %add, %while.body ] %ptr.013 = phi %struct.__Node* [ %1, %while.body.lr.ph ], [ %3, %while.body ] %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0 %2 = load double, double* %f1, align 8, !tbaa !8 %add = fadd contract double %sum.014, %2 %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1 %3 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2 %tobool = icmp eq %struct.__Node* %3, null br i1 %tobool, label %while.end.loopexit, label %while.body while.end.loopexit: ; preds = %while.body %add.lcssa = phi double [ %add, %while.body ] br label %while.end while.end: ; preds = %while.end.loopexit, %entry %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %while.end.loopexit ] %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa) ret void } ``` IR before GVN after this patch: ``` define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2 %0 = bitcast i8* %call to %struct.__Node* %next = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1 store %struct.__Node* null, %struct.__Node** %next, align 8, !tbaa !2 %f = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 0 store double 1.150000e+01, double* %f, align 8, !tbaa !8 %tobool12 = icmp eq i8* %call, null br i1 %tobool12, label %while.end, label %while.body.preheader while.body.preheader: ; preds = %entry br label %while.body while.body: ; preds = %while.body.preheader, %while.body %sum.014 = phi double [ %add, %while.body ], [ 0.000000e+00, %while.body.preheader ] %ptr.013 = phi %struct.__Node* [ %2, %while.body ], [ %0, %while.body.preheader ] %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0 %1 = load double, double* %f1, align 8, !tbaa !8 %add = fadd contract double %sum.014, %1 %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1 %2 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2 %tobool = icmp eq %struct.__Node* %2, null br i1 %tobool, label %while.end.loopexit, label %while.body while.end.loopexit: ; preds = %while.body %add.lcssa = phi double [ %add, %while.body ] br label %while.end while.end: ; preds = %while.end.loopexit, %entry %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %while.end.loopexit ] %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa) ret void } ``` The phi translation fails before this patch and it prevents GVN to remove the loop. The reason for this failure is in InstCombine. When the Instruction combining pass decides to convert: ``` %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) %0 = bitcast i8* %call to %struct.__Node* %next = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1 store %struct.__Node* null, %struct.__Node** %next ``` to ``` %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) %next = getelementptr inbounds i8, i8* %call, i64 8 %0 = bitcast i8* %next to %struct.__Node** store %struct.__Node* null, %struct.__Node** %0 ``` GEP instructions with pure byte indexes (e.g. `getelementptr inbounds i8, i8* %call, i64 8`) are obstacles for address translation. address translation is looking for structural similarity between GEPs and these GEPs usually do not match since they have different structure. This change will cause couple of failures in LLVM-tests. However, in all cases we need to change expected result by the test. I will update those tests as soon as I get green light on this patch. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96881

This reverts commit 9be8f8b. This breaks tsan on Ubuntu 16.04: $ cat tiny_race.c #include <pthread.h> int Global; void *Thread1(void *x) { Global = 42; return x; } int main() { pthread_t t; pthread_create(&t, NULL, Thread1, NULL); Global = 43; pthread_join(t, NULL); return Global; } $ out/gn/bin/clang -fsanitize=thread -g -O1 tiny_race.c --sysroot ~/src/chrome/src/build/linux/debian_sid_amd64-sysroot/ $ docker run -v $PWD:/foo ubuntu:xenial /foo/a.out FATAL: ThreadSanitizer CHECK failed: ../../compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp:447 "((thr_beg)) >= ((tls_addr))" (0x7fddd76beb80, 0xfffffffffffff980) #0 <null> <null> (a.out+0x4960b6) #1 <null> <null> (a.out+0x4b677f) #2 <null> <null> (a.out+0x49cf94) #3 <null> <null> (a.out+0x499bd2) #4 <null> <null> (a.out+0x42aaf1) #5 <null> <null> (libpthread.so.0+0x76b9) #6 <null> <null> (libc.so.6+0x1074dc) (Get the sysroot from here: https://commondatastorage.googleapis.com/chrome-linux-sysroot/toolchain/500976182686961e34974ea7bdc0a21fca32be06/debian_sid_amd64_sysroot.tar.xz) Also reverts follow-on commits: This reverts commit 58c62fd. This reverts commit 31e541e.

This patch implements expansion of llvm.vp.* intrinsics (https://llvm.org/docs/LangRef.html#vector-predication-intrinsics). VP expansion is required for targets that do not implement VP code generation. Since expansion is controllable with TTI, targets can switch on the VP intrinsics they do support in their backend offering a smooth transition strategy for VP code generation (VE, RISC-V V, ARM SVE, AVX512, ..). Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D78203

This reverts commit 43bc584. The commit broke the -DLLVM_ENABLE_MODULES=1 builds. http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/31603/consoleFull#2136199809a1ca8a51-895e-46c6-af87-ce24fa4cd561

This reverts the revert 02c5ba8 Fix: Pass was registered as DUMMY_FUNCTION_PASS causing the newpm-pass functions to be doubly defined. Triggered in -DLLVM_ENABLE_MODULE=1 builds. Original commit: This patch implements expansion of llvm.vp.* intrinsics (https://llvm.org/docs/LangRef.html#vector-predication-intrinsics). VP expansion is required for targets that do not implement VP code generation. Since expansion is controllable with TTI, targets can switch on the VP intrinsics they do support in their backend offering a smooth transition strategy for VP code generation (VE, RISC-V V, ARM SVE, AVX512, ..). Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D78203

…ing it Having nested macros in the C code could cause clangd to fail an assert in clang::Preprocessor::setLoadedMacroDirective() and crash. #1 0x00000000007ace30 PrintStackTraceSignalHandler(void*) /qdelacru/llvm-project/llvm/lib/Support/Unix/Signals.inc:632:1 #2 0x00000000007aaded llvm::sys::RunSignalHandlers() /qdelacru/llvm-project/llvm/lib/Support/Signals.cpp:76:20 #3 0x00000000007ac7c1 SignalHandler(int) /qdelacru/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1 #4 0x00007f096604db20 __restore_rt (/lib64/libpthread.so.0+0x12b20) #5 0x00007f0964b307ff raise (/lib64/libc.so.6+0x377ff) #6 0x00007f0964b1ac35 abort (/lib64/libc.so.6+0x21c35) #7 0x00007f0964b1ab09 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21b09) #8 0x00007f0964b28de6 (/lib64/libc.so.6+0x2fde6) #9 0x0000000001004d1a clang::Preprocessor::setLoadedMacroDirective(clang::IdentifierInfo*, clang::MacroDirective*, clang::MacroDirective*) /qdelacru/llvm-project/clang/lib/Lex/PPMacroExpansion.cpp:116:5 An example of the code that causes the assert failure: ``` ... ``` During code completion in clangd, the macros will be loaded in loadMainFilePreambleMacros() by iterating over the macro names and calling PreambleIdentifiers->get(). Since these macro names are store in a StringSet (has StringMap underlying container), the order of the iterator is not guaranteed to be same as the order seen in the source code. When clangd is trying to resolve nested macros it sometimes attempts to load them out of order which causes a macro to be stored twice. In the example above, ECHO2 macro gets resolved first, but since it uses another macro that has not been resolved it will try to resolve/store that as well. Now there are two MacroDirectives stored in the Preprocessor, ECHO and ECHO2. When clangd tries to load the next macro, ECHO, the preprocessor fails an assert in clang::Preprocessor::setLoadedMacroDirective() because there is already a MacroDirective stored for that macro name. In this diff, I check if the macro is already inside the IdentifierTable and if it is skip it so that it is not resolved twice. Reviewed By: kadircet Differential Revision: https://reviews.llvm.org/D101870

Running this on Amazon Ubuntu the final backtrace is: ``` (lldb) thread backtrace * thread #1, name = 'a.out', stop reason = breakpoint 1.1 * frame #0: 0x0000aaaaaaaa07d0 a.out`func_c at main.c:10:3 frame #1: 0x0000aaaaaaaa07c4 a.out`func_b at main.c:14:3 frame #2: 0x0000aaaaaaaa07b4 a.out`func_a at main.c:18:3 frame #3: 0x0000aaaaaaaa07a4 a.out`main(argc=<unavailable>, argv=<unavailable>) at main.c:22:3 frame #4: 0x0000fffff7b373fc libc.so.6`___lldb_unnamed_symbol2962 + 108 frame #5: 0x0000fffff7b374cc libc.so.6`__libc_start_main + 152 frame #6: 0x0000aaaaaaaa06b0 a.out`_start + 48 ``` This causes the test to fail because of the extra ___lldb_unnamed_symbol2962 frame (an inlined function?). To fix this, strictly check all the frames in main.c then for the rest just check we find __libc_start_main and _start in that order regardless of other frames in between. Reviewed By: omjavaid Differential Revision: https://reviews.llvm.org/D154204

…tput The crash happens in clang::driver::tools::SplitDebugName when Output is InputInfo::Nothing. It doesn't happen with standalone clang driver because output is created in Driver::BuildJobsForActionNoCache. Example backtrace: ``` * thread #1, name = 'clangd', stop reason = hit program assert * frame #0: 0x00007ffff5c4eacf libc.so.6`raise + 271 frame #1: 0x00007ffff5c21ea5 libc.so.6`abort + 295 frame #2: 0x00007ffff5c21d79 libc.so.6`__assert_fail_base.cold.0 + 15 frame #3: 0x00007ffff5c47426 libc.so.6`__assert_fail + 70 frame #4: 0x000055555dc0923c clangd`clang::driver::InputInfo::getFilename(this=0x00007fffffff9398) const at InputInfo.h:84:5 frame #5: 0x000055555dcd0d8d clangd`clang::driver::tools::SplitDebugName(JA=0x000055555f6c6a50, Args=0x000055555f6d0b80, Input=0x00007fffffff9678, Output=0x00007fffffff9398) at CommonArgs.cpp:1275:40 frame #6: 0x000055555dc955a5 clangd`clang::driver::tools::Clang::ConstructJob(this=0x000055555f6c69d0, C=0x000055555f6c64a0, JA=0x000055555f6c6a50, Output=0x00007fffffff9398, Inputs=0x00007fffffff9668, Args=0x000055555f6d0b80, LinkingOutput=0x0000000000000000) const at Clang.cpp:5690:33 frame #7: 0x000055555dbf6b54 clangd`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fffffffb5e0, C=0x000055555f6c64a0, A=0x000055555f6c6a50, TC=0x000055555f6c4be0, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=1, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:5618:10 frame #8: 0x000055555dbf4ef0 clangd`clang::driver::Driver::BuildJobsForAction(this=0x00007fffffffb5e0, C=0x000055555f6c64a0, A=0x000055555f6c6a50, TC=0x000055555f6c4be0, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=1, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:5306:26 frame #9: 0x000055555dbeb590 clangd`clang::driver::Driver::BuildJobs(this=0x00007fffffffb5e0, C=0x000055555f6c64a0) const at Driver.cpp:4844:5 frame #10: 0x000055555dbe6b0f clangd`clang::driver::Driver::BuildCompilation(this=0x00007fffffffb5e0, ArgList=ArrayRef<const char *> @ 0x00007fffffffb268) at Driver.cpp:1496:3 frame #11: 0x000055555b0cc0d9 clangd`clang::createInvocation(ArgList=ArrayRef<const char *> @ 0x00007fffffffbb38, Opts=CreateInvocationOptions @ 0x00007fffffffbb90) at CreateInvocationFromCommandLine.cpp:53:52 frame #12: 0x000055555b378e7b clangd`clang::clangd::buildCompilerInvocation(Inputs=0x00007fffffffca58, D=0x00007fffffffc158, CC1Args=size=0) at Compiler.cpp:116:44 frame #13: 0x000055555895a6c8 clangd`clang::clangd::(anonymous namespace)::Checker::buildInvocation(this=0x00007fffffffc760, TFS=0x00007fffffffe570, Contents= Has Value=false ) at Check.cpp:212:9 frame #14: 0x0000555558959cec clangd`clang::clangd::check(File=(Data = "build/test.cpp", Length = 64), TFS=0x00007fffffffe570, Opts=0x00007fffffffe600) at Check.cpp:486:34 frame #15: 0x000055555892164a clangd`main(argc=4, argv=0x00007fffffffecd8) at ClangdMain.cpp:993:12 frame #16: 0x00007ffff5c3ad85 libc.so.6`__libc_start_main + 229 frame #17: 0x00005555585bbe9e clangd`_start + 46 ``` Test Plan: ninja ClangDriverTests && tools/clang/unittests/Driver/ClangDriverTests Differential Revision: https://reviews.llvm.org/D154602

ParmVarDecl of BlockDecl is unnecessarily dumped twice. Remove this duplication as other FunctionDecls. Fixes llvm/llvm-project#64005 (#2) Differential Revision: https://reviews.llvm.org/D155985

ThreadSanitizer reports the following issue: ``` Write of size 8 at 0x00010a70abb0 by thread T3 (mutexes: write M0): #0 lldb_private::ThreadList::Update(lldb_private::ThreadList&) ThreadList.cpp:741 (liblldb.18.0.0git.dylib:arm64+0x5dedf4) (BuildId: 9bced2aafa373580ae9d750d9cf79a8f32000000200000000100000000000e00) #1 lldb_private::Process::UpdateThreadListIfNeeded() Process.cpp:1212 (liblldb.18.0.0git.dylib:arm64+0x53bbec) (BuildId: 9bced2aafa373580ae9d750d9cf79a8f32000000200000000100000000000e00) Previous read of size 8 at 0x00010a70abb0 by main thread (mutexes: write M1): #0 lldb_private::ThreadList::GetMutex() const ThreadList.cpp:785 (liblldb.18.0.0git.dylib:arm64+0x5df138) (BuildId: 9bced2aafa373580ae9d750d9cf79a8f32000000200000000100000000000e00) #1 lldb_private::ThreadList::DidResume() ThreadList.cpp:656 (liblldb.18.0.0git.dylib:arm64+0x5de5c0) (BuildId: 9bced2aafa373580ae9d750d9cf79a8f32000000200000000100000000000e00) #2 lldb_private::Process::PrivateResume() Process.cpp:3130 (liblldb.18.0.0git.dylib:arm64+0x53cd7c) (BuildId: 9bced2aafa373580ae9d750d9cf79a8f32000000200000000100000000000e00) ``` Fix this by only using the mutex in ThreadList and removing the one in process entirely. Differential Revision: https://reviews.llvm.org/D158034

@foo

Replace `BPFMIPeepholeTruncElim` by adding an overload for `TargetLowering::isZExtFree()` aware that zero extension is free for `ISD::LOAD`. Short description ================= The `BPFMIPeepholeTruncElim` handles two patterns: Pattern #1: %1 = LDB %0, ... %1 = LDB %0, ... %2 = AND_ri %1, 0xff -> %2 = MOV_ri %1 <-- (!) Pattern #2: bb.1: bb.1: %a = LDB %0, ... %a = LDB %0, ... br %bb3 br %bb3 bb.2: bb.2: %b = LDB %0, ... -> %b = LDB %0, ... br %bb3 br %bb3 bb.3: bb.3: %1 = PHI %a, %b %1 = PHI %a, %b %2 = AND_ri %1, 0xff %2 = MOV_ri %1 <-- (!) Plus variations: - AND_ri_32 instead of AND_ri - SLL/SLR instead of AND_ri - LDH, LDW, LDB32, LDH32, LDW32 Both patterns could be handled by built-in transformations at instruction selection phase if suitable `isZExtFree()` implementation is provided. The idea is borrowed from `ARMTargetLowering::isZExtFree`. When evaluating on BPF kernel selftests and remove_truncate_*.ll LLVM test cases this revisions performs slightly better than BPFMIPeepholeTruncElim, see "Impact" section below for details. Commit also adds a few test cases to make sure that patterns in question are handled. Long description ================ Why this works: Pattern #1 -------------------------- Consider the following example: define i1 @foo(ptr %p) { entry: %a = load i8, ptr %p, align 1 %cond = icmp eq i8 %a, 0 ret i1 %cond } Log for `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command: ... Type-legalized selection DAG: %bb.0 'foo:entry' SelectionDAG has 13 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64 t19: i64 = and t16, Constant:i64<255> t17: i64 = setcc t19, Constant:i64<0>, seteq:ch t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... Replacing.1 t19: i64 = and t16, Constant:i64<255> With: t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64 and 0 other values ... Optimized type-legalized selection DAG: %bb.0 'foo:entry' SelectionDAG has 11 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t20: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64 t17: i64 = setcc t20, Constant:i64<0>, seteq:ch t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... Note: - Optimized type-legalized selection DAG: - `t19 = and t16, 255` had been replaced by `t16` (load). - Patterns like `(and (load ... i8), 255)` are replaced by `load` in `DAGCombiner::BackwardsPropagateMask` called from `DAGCombiner::visitAND`. - Similarly patterns like `(shl (srl ..., 56), 56)` are replaced by `(and ..., 255)` in `DAGCombiner::visitSRL` (this function is huge, look for `TLI.shouldFoldConstantShiftPairToMask()` call). Why this works: Pattern #2 -------------------------- Consider the following example: define i1 @foo(ptr %p) { entry: %a = load i8, ptr %p, align 1 br label %next next: %cond = icmp eq i8 %a, 0 ret i1 %cond } Consider log for `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command. Log for first basic block: Initial selection DAG: %bb.0 'foo:entry' SelectionDAG has 9 nodes: t0: ch,glue = EntryToken t3: i64 = Constant<0> t2: i64,ch = CopyFromReg t0, Register:i64 %1 t5: i8,ch = load<(load (s8) from %ir.p)> t0, t2, undef:i64 t6: i64 = zero_extend t5 t8: ch = CopyToReg t0, Register:i64 %0, t6 ... Replacing.1 t6: i64 = zero_extend t5 With: t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64 and 0 other values ... Optimized lowered selection DAG: %bb.0 'foo:entry' SelectionDAG has 7 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %1 t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64 t8: ch = CopyToReg t0, Register:i64 %0, t9 Note: - Initial selection DAG: - `%a = load ...` is lowered as `t6 = (zero_extend (load ...))` w/o special `isZExtFree()` overload added by this commit it is instead lowered as `t6 = (any_extend (load ...))`. - The decision to generate `zero_extend` or `any_extend` is done in `RegsForValue::getCopyToRegs` called from `SelectionDAGBuilder::CopyValueToVirtualRegister`: - if `isZExtFree()` for load returns true `zero_extend` is used; - `any_extend` is used otherwise. - Optimized lowered selection DAG: - `t6 = (any_extend (load ...))` is replaced by `t9 = load ..., zext from i8` This is done by `DagCombiner.cpp:tryToFoldExtOfLoad()` called from `DAGCombiner::visitZERO_EXTEND`. Log for second basic block: Initial selection DAG: %bb.1 'foo:next' SelectionDAG has 13 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t4: i64 = AssertZext t2, ValueType:ch:i8 t5: i8 = truncate t4 t8: i1 = setcc t5, Constant:i8<0>, seteq:ch t9: i64 = any_extend t8 t11: ch,glue = CopyToReg t0, Register:i64 $r0, t9 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... Replacing.2 t18: i64 = and t4, Constant:i64<255> With: t4: i64 = AssertZext t2, ValueType:ch:i8 ... Type-legalized selection DAG: %bb.1 'foo:next' SelectionDAG has 13 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t4: i64 = AssertZext t2, ValueType:ch:i8 t18: i64 = and t4, Constant:i64<255> t16: i64 = setcc t18, Constant:i64<0>, seteq:ch t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... Optimized type-legalized selection DAG: %bb.1 'foo:next' SelectionDAG has 11 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t4: i64 = AssertZext t2, ValueType:ch:i8 t16: i64 = setcc t4, Constant:i64<0>, seteq:ch t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... Note: - Initial selection DAG: - `t0` is an input value for this basic block, it corresponds load instruction (`t9`) from the first basic block. - It is accessed within basic block via `t4` (AssertZext (CopyFromReg t0, ...)). - The `AssertZext` is generated by RegsForValue::getCopyFromRegs called from SelectionDAGBuilder::getCopyFromRegs, it is generated only when `LiveOutInfo` with known number of leading zeros is present for `t0`. - Known register bits in `LiveOutInfo` are computed by `SelectionDAG::computeKnownBits` called from `SelectionDAGISel::ComputeLiveOutVRegInfo`. - `computeKnownBits()` generates leading zeros information for `(load ..., zext from ...)` but *does not* generate leading zeros information for `(load ..., anyext from ...)`. This is why `isZExtFree()` added in this commit is important. - Type-legalized selection DAG: - `t5 = truncate t4` is replaced by `t18 = and t4, 255` - Optimized type-legalized selection DAG: - `t18 = and t4, 255` is replaced by `t4`, this is done by `DAGCombiner::SimplifyDemandedBits` called from `DAGCombiner::visitAND`, which simplifies patterns like `(and (assertzext ...))` Impact ------ This change covers all remove_truncate_*.ll test cases: - for -mcpu=v4 there are no changes in the generated code; - for -mcpu=v2 code generated for remove_truncate_7 and remove_truncate_8 improved slightly, for other tests it is unchanged. For remove_truncate_7: Before this revision After this revision -------------------- ------------------- r1 <<= 0x20 r1 <<= 0x20 r1 >>= 0x20 r1 >>= 0x20 if r1 == 0x0 goto +0x2 <LBB0_2> if r1 == 0x0 goto +0x2 <LBB0_2> r1 = *(u32 *)(r2 + 0x0) r0 = *(u32 *)(r2 + 0x0) goto +0x1 <LBB0_3> goto +0x1 <LBB0_3> <LBB0_2>: <LBB0_2>: r1 = *(u32 *)(r2 + 0x4) r0 = *(u32 *)(r2 + 0x4) <LBB0_3>: <LBB0_3>: r0 = r1 exit exit For remove_truncate_8: Before this revision After this revision -------------------- ------------------- r2 = *(u32 *)(r1 + 0x0) r2 = *(u32 *)(r1 + 0x0) r3 = r2 r3 = r2 r3 <<= 0x20 r3 <<= 0x20 r4 = r3 r3 s>>= 0x20 r4 s>>= 0x20 if r4 s> 0x2 goto +0x5 <LBB0_3> if r3 s> 0x2 goto +0x4 <LBB0_3> r4 = *(u32 *)(r1 + 0x4) r3 = *(u32 *)(r1 + 0x4) r3 >>= 0x20 if r3 >= r4 goto +0x2 <LBB0_3> if r2 >= r3 goto +0x2 <LBB0_3> r2 += 0x2 r2 += 0x2 *(u32 *)(r1 + 0x0) = r2 *(u32 *)(r1 + 0x0) = r2 <LBB0_3>: <LBB0_3>: r0 = 0x3 r0 = 0x3 exit exit For kernel BPF selftests statistics is as follows: (-mcpu=v4): - For -mcpu=v4: 9 out of 655 object files have differences, in all cases total number of instructions marginally decreased (-27 instructions). - For -mcpu=v2: 9 out of 655 object files have differences: - For 19 object files number of instruction decreased (-129 instruction in total): some redundant `rX &= 0xffff` and register to register assignments removed; - For 2 object files number of instructions increased +2 instructions in each file. Both -mcpu=v2 instruction increases could be reduced to the same example: define void @foo(ptr %p) { entry: %a = load i32, ptr %p, align 4 %b = sext i32 %a to i64 %c = icmp ult i64 1, %b br i1 %c, label %next, label %end next: call void inttoptr (i64 62 to ptr)(i32 %a) br label %end end: ret void } Note that this example uses value loaded to `%a` both as a sign extended (`%b`) and as zero extended (`%a` passed as parameter). Here is the difference in final assembly code: Before this revision After this revision -------------------- ------------------- r1 = *(u32 *)(r1 + 0) r1 = *(u32 *)(r1 + 0) r1 <<= 32 r1 <<= 32 r1 s>>= 32 r1 s>>= 32 if r1 < 2 goto <LBB0_2> if r1 < 2 goto <LBB0_2> r1 <<= 32 r1 >>= 32 call 62 call 62 <LBB0_2>: <LBB0_2>: exit exit Before this commit `%a` is passed to call as a sign extended value, after this commit `%a` is passed to call as a zero extended value, both are correct as 32-bit sub-register is the same. The difference comes from `DAGCombiner` operation on the initial DAG: Initial selection DAG before this commit: t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64 t6: i64 = any_extend t5 <--------------------- (1) t8: ch = CopyToReg t0, Register:i64 %0, t6 t9: i64 = sign_extend t5 t12: i1 = setcc Constant:i64<1>, t9, setult:ch Initial selection DAG after this commit: t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64 t6: i64 = zero_extend t5 <--------------------- (2) t8: ch = CopyToReg t0, Register:i64 %0, t6 t9: i64 = sign_extend t5 t12: i1 = setcc Constant:i64<1>, t9, setult:ch The node `t9` is processed before node `t6` and `load` instruction is combined to load with sign extension: Replacing.1 t9: i64 = sign_extend t5 With: t30: i64,ch = load<(load (s32) from %ir.p), sext from i32> t0, t2, undef:i64 and 0 other values Replacing.1 t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64 With: t31: i32 = truncate t30 and 1 other values This is done by `DAGCombiner.cpp:tryToFoldExtOfLoad` called from `DAGCombiner::visitSIGN_EXTEND`. Note that `t5` is used by `t6` which is `any_extend` in (1) and `zero_extend` in (2). `tryToFoldExtOfLoad()` rewrites such uses of `t5` differently: - `any_extend` is simply removed - `zero_extend` is replaced by `and t30, 0xffffffff`, which is later converted to a pair of shifts. This pair of shifts survives till the end of translation. Differential Revision: https://reviews.llvm.org/D157870

This reverts commit 0e63f1a. clang-format started to crash with contents like: a.h: ``` ``` $ clang-format a.h ``` PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: ../llvm/build/bin/clang-format a.h #0 0x0000560b689fe177 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/local/google/home/kadircet/repos/llvm/llvm/lib/Support/Unix/Signals.inc:723:13 #1 0x0000560b689fbfbe llvm::sys::RunSignalHandlers() /usr/local/google/home/kadircet/repos/llvm/llvm/lib/Support/Signals.cpp:106:18 #2 0x0000560b689feaca SignalHandler(int) /usr/local/google/home/kadircet/repos/llvm/llvm/lib/Support/Unix/Signals.inc:413:1 #3 0x00007f030405a540 (/lib/x86_64-linux-gnu/libc.so.6+0x3c540) #4 0x0000560b68a9a980 is /usr/local/google/home/kadircet/repos/llvm/clang/include/clang/Lex/Token.h:98:44 #5 0x0000560b68a9a980 is /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/FormatToken.h:562:51 #6 0x0000560b68a9a980 startsSequenceInternal<clang::tok::TokenKind, clang::tok::TokenKind> /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/FormatToken.h:831:9 #7 0x0000560b68a9a980 startsSequence<clang::tok::TokenKind, clang::tok::TokenKind> /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/FormatToken.h:600:12 #8 0x0000560b68a9a980 getFunctionName /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/TokenAnnotator.cpp:3131:17 #9 0x0000560b68a9a980 clang::format::TokenAnnotator::annotate(clang::format::AnnotatedLine&) /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/TokenAnnotator.cpp:3191:17 Segmentation fault ```

The new ACLE PR#225[1] now combines the slice parameters for some builtins. This patch is the #2 of 3 patches to update the interface. Slice specifies the ZA slice number directly and needs to be explicity implemented by the "user" with the base register plus the immediate offset [1]https://github.com/ARM-software/acle/pull/225/files

…tePluginObject After llvm/llvm-project#68052 this function changed from returning a nullptr with `return {};` to returning Expected and hitting `llvm_unreachable` before it could do so. I gather that we're never supposed to call this function, but on Windows we actually do call this function because `interpreter->CreateScriptedProcessInterface()` returns `ScriptedProcessInterface` not `ScriptedProcessPythonInterface`. Likely because `target_sp->GetDebugger().GetScriptInterpreter()` also does not return a Python related class. The previously XFAILed test crashed with: ``` # .---command stderr------------ # | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. # | Stack dump: # | 0. Program arguments: c:\\users\\tcwg\\david.spickett\\build-llvm\\bin\\lldb-test.exe ir-memory-map C:\\Users\\tcwg\\david.spickett\\build-llvm\\tools\\lldb\\test\\Shell\\Expr\\Output\\TestIRMemoryMapWindows.test.tmp C:\\Users\\tcwg\\david.spickett\\llvm-project\\lldb\\test\\Shell\\Expr/Inputs/ir-memory-map-basic # | 1. HandleCommand(command = "run") # | Exception Code: 0xC000001D # | #0 0x00007ff696b5f588 lldb_private::ScriptedProcessInterface::CreatePluginObject(class llvm::StringRef, class lldb_private::ExecutionContext &, class std::shared_ptr<class lldb_private::StructuredData::Dictionary>, class lldb_private::StructuredData::Generic *) C:\Users\tcwg\david.spickett\llvm-project\lldb\include\lldb\Interpreter\Interfaces\ScriptedProcessInterface.h:28:0 # | #1 0x00007ff696b1d808 llvm::Expected<std::shared_ptr<lldb_private::StructuredData::Generic> >::operator bool C:\Users\tcwg\david.spickett\llvm-project\llvm\include\llvm\Support\Error.h:567:0 # | #2 0x00007ff696b1d808 lldb_private::ScriptedProcess::ScriptedProcess(class std::shared_ptr<class lldb_private::Target>, class std::shared_ptr<class lldb_private::Listener>, class lldb_private::ScriptedMetadata const &, class lldb_private::Status &) C:\Users\tcwg\david.spickett\llvm-project\lldb\source\Plugins\Process\scripted\ScriptedProcess.cpp:115:0 # | #3 0x00007ff696b1d124 std::shared_ptr<lldb_private::ScriptedProcess>::shared_ptr C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1478:0 # | #4 0x00007ff696b1d124 lldb_private::ScriptedProcess::CreateInstance(class std::shared_ptr<class lldb_private::Target>, class std::shared_ptr<class lldb_private::Listener>, class lldb_private::FileSpec const *, bool) C:\Users\tcwg\david.spickett\llvm-project\lldb\source\Plugins\Process\scripted\ScriptedProcess.cpp:61:0 # | #5 0x00007ff69699c8f4 std::_Ptr_base<lldb_private::Process>::_Move_construct_from C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1237:0 # | #6 0x00007ff69699c8f4 std::shared_ptr<lldb_private::Process>::shared_ptr C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1534:0 # | #7 0x00007ff69699c8f4 std::shared_ptr<lldb_private::Process>::operator= C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1594:0 # | #8 0x00007ff69699c8f4 lldb_private::Process::FindPlugin(class std::shared_ptr<class lldb_private::Target>, class llvm::StringRef, class std::shared_ptr<class lldb_private::Listener>, class lldb_private::FileSpec const *, bool) C:\Users\tcwg\david.spickett\llvm-project\lldb\source\Target\Process.cpp:396:0 # | #9 0x00007ff6969bd708 std::_Ptr_base<lldb_private::Process>::_Move_construct_from C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1237:0 # | #10 0x00007ff6969bd708 std::shared_ptr<lldb_private::Process>::shared_ptr C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1534:0 # | #11 0x00007ff6969bd708 std::shared_ptr<lldb_private::Process>::operator= C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1594:0 # | #12 0x00007ff6969bd708 lldb_private::Target::CreateProcess(class std::shared_ptr<class lldb_private::Listener>, class llvm::StringRef, class lldb_private::FileSpec const *, bool) C:\Users\tcwg\david.spickett\llvm-project\lldb\source\Target\Target.cpp:215:0 # | #13 0x00007ff696b13af0 std::_Ptr_base<lldb_private::Process>::_Ptr_base C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1230:0 # | #14 0x00007ff696b13af0 std::shared_ptr<lldb_private::Process>::shared_ptr C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1524:0 # | #15 0x00007ff696b13af0 lldb_private::PlatformWindows::DebugProcess(class lldb_private::ProcessLaunchInfo &, class lldb_private::Debugger &, class lldb_private::Target &, class lldb_private::Status &) C:\Users\tcwg\david.spickett\llvm-project\lldb\source\Plugins\Platform\Windows\PlatformWindows.cpp:495:0 # | #16 0x00007ff6969cf590 std::_Ptr_base<lldb_private::Process>::_Move_construct_from C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1237:0 # | #17 0x00007ff6969cf590 std::shared_ptr<lldb_private::Process>::shared_ptr C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1534:0 # | #18 0x00007ff6969cf590 std::shared_ptr<lldb_private::Process>::operator= C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1594:0 # | #19 0x00007ff6969cf590 lldb_private::Target::Launch(class lldb_private::ProcessLaunchInfo &, class lldb_private::Stream *) C:\Users\tcwg\david.spickett\llvm-project\lldb\source\Target\Target.cpp:3274:0 # | #20 0x00007ff696fff82c CommandObjectProcessLaunch::DoExecute(class lldb_private::Args &, class lldb_private::CommandReturnObject &) C:\Users\tcwg\david.spickett\llvm-project\lldb\source\Commands\CommandObjectProcess.cpp:258:0 # | #21 0x00007ff696fab6c0 lldb_private::CommandObjectParsed::Execute(char const *, class lldb_private::CommandReturnObject &) C:\Users\tcwg\david.spickett\llvm-project\lldb\source\Interpreter\CommandObject.cpp:751:0 # `----------------------------- # error: command failed with exit status: 0xc000001d ``` That might be a bug on the Windows side, or an artifact of how our build is setup, but whatever it is, having `CreatePluginObject` return an error and the caller check it, fixes the failing test. The built lldb can run the script command to use Python, but I'm not sure if that means anything.

…e defintion if available (#71004)" This reverts commit ef3feba. This caused an LLDB test failure on Linux for `lang/cpp/symbols/TestSymbols.test_dwo`: ``` make: Leaving directory '/home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/lang/cpp/symbols/TestSymbols.test_dwo' runCmd: expression -- D::i PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. HandleCommand(command = "expression -- D::i") 1. <user expression 0>:1:4: current parser token 'i' 2. <lldb wrapper prefix>:44:1: parsing function body '$__lldb_expr' 3. <lldb wrapper prefix>:44:1: in compound statement ('{}') Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it): 0 _lldb.cpython-39-x86_64-linux-gnu.so 0x00007fbcfcb08b87 1 _lldb.cpython-39-x86_64-linux-gnu.so 0x00007fbcfcb067ae 2 _lldb.cpython-39-x86_64-linux-gnu.so 0x00007fbcfcb0923f 3 libpthread.so.0 0x00007fbd07ab7140 ``` And a failure in `TestCallStdStringFunction.py` on Linux aarch64: ``` -- Exit Code: -11 Command Output (stdout): -- lldb version 18.0.0git (https://github.com/llvm/llvm-project.git revision ef3feba) clang revision ef3feba llvm revision ef3feba -- Command Output (stderr): -- PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. HandleCommand(command = "expression str") 1. <lldb wrapper prefix>:45:34: current parser token ';' 2. <lldb wrapper prefix>:44:1: parsing function body '$__lldb_expr' 3. <lldb wrapper prefix>:44:1: in compound statement ('{}') #0 0x0000ffffb72a149c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_[lldb.cpython-38-aarch64-linux-gnu.so](http://lldb.cpython-38-aarch64-linux-gnu.so/)+0x58c749c) #1 0x0000ffffb729f458 llvm::sys::RunSignalHandlers() (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_[lldb.cpython-38-aarch64-linux-gnu.so](http://lldb.cpython-38-aarch64-linux-gnu.so/)+0x58c5458) #2 0x0000ffffb72a1bd0 SignalHandler(int) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_[lldb.cpython-38-aarch64-linux-gnu.so](http://lldb.cpython-38-aarch64-linux-gnu.so/)+0x58c7bd0) #3 0x0000ffffbdd9e7dc (linux-vdso.so.1+0x7dc) #4 0x0000ffffb71799d8 lldb_private::plugin::dwarf::SymbolFileDWARF::FindGlobalVariables(lldb_private::ConstString, lldb_private::CompilerDeclContext const&, unsigned int, lldb_private::VariableList&) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_[lldb.cpython-38-aarch64-linux-gnu.so](http://lldb.cpython-38-aarch64-linux-gnu.so/)+0x579f9d8) #5 0x0000ffffb7197508 DWARFASTParserClang::FindConstantOnVariableDefinition(lldb_private::plugin::dwarf::DWARFDIE) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_[lldb.cpython-38-aarch64-linux-gnu.so](http://lldb.cpython-38-aarch64-linux-gnu.so/)+0x57bd508) ```

The const.cpp testcase fails when running in MSVC mode, while it does succeed in MinGW mode. In MSVC mode, there are more constructor invocations than expected, as the printout looks like this: A(1), this = 0000025597930000 A(1), this = 0000025597930000 f: this = 0000025597930000, val = 1 A(1), this = 0000025597930000 f: this = 0000025597930000, val = 1 ~A, this = 0000025597930000, val = 1 ~A, this = 0000025597930000, val = 1 ~A, this = 0000025597930000, val = 1 While the expected printout looks like this: A(1), this = 000002C903E10000 f: this = 000002C903E10000, val = 1 f: this = 000002C903E10000, val = 1 ~A, this = 000002C903E10000, val = 1 Reapplying #70991 with the XFAIL changed to check the host triple, not the target triple. On an MSVC based build of Clang, but with the default target triple set to PS4/PS5, we will still see the failure. And a Linux based build of Clang that targets PS4/PS5 won't see the issue.

…ooking options for a custom subcommand (#71975) …ooking options for a custom subcommand. (#71776)" This reverts commit b88308b. The build-bot is unhappy (https://lab.llvm.org/buildbot/#/builders/186/builds/13096), `GroupingAndPrefix` fails after `TopLevelOptInSubcommand` (the newly added test). Revert while I look into this (might be related with test sharding but not sure) ``` [----------] 3 tests from CommandLineTest [ RUN ] CommandLineTest.TokenizeWindowsCommandLine2 [ OK ] CommandLineTest.TokenizeWindowsCommandLine2 (0 ms) [ RUN ] CommandLineTest.TopLevelOptInSubcommand [ OK ] CommandLineTest.TopLevelOptInSubcommand (0 ms) [ RUN ] CommandLineTest.GroupingAndPrefix #0 0x00ba8118 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/unittests/Support/./SupportTests+0x594118) #1 0x00ba5914 llvm::sys::RunSignalHandlers() (/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/unittests/Support/./SupportTests+0x591914) #2 0x00ba89c4 SignalHandler(int) (/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/unittests/Support/./SupportTests+0x5949c4) #3 0xf7828530 __default_sa_restorer /build/glibc-9MGTF6/glibc-2.31/signal/../sysdeps/unix/sysv/linux/arm/sigrestorer.S:67:0 #4 0x00af91f0 (anonymous namespace)::CommandLineParser::ResetAllOptionOccurrences() (/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/unittests/Support/./SupportTests+0x4e51f0) #5 0x00af8e1c llvm::cl::ResetCommandLineParser() (/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/unittests/Support/./SupportTests+0x4e4e1c) #6 0x0077cda0 (anonymous namespace)::CommandLineTest_GroupingAndPrefix_Test::TestBody() (/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/unittests/Support/./SupportTests+0x168da0) #7 0x00bc5adc testing::Test::Run() (/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/unittests/Support/./SupportTests+0x5b1adc) #8 0x00bc6cc0 testing::TestInfo::Run() (/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/unittests/Support/./SupportTests+0x5b2cc0) #9 0x00bc7880 testing::TestSuite::Run() (/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/unittests/Support/./SupportTests+0x5b3880) #10 0x00bd7974 testing::internal::UnitTestImpl::RunAllTests() (/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/unittests/Support/./SupportTests+0x5c3974) #11 0x00bd6ebc testing::UnitTest::Run() (/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/unittests/Support/./SupportTests+0x5c2ebc) #12 0x00bb1058 main (/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/unittests/Support/./SupportTests+0x59d058) #13 0xf78185a4 __libc_start_main /build/glibc-9MGTF6/glibc-2.31/csu/libc-start.c:342:3 ```

… functions (#72069) Fixes a bug introduced by commit f95b2f1 ("Reland [InstrProf][compiler-rt] Enable MC/DC Support in LLVM Source-based Code Coverage (1/3)") The InstrProfiling pass was refactored when introducing support for MC/DC such that the creation of the data variable was abstracted and called only once per function from ::run(). Because ::run() only iterated over functions there were not fully inlined, and because it only created the data variable for the first intrinsic that it saw, data variables corresponding to functions fully inlined into other instrumented callers would end up without a data variable, resulting in loss of coverage information. This patch does the following: 1.) Move the call of createDataVariable() to getOrCreateRegionCounters() so that the creation of the data variable will happen indirectly either from ::new() or during profile intrinsic lowering when it is needed. This effectively restores the behavior prior to the refactor and ensures that all data variables are created when needed (and not duplicated). 2.) Process all MC/DC bitmap parameter intrinsics in ::run() prior to calling getOrCreateRegionCounters(). This ensures bitmap regions are created for each function including functions that are fully inlined. It also ensures that the bitmap region is created for each function prior to the creation of the data variable because it is referenced by the data variable. Again, duplication is prevented if the same parameter intrinsic is inlined into multiple functions. 3.) No longer pass the MC/DC intrinsic to createDataVariable(). This decouples the creation of the data variable from a specific MC/DC intrinsic. Instead, with #2 above, store the number of bitmap bytes required in the PerFunctionProfileData in the ProfileDataMap along with the function's CounterRegion and BitmapRegion variables. This ties the bitmap information directly to the function to which it belongs, and the data variable created for that function can reference that.

…(#73463) Despite CWG2497 not being resolved, it is reasonable to expect the following code to compile (and which is supported by other compilers) ```cpp template<typename T> constexpr T f(); constexpr int g() { return f<int>(); } // #1 template<typename T> constexpr T f() { return 123; } int k[g()]; // #2 ``` To that end, we eagerly instantiate all referenced specializations of constexpr functions when they are defined. We maintain a map of (pattern, [instantiations]) independent of `PendingInstantiations` to avoid having to iterate that list after each function definition. We should apply the same logic to constexpr variables, but I wanted to keep the PR small. Fixes #73232

TestCases/Misc/Linux/sigaction.cpp fails because dlsym() may call malloc on failure. And then the wrapped malloc appears to access thread local storage using global dynamic accesses, thus calling ___interceptor___tls_get_addr, before REAL(__tls_get_addr) has been set, so we get a crash inside ___interceptor___tls_get_addr. For example, this can happen when looking up __isoc23_scanf which might not exist in some libcs. Fix this by marking the thread local variable accessed inside the debug checks as "initial-exec", which does not require __tls_get_addr. This is probably a better alternative to llvm/llvm-project#83886. This fixes a different crash but is related to llvm/llvm-project#46204. Backtrace: ``` #0 0x0000000000000000 in ?? () #1 0x00007ffff6a9d89e in ___interceptor___tls_get_addr (arg=0x7ffff6b27be8) at /path/to/llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:2759 #2 0x00007ffff6a46bc6 in __sanitizer::CheckedMutex::LockImpl (this=0x7ffff6b27be8, pc=140737331846066) at /path/to/llvm/compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp:218 #3 0x00007ffff6a448b2 in __sanitizer::CheckedMutex::Lock (this=0x7ffff6b27be8, this@entry=0x730000000580) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_mutex.h:129 #4 __sanitizer::Mutex::Lock (this=0x7ffff6b27be8, this@entry=0x730000000580) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_mutex.h:167 #5 0x00007ffff6abdbb2 in __sanitizer::GenericScopedLock<__sanitizer::Mutex>::GenericScopedLock (mu=0x730000000580, this=<optimized out>) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_mutex.h:383 #6 __sanitizer::SizeClassAllocator64<__tsan::AP64>::GetFromAllocator (this=0x7ffff7487dc0 <__tsan::allocator_placeholder>, stat=stat@entry=0x7ffff570db68, class_id=11, chunks=chunks@entry=0x7ffff5702cc8, n_chunks=n_chunks@entry=128) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_primary64.h:207 #7 0x00007ffff6abdaa0 in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__tsan::AP64> >::Refill (this=<optimized out>, c=c@entry=0x7ffff5702cb8, allocator=<optimized out>, class_id=<optimized out>) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_local_cache.h:103 #8 0x00007ffff6abd731 in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__tsan::AP64> >::Allocate (this=0x7ffff6b27be8, allocator=0x7ffff5702cc8, class_id=140737311157448) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_local_cache.h:39 #9 0x00007ffff6abc397 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__tsan::AP64>, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Allocate (this=0x7ffff5702cc8, cache=0x7ffff6b27be8, size=<optimized out>, size@entry=175, alignment=alignment@entry=16) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_combined.h:69 #10 0x00007ffff6abaa6a in __tsan::user_alloc_internal (thr=0x7ffff7ebd980, pc=140737331499943, sz=sz@entry=175, align=align@entry=16, signal=true) at /path/to/llvm/compiler-rt/lib/tsan/rtl/tsan_mman.cpp:198 #11 0x00007ffff6abb0d1 in __tsan::user_alloc (thr=0x7ffff6b27be8, pc=140737331846066, sz=11, sz@entry=175) at /path/to/llvm/compiler-rt/lib/tsan/rtl/tsan_mman.cpp:223 #12 0x00007ffff6a693b5 in ___interceptor_malloc (size=175) at /path/to/llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:666 #13 0x00007ffff7fce7f2 in malloc (size=175) at ../include/rtld-malloc.h:56 #14 __GI__dl_exception_create_format (exception=exception@entry=0x7fffffffd0d0, objname=0x7ffff7fc3550 "/path/to/llvm/compiler-rt/cmake-build-all-sanitizers/lib/linux/libclang_rt.tsan-x86_64.so", fmt=fmt@entry=0x7ffff7ff2db9 "undefined symbol: %s%s%s") at ./elf/dl-exception.c:157 #15 0x00007ffff7fd50e8 in _dl_lookup_symbol_x (undef_name=0x7ffff6af868b "__isoc23_scanf", undef_map=<optimized out>, ref=0x7fffffffd148, symbol_scope=<optimized out>, version=<optimized out>, type_class=0, flags=2, skip_map=0x7ffff7fc35e0) at ./elf/dl-lookup.c:793 --Type <RET> for more, q to quit, c to continue without paging-- #16 0x00007ffff656d6ed in do_sym (handle=<optimized out>, name=0x7ffff6af868b "__isoc23_scanf", who=0x7ffff6a3bb84 <__interception::InterceptFunction(char const*, unsigned long*, unsigned long, unsigned long)+36>, vers=vers@entry=0x0, flags=flags@entry=2) at ./elf/dl-sym.c:146 #17 0x00007ffff656d9dd in _dl_sym (handle=<optimized out>, name=<optimized out>, who=<optimized out>) at ./elf/dl-sym.c:195 #18 0x00007ffff64a2854 in dlsym_doit (a=a@entry=0x7fffffffd3b0) at ./dlfcn/dlsym.c:40 #19 0x00007ffff7fcc489 in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffd310, operate=0x7ffff64a2840 <dlsym_doit>, args=0x7fffffffd3b0) at ./elf/dl-catch.c:237 #20 0x00007ffff7fcc5af in _dl_catch_error (objname=0x7fffffffd368, errstring=0x7fffffffd370, mallocedp=0x7fffffffd367, operate=<optimized out>, args=<optimized out>) at ./elf/dl-catch.c:256 #21 0x00007ffff64a2257 in _dlerror_run (operate=operate@entry=0x7ffff64a2840 <dlsym_doit>, args=args@entry=0x7fffffffd3b0) at ./dlfcn/dlerror.c:138 #22 0x00007ffff64a28e5 in dlsym_implementation (dl_caller=<optimized out>, name=<optimized out>, handle=<optimized out>) at ./dlfcn/dlsym.c:54 #23 ___dlsym (handle=<optimized out>, name=<optimized out>) at ./dlfcn/dlsym.c:68 #24 0x00007ffff6a3bb84 in __interception::GetFuncAddr (name=0x7ffff6af868b "__isoc23_scanf", trampoline=140737311157448) at /path/to/llvm/compiler-rt/lib/interception/interception_linux.cpp:42 #25 __interception::InterceptFunction (name=0x7ffff6af868b "__isoc23_scanf", ptr_to_real=0x7ffff74850e8 <__interception::real___isoc23_scanf>, func=11, trampoline=140737311157448) at /path/to/llvm/compiler-rt/lib/interception/interception_linux.cpp:61 #26 0x00007ffff6a9f2d9 in InitializeCommonInterceptors () at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_common_interceptors.inc:10315 ``` Reviewed By: vitalybuka, MaskRay Pull Request: llvm/llvm-project#83890

This reverts commit daebe5c. This commit causes the following asan issue: ``` <snip>/llvm-project/build/bin/mlir-opt <snip>/llvm-project/mlir/test/Dialect/XeGPU/XeGPUOps.mlir | <snip>/llvm-project/build/bin/FileCheck <snip>/llvm-project/mlir/test/Dialect/XeGPU/XeGPUOps.mlir # executed command: <snip>/llvm-project/build/bin/mlir-opt <snip>/llvm-project/mlir/test/Dialect/XeGPU/XeGPUOps.mlir # .---command stderr------------ # | ================================================================= # | ==2772558==ERROR: AddressSanitizer: stack-use-after-return on address 0x7fd2c2c42b90 at pc 0x55e406d54614 bp 0x7ffc810e4070 sp 0x7ffc810e4068 # | READ of size 8 at 0x7fd2c2c42b90 thread T0 # | #0 0x55e406d54613 in operator()<long int const*> /usr/include/c++/13/bits/predefined_ops.h:318 # | #1 0x55e406d54613 in __count_if<long int const*, __gnu_cxx::__ops::_Iter_pred<mlir::verifyListOfOperandsOrIntegers(Operation*, llvm::StringRef, unsigned int, llvm::ArrayRef<long int>, ValueRange)::<lambda(int64_t)> > > /usr/include/c++/13/bits/stl_algobase.h:2125 # | #2 0x55e406d54613 in count_if<long int const*, mlir::verifyListOfOperandsOrIntegers(Operation*, ... ```

…oint. (#83821)" This reverts commit c2c1e6e. It creates a use after free. ==8342==ERROR: AddressSanitizer: heap-use-after-free on address 0x50f000001760 at pc 0x55b9fb84a8fb bp 0x7ffc18468a10 sp 0x7ffc18468a08 READ of size 1 at 0x50f000001760 thread T0 #0 0x55b9fb84a8fa in dropPoisonGeneratingFlags llvm/lib/Transforms/Vectorize/VPlan.h:1040:13 #1 0x55b9fb84a8fa in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>)::$_0::operator()(llvm::VPRecipeBase*) const llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1236:23 #2 0x55b9fb84a196 in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp Can be reproduced with asan on Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll Transforms/LoopVectorize/X86/pr81872.ll Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll

…e exception specification of a function (#90760) [temp.deduct.general] p6 states: > At certain points in the template argument deduction process it is necessary to take a function type that makes use of template parameters and replace those template parameters with the corresponding template arguments. This is done at the beginning of template argument deduction when any explicitly specified template arguments are substituted into the function type, and again at the end of template argument deduction when any template arguments that were deduced or obtained from default arguments are substituted. [temp.deduct.general] p7 goes on to say: > The _deduction substitution loci_ are > - the function type outside of the _noexcept-specifier_, > - the explicit-specifier, > - the template parameter declarations, and > - the template argument list of a partial specialization > > The substitution occurs in all types and expressions that are used in the deduction substitution loci. [...] Consider the following: ```cpp struct A { static constexpr bool x = true; }; template<typename T, typename U> void f(T, U) noexcept(T::x); // #1 template<typename T, typename U> void f(T, U*) noexcept(T::y); // #2 template<> void f<A>(A, int*) noexcept; // clang currently accepts, GCC and EDG reject ``` Currently, `Sema::SubstituteExplicitTemplateArguments` will substitute into the _noexcept-specifier_ when deducing template arguments from a function declaration or when deducing template arguments for taking the address of a function template (and the substitution is treated as a SFINAE context). In the above example, `#1` is selected as the primary template because substitution of the explicit template arguments into the _noexcept-specifier_ of `#2` failed, which resulted in the candidate being ignored. This behavior is incorrect ([temp.deduct.general] note 4 says as much), and this patch corrects it by deferring all substitution into the _noexcept-specifier_ until it is instantiated. As part of the necessary changes to make this patch work, the instantiation of the exception specification of a function template specialization when taking the address of a function template is changed to only occur for the function selected by overload resolution per [except.spec] p13.1 (as opposed to being instantiated for every candidate).

…ined member functions & member function templates (#88963) Consider the following snippet from the discussion of CWG2847 on the core reflector: ``` template<typename T> concept C = sizeof(T) <= sizeof(long); template<typename T> struct A { template<typename U> void f(U) requires C<U>; // #1, declares a function template void g() requires C<T>; // #2, declares a function template<> void f(char); // #3, an explicit specialization of a function template that declares a function }; template<> template<typename U> void A<short>::f(U) requires C<U>; // #4, an explicit specialization of a function template that declares a function template template<> template<> void A<int>::f(int); // #5, an explicit specialization of a function template that declares a function template<> void A<long>::g(); // #6, an explicit specialization of a function that declares a function ``` A number of problems exist: - Clang rejects `#4` because the trailing _requires-clause_ has `U` substituted with the wrong template parameter depth when `Sema::AreConstraintExpressionsEqual` is called to determine whether it matches the trailing _requires-clause_ of the implicitly instantiated function template. - Clang rejects `#5` because the function template specialization instantiated from `A<int>::f` has a trailing _requires-clause_, but `#5` does not (nor can it have one as it isn't a templated function). - Clang rejects `#6` for the same reasons it rejects `#5`. This patch resolves these issues by making the following changes: - To fix `#4`, `Sema::AreConstraintExpressionsEqual` is passed `FunctionTemplateDecl`s when comparing the trailing _requires-clauses_ of `#4` and the function template instantiated from `#1`. - To fix `#5` and `#6`, the trailing _requires-clauses_ are not compared for explicit specializations that declare functions. In addition to these changes, `CheckMemberSpecialization` now considers constraint satisfaction/constraint partial ordering when determining which member function is specialized by an explicit specialization of a member function for an implicit instantiation of a class template (we previously would select the first function that has the same type as the explicit specialization). With constraints taken under consideration, we match EDG's behavior for these declarations.

...which caused issues like > ==42==ERROR: AddressSanitizer failed to deallocate 0x32 (50) bytes at address 0x117e0000 (error code: 28) > ==42==Cannot dump memory map on emscriptenAddressSanitizer: CHECK failed: sanitizer_common.cpp:81 "((0 && "unable to unmmap")) != (0)" (0x0, 0x0) (tid=288045824) > #0 0x14f73b0c in __asan::CheckUnwind()+0x14f73b0c (this.program+0x14f73b0c) > #1 0x14f8a3c2 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long)+0x14f8a3c2 (this.program+0x14f8a3c2) > #2 0x14f7d6e1 in __sanitizer::ReportMunmapFailureAndDie(void*, unsigned long, int, bool)+0x14f7d6e1 (this.program+0x14f7d6e1) > #3 0x14f81fbd in __sanitizer::UnmapOrDie(void*, unsigned long)+0x14f81fbd (this.program+0x14f81fbd) > #4 0x14f875df in __sanitizer::SuppressionContext::ParseFromFile(char const*)+0x14f875df (this.program+0x14f875df) > #5 0x14f74eab in __asan::InitializeSuppressions()+0x14f74eab (this.program+0x14f74eab) > #6 0x14f73a1a in __asan::AsanInitInternal()+0x14f73a1a (this.program+0x14f73a1a) when trying to use an ASan suppressions file under Emscripten: Even though it would be considered OK by SUSv4, the Emscripten runtime states "We don't support partial munmapping" (see <emscripten-core/emscripten@f4115eb> "Implement MAP_ANONYMOUS on top of malloc in STANDALONE_WASM mode (#16289)"). Co-authored-by: Stephan Bergmann <stephan.bergmann@allotropia.de>

…ication as used during partial ordering (#91534) We do not deduce template arguments from the exception specification when determining the primary template of a function template specialization or when taking the address of a function template. Therefore, this patch changes `isAtLeastAsSpecializedAs` such that we do not mark template parameters in the exception specification as 'used' during partial ordering (per [temp.deduct.partial] p12) to prevent the following from being ambiguous: ``` template<typename T, typename U> void f(U) noexcept(noexcept(T())); // #1 template<typename T> void f(T*) noexcept; // #2 template<> void f<int>(int*) noexcept; // currently ambiguous, selects #2 with this patch applied ``` Although there is no corresponding wording in the standard (see core issue filed here cplusplus/CWG#537), this seems to be the intended behavior given the definition of _deduction substitution loci_ in [temp.deduct.general] p7 (and EDG does the same thing).

…erSize (#67657)" This reverts commit f0b3654. This commit triggers UB by reading an uninitialized variable. `UP.PartialThreshold` is used uninitialized in `getUnrollingPreferences()` when it is called from `LoopVectorizationPlanner::executePlan()`. In this case the `UP` variable is created on the stack and its fields are not initialized. ``` ==8802==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x557c0b081b99 in llvm::BasicTTIImplBase<llvm::X86TTIImpl>::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h #1 0x557c0b07a40c in llvm::TargetTransformInfo::Model<llvm::X86TTIImpl>::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) llvm-project/llvm/include/llvm/Analysis/TargetTransformInfo.h:2277:17 #2 0x557c0f5d69ee in llvm::TargetTransformInfo::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) const llvm-project/llvm/lib/Analysis/TargetTransformInfo.cpp:387:19 #3 0x557c0e6b96a0 in llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool, llvm::DenseMap<llvm::SCEV const*, llvm::Value*, llvm::DenseMapInfo<llvm::SCEV const*, void>, llvm::detail::DenseMapPair<llvm::SCEV const*, llvm::Value*>> const*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7624:7 #4 0x557c0e6e4b63 in llvm::LoopVectorizePass::processLoop(llvm::Loop*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10253:13 #5 0x557c0e6f2429 in llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo*, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AssumptionCache&, llvm::LoopAccessInfoManager&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10344:30 #6 0x557c0e6f2f97 in llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10383:9 [...] Uninitialized value was created by an allocation of 'UP' in the stack frame #0 0x557c0e6b961e in llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool, llvm::DenseMap<llvm::SCEV const*, llvm::Value*, llvm::DenseMapInfo<llvm::SCEV const*, void>, llvm::detail::DenseMapPair<llvm::SCEV const*, llvm::Value*>> const*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7623:3 ```

…0820) This solves some ambuguity introduced in P0522 regarding how template template parameters are partially ordered, and should reduce the negative impact of enabling `-frelaxed-template-template-args` by default. When performing template argument deduction, a template template parameter containing no packs should be more specialized than one that does. Given the following example: ```C++ template<class T2> struct A; template<template<class ...T3s> class TT1, class T4> struct A<TT1<T4>>; // #1 template<template<class T5 > class TT2, class T6> struct A<TT2<T6>>; // #2 template<class T1> struct B; template struct A<B<char>>; ``` Prior to P0522, candidate `#2` would be more specialized. After P0522, neither is more specialized, so this becomes ambiguous. With this change, `#2` becomes more specialized again, maintaining compatibility with pre-P0522 implementations. The problem is that in P0522, candidates are at least as specialized when matching packs to fixed-size lists both ways, whereas before, a fixed-size list is more specialized. This patch keeps the original behavior when checking template arguments outside deduction, but restores this aspect of pre-P0522 matching during deduction. --- Since this changes provisional implementation of CWG2398 which has not been released yet, and already contains a changelog entry, we don't provide a changelog entry here.

'reduction' has a few restrictions over normal 'var-list' clauses: 1- On parallel, a num_gangs can only have 1 argument when combined with reduction. These two aren't able to be combined on any other of the compute constructs however. 2- The vars all must be 'numerical data types' types of some sort, or a 'composite of numerical data types'. A list of types is given in the standard as a minimum, so we choose 'isScalar', which covers all of these types and keeps types that are actually numeric. Other compilers don't seem to implement the 'composite of numerical data types', though we do. 3- Because of the above restrictions, member-of-composite is not allowed, so any access via a memberexpr is disallowed. Array-element and sub-arrays (aka array sections) are both permitted, so long as they meet the requirements of #2. This patch implements all of these for compute constructs.

… (#92855) This solves some ambuguity introduced in P0522 regarding how template template parameters are partially ordered, and should reduce the negative impact of enabling `-frelaxed-template-template-args` by default. When performing template argument deduction, we extend the provisional wording introduced in llvm/llvm-project#89807 so it also covers deduction of class templates. Given the following example: ```C++ template <class T1, class T2 = float> struct A; template <class T3> struct B; template <template <class T4> class TT1, class T5> struct B<TT1<T5>>; // #1 template <class T6, class T7> struct B<A<T6, T7>>; // #2 template struct B<A<int>>; ``` Prior to P0522, `#2` was picked. Afterwards, this became ambiguous. This patch restores the pre-P0522 behavior, `#2` is picked again. This has the beneficial side effect of making the following code valid: ```C++ template<class T, class U> struct A {}; A<int, float> v; template<template<class> class TT> void f(TT<int>); // OK: TT picks 'float' as the default argument for the second parameter. void g() { f(v); } ``` --- Since this changes provisional implementation of CWG2398 which has not been released yet, and already contains a changelog entry, we don't provide a changelog entry here.

On macOS, to make DYLD_INSERT_LIBRARIES and the Python shim work together, we have a workaroud that copies the "real" Python interpreter into the build directory. This doesn't work when running in a virtual environment, as the copied interpreter cannot find the packages installed in the virtual environment relative to itself. Address this issue by copying the Python interpreter into the virtual environment's `bin` folder, rather than the build folder, when the test suite detects that it's being run inside a virtual environment. I'm not thrilled about this solution because it puts a file outside the build directory. However, given virtual environments are considered disposable, this seems reasonable.

….5) The Python interpreter in Xcode cannot be copied because of a relative RPATH. Our workaround would just use that Python interpreter directly when it detects this. For the reasons explained in my previous commit, that doesn't work in a virtual environment. Address this case by creating a symlink to the "real" interpreter in the virtual environment.

…on (#94752) Fixes #62925. The following code: ```cpp #include <map> int main() { std::map m1 = {std::pair{"foo", 2}, {"bar", 3}}; // guide #2 std::map m2(m1.begin(), m1.end()); // guide #1 } ``` Is rejected by clang, but accepted by both gcc and msvc: https://godbolt.org/z/6v4fvabb5 . So basically CTAD with copy-list-initialization is rejected. Note that this exact code is also used in a cppreference article: https://en.cppreference.com/w/cpp/container/map/deduction_guides I checked the C++11 and C++20 standard drafts to see whether suppressing user conversion is the correct thing to do for user conversions. Based on the standard I don't think that it is correct. ``` 13.3.1.4 Copy-initialization of class by user-defined conversion [over.match.copy] Under the conditions specified in 8.5, as part of a copy-initialization of an object of class type, a user-defined conversion can be invoked to convert an initializer expression to the type of the object being initialized. Overload resolution is used to select the user-defined conversion to be invoked ``` So we could use user defined conversions according to the standard. ``` If a narrowing conversion is required to initialize any of the elements, the program is ill-formed. ``` We should not do narrowing. ``` In copy-list-initialization, if an explicit constructor is chosen, the initialization is ill-formed. ``` We should not use explicit constructors.

asl transferred this issue from access-softek/msp430-clang Apr 6, 2020

asl transferred this issue from another repository Apr 10, 2020

asl transferred this issue from access-softek/msp430-clang Apr 10, 2020

asl added the abi label Apr 16, 2020

asl added the msp430 MSP430 target label Nov 13, 2023

asl closed this as completed Nov 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GCC passes and returns complex numbers in registers #2

GCC passes and returns complex numbers in registers #2

chbessonova commented Feb 4, 2019

atrosinenko commented Jul 18, 2020

GCC passes and returns complex numbers in registers #2

GCC passes and returns complex numbers in registers #2

Comments

chbessonova commented Feb 4, 2019

atrosinenko commented Jul 18, 2020