[Arm] Regenerate tests (NFC) #121801
Conversation
For some ABIs `update_cc_test_checks.py` is unable to generate tests because of the mismatch between the mangled function names reported by clang's `-ast-dump` and the function names in LLVM IR. This patch fixes it by stripping the leading underscore from the mangled name for global functions if the data layout string says they have one.
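As a minimal sketch of that detection, the logic below mirrors the `get_global_underscores` helper added by this patch (shown in full in the darker diff further down); the data layout string in the usage example is hypothetical. In LLVM data layouts, the `m:o` (Mach-O) and `m:x` (Windows x86 COFF) mangling modes prefix global symbols with an underscore:

import re

# Matches the "target datalayout" line in LLVM IR output.
DATA_LAYOUT_RE = re.compile(r'target\sdatalayout\s=\s"(?P<layout>.+)"$', re.M)

def get_global_underscores(raw_tool_output):
    m = DATA_LAYOUT_RE.search(raw_tool_output)
    if not m:
        return False
    layout = m.group("layout")
    idx = layout.find("m:")
    if idx < 0:
        return False
    # 'o' (Mach-O) and 'x' (Windows x86 COFF) mangling schemes add a
    # leading underscore to global symbol names.
    return layout[idx + 2] in ("o", "x")

# Hypothetical Mach-O data layout: globals get a leading underscore.
ir = 'target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"'
assert get_global_underscores(ir)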
This patch also adds `instcombine` to some tests that were passing Clang's output to `opt -S --passes=mem2reg`, and converts some other tests to use `update_cc_test_checks.py`. The assembly part of some tests was split out and moved to LLVM. This makes it easier to compare test changes in upcoming patches.
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-clang
Author: Momchil Velikov (momchil-velikov)
Changes
This patch adds `instcombine` to some tests that were passing Clang's output to `opt -S --passes=mem2reg`, and converts some other tests to use `update_cc_test_checks.py`. The assembly part of some tests was split out and moved to LLVM. This makes it easier to compare test changes in upcoming patches.
Patch is 6.30 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/121801.diff
48 Files Affected:
diff --git a/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c b/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c
index 877d83c0fa3954..caa803ee794603 100644
--- a/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c
+++ b/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c
@@ -1,6 +1,6 @@
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
// RUN: %clang_cc1 -triple aarch64 -target-feature +neon -target-feature +bf16 \
-// RUN: -disable-O0-optnone -emit-llvm %s -o - | opt -S -passes=mem2reg | FileCheck %s
+// RUN: -disable-O0-optnone -emit-llvm %s -o - | opt -S -passes=mem2reg,instcombine | FileCheck %s
// REQUIRES: aarch64-registered-target || arm-registered-target
@@ -8,10 +8,7 @@
// CHECK-LABEL: @test_vbfdot_f32(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x float> [[R:%.*]] to <8 x i8>
-// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x bfloat> [[A:%.*]] to <8 x i8>
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <4 x bfloat> [[B:%.*]] to <8 x i8>
-// CHECK-NEXT: [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R]], <4 x bfloat> [[A]], <4 x bfloat> [[B]])
+// CHECK-NEXT: [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R:%.*]], <4 x bfloat> [[A:%.*]], <4 x bfloat> [[B:%.*]])
// CHECK-NEXT: ret <2 x float> [[VBFDOT3_I]]
//
float32x2_t test_vbfdot_f32(float32x2_t r, bfloat16x4_t a, bfloat16x4_t b) {
@@ -20,10 +17,7 @@ float32x2_t test_vbfdot_f32(float32x2_t r, bfloat16x4_t a, bfloat16x4_t b) {
// CHECK-LABEL: @test_vbfdotq_f32(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <16 x i8>
-// CHECK-NEXT: [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[B]])
+// CHECK-NEXT: [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[B:%.*]])
// CHECK-NEXT: ret <4 x float> [[VBFDOT3_I]]
//
float32x4_t test_vbfdotq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b){
@@ -32,19 +26,10 @@ float32x4_t test_vbfdotq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b){
// CHECK-LABEL: @test_vbfdot_lane_f32(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[__REINT_128:%.*]] = alloca <4 x bfloat>, align 8
-// CHECK-NEXT: [[__REINT1_128:%.*]] = alloca <2 x float>, align 8
-// CHECK-NEXT: store <4 x bfloat> [[B:%.*]], ptr [[__REINT_128]], align 8
-// CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, ptr [[__REINT_128]], align 8
-// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x float> [[TMP0]] to <8 x i8>
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x float>
-// CHECK-NEXT: [[LANE:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> [[TMP2]], <2 x i32> zeroinitializer
-// CHECK-NEXT: store <2 x float> [[LANE]], ptr [[__REINT1_128]], align 8
-// CHECK-NEXT: [[TMP3:%.*]] = load <4 x bfloat>, ptr [[__REINT1_128]], align 8
-// CHECK-NEXT: [[TMP4:%.*]] = bitcast <2 x float> [[R:%.*]] to <8 x i8>
-// CHECK-NEXT: [[TMP5:%.*]] = bitcast <4 x bfloat> [[A:%.*]] to <8 x i8>
-// CHECK-NEXT: [[TMP6:%.*]] = bitcast <4 x bfloat> [[TMP3]] to <8 x i8>
-// CHECK-NEXT: [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R]], <4 x bfloat> [[A]], <4 x bfloat> [[TMP3]])
+// CHECK-NEXT: [[DOTCAST:%.*]] = bitcast <4 x bfloat> [[B:%.*]] to <2 x float>
+// CHECK-NEXT: [[LANE:%.*]] = shufflevector <2 x float> [[DOTCAST]], <2 x float> poison, <2 x i32> zeroinitializer
+// CHECK-NEXT: [[DOTCAST1:%.*]] = bitcast <2 x float> [[LANE]] to <4 x bfloat>
+// CHECK-NEXT: [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R:%.*]], <4 x bfloat> [[A:%.*]], <4 x bfloat> [[DOTCAST1]])
// CHECK-NEXT: ret <2 x float> [[VBFDOT3_I]]
//
float32x2_t test_vbfdot_lane_f32(float32x2_t r, bfloat16x4_t a, bfloat16x4_t b){
@@ -53,19 +38,10 @@ float32x2_t test_vbfdot_lane_f32(float32x2_t r, bfloat16x4_t a, bfloat16x4_t b){
// CHECK-LABEL: @test_vbfdotq_laneq_f32(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[__REINT_130:%.*]] = alloca <8 x bfloat>, align 16
-// CHECK-NEXT: [[__REINT1_130:%.*]] = alloca <4 x float>, align 16
-// CHECK-NEXT: store <8 x bfloat> [[B:%.*]], ptr [[__REINT_130]], align 16
-// CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[__REINT_130]], align 16
-// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x float> [[TMP0]] to <16 x i8>
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x float>
-// CHECK-NEXT: [[LANE:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> [[TMP2]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
-// CHECK-NEXT: store <4 x float> [[LANE]], ptr [[__REINT1_130]], align 16
-// CHECK-NEXT: [[TMP3:%.*]] = load <8 x bfloat>, ptr [[__REINT1_130]], align 16
-// CHECK-NEXT: [[TMP4:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP5:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP6:%.*]] = bitcast <8 x bfloat> [[TMP3]] to <16 x i8>
-// CHECK-NEXT: [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[TMP3]])
+// CHECK-NEXT: [[DOTCAST:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <4 x float>
+// CHECK-NEXT: [[LANE:%.*]] = shufflevector <4 x float> [[DOTCAST]], <4 x float> poison, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
+// CHECK-NEXT: [[DOTCAST1:%.*]] = bitcast <4 x float> [[LANE]] to <8 x bfloat>
+// CHECK-NEXT: [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[DOTCAST1]])
// CHECK-NEXT: ret <4 x float> [[VBFDOT3_I]]
//
float32x4_t test_vbfdotq_laneq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
@@ -74,19 +50,10 @@ float32x4_t test_vbfdotq_laneq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b
// CHECK-LABEL: @test_vbfdot_laneq_f32(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[__REINT_132:%.*]] = alloca <8 x bfloat>, align 16
-// CHECK-NEXT: [[__REINT1_132:%.*]] = alloca <2 x float>, align 8
-// CHECK-NEXT: store <8 x bfloat> [[B:%.*]], ptr [[__REINT_132]], align 16
-// CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[__REINT_132]], align 16
-// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x float> [[TMP0]] to <16 x i8>
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x float>
-// CHECK-NEXT: [[LANE:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> [[TMP2]], <2 x i32> <i32 3, i32 3>
-// CHECK-NEXT: store <2 x float> [[LANE]], ptr [[__REINT1_132]], align 8
-// CHECK-NEXT: [[TMP3:%.*]] = load <4 x bfloat>, ptr [[__REINT1_132]], align 8
-// CHECK-NEXT: [[TMP4:%.*]] = bitcast <2 x float> [[R:%.*]] to <8 x i8>
-// CHECK-NEXT: [[TMP5:%.*]] = bitcast <4 x bfloat> [[A:%.*]] to <8 x i8>
-// CHECK-NEXT: [[TMP6:%.*]] = bitcast <4 x bfloat> [[TMP3]] to <8 x i8>
-// CHECK-NEXT: [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R]], <4 x bfloat> [[A]], <4 x bfloat> [[TMP3]])
+// CHECK-NEXT: [[DOTCAST:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <4 x float>
+// CHECK-NEXT: [[LANE:%.*]] = shufflevector <4 x float> [[DOTCAST]], <4 x float> poison, <2 x i32> <i32 3, i32 3>
+// CHECK-NEXT: [[DOTCAST1:%.*]] = bitcast <2 x float> [[LANE]] to <4 x bfloat>
+// CHECK-NEXT: [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R:%.*]], <4 x bfloat> [[A:%.*]], <4 x bfloat> [[DOTCAST1]])
// CHECK-NEXT: ret <2 x float> [[VBFDOT3_I]]
//
float32x2_t test_vbfdot_laneq_f32(float32x2_t r, bfloat16x4_t a, bfloat16x8_t b) {
@@ -95,19 +62,10 @@ float32x2_t test_vbfdot_laneq_f32(float32x2_t r, bfloat16x4_t a, bfloat16x8_t b)
// CHECK-LABEL: @test_vbfdotq_lane_f32(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[__REINT_126:%.*]] = alloca <4 x bfloat>, align 8
-// CHECK-NEXT: [[__REINT1_126:%.*]] = alloca <4 x float>, align 16
-// CHECK-NEXT: store <4 x bfloat> [[B:%.*]], ptr [[__REINT_126]], align 8
-// CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, ptr [[__REINT_126]], align 8
-// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x float> [[TMP0]] to <8 x i8>
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x float>
-// CHECK-NEXT: [[LANE:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> [[TMP2]], <4 x i32> zeroinitializer
-// CHECK-NEXT: store <4 x float> [[LANE]], ptr [[__REINT1_126]], align 16
-// CHECK-NEXT: [[TMP3:%.*]] = load <8 x bfloat>, ptr [[__REINT1_126]], align 16
-// CHECK-NEXT: [[TMP4:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP5:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP6:%.*]] = bitcast <8 x bfloat> [[TMP3]] to <16 x i8>
-// CHECK-NEXT: [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[TMP3]])
+// CHECK-NEXT: [[DOTCAST:%.*]] = bitcast <4 x bfloat> [[B:%.*]] to <2 x float>
+// CHECK-NEXT: [[LANE:%.*]] = shufflevector <2 x float> [[DOTCAST]], <2 x float> poison, <4 x i32> zeroinitializer
+// CHECK-NEXT: [[DOTCAST1:%.*]] = bitcast <4 x float> [[LANE]] to <8 x bfloat>
+// CHECK-NEXT: [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[DOTCAST1]])
// CHECK-NEXT: ret <4 x float> [[VBFDOT3_I]]
//
float32x4_t test_vbfdotq_lane_f32(float32x4_t r, bfloat16x8_t a, bfloat16x4_t b) {
@@ -116,11 +74,7 @@ float32x4_t test_vbfdotq_lane_f32(float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
// CHECK-LABEL: @test_vbfmmlaq_f32(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <16 x i8>
-// CHECK-NEXT: [[VBFMMLAQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmmla(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[B]])
-// CHECK-NEXT: [[VBFMMLAQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMMLAQ_F323_I]] to <16 x i8>
+// CHECK-NEXT: [[VBFMMLAQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmmla(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[B:%.*]])
// CHECK-NEXT: ret <4 x float> [[VBFMMLAQ_F323_I]]
//
float32x4_t test_vbfmmlaq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
@@ -129,11 +83,7 @@ float32x4_t test_vbfmmlaq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
// CHECK-LABEL: @test_vbfmlalbq_f32(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <16 x i8>
-// CHECK-NEXT: [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[B]])
-// CHECK-NEXT: [[VBFMLALBQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMLALBQ_F323_I]] to <16 x i8>
+// CHECK-NEXT: [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[B:%.*]])
// CHECK-NEXT: ret <4 x float> [[VBFMLALBQ_F323_I]]
//
float32x4_t test_vbfmlalbq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
@@ -142,11 +92,7 @@ float32x4_t test_vbfmlalbq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
// CHECK-LABEL: @test_vbfmlaltq_f32(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <16 x i8>
-// CHECK-NEXT: [[VBFMLALTQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalt(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[B]])
-// CHECK-NEXT: [[VBFMLALTQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMLALTQ_F323_I]] to <16 x i8>
+// CHECK-NEXT: [[VBFMLALTQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalt(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[B:%.*]])
// CHECK-NEXT: ret <4 x float> [[VBFMLALTQ_F323_I]]
//
float32x4_t test_vbfmlaltq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
@@ -155,27 +101,8 @@ float32x4_t test_vbfmlaltq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
// CHECK-LABEL: @test_vbfmlalbq_lane_f32(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[VGET_LANE:%.*]] = extractelement <4 x bfloat> [[B:%.*]], i32 0
-// CHECK-NEXT: [[VECINIT:%.*]] = insertelement <8 x bfloat> poison, bfloat [[VGET_LANE]], i32 0
-// CHECK-NEXT: [[VGET_LANE3:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT: [[VECINIT5:%.*]] = insertelement <8 x bfloat> [[VECINIT]], bfloat [[VGET_LANE3]], i32 1
-// CHECK-NEXT: [[VGET_LANE8:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT: [[VECINIT10:%.*]] = insertelement <8 x bfloat> [[VECINIT5]], bfloat [[VGET_LANE8]], i32 2
-// CHECK-NEXT: [[VGET_LANE13:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT: [[VECINIT15:%.*]] = insertelement <8 x bfloat> [[VECINIT10]], bfloat [[VGET_LANE13]], i32 3
-// CHECK-NEXT: [[VGET_LANE18:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT: [[VECINIT20:%.*]] = insertelement <8 x bfloat> [[VECINIT15]], bfloat [[VGET_LANE18]], i32 4
-// CHECK-NEXT: [[VGET_LANE23:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT: [[VECINIT25:%.*]] = insertelement <8 x bfloat> [[VECINIT20]], bfloat [[VGET_LANE23]], i32 5
-// CHECK-NEXT: [[VGET_LANE28:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT: [[VECINIT30:%.*]] = insertelement <8 x bfloat> [[VECINIT25]], bfloat [[VGET_LANE28]], i32 6
-// CHECK-NEXT: [[VGET_LANE33:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT: [[VECINIT35:%.*]] = insertelement <8 x bfloat> [[VECINIT30]], bfloat [[VGET_LANE33]], i32 7
-// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x bfloat> [[VECINIT35]] to <16 x i8>
-// CHECK-NEXT: [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[VECINIT35]])
-// CHECK-NEXT: [[VBFMLALBQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMLALBQ_F323_I]] to <16 x i8>
+// CHECK-NEXT: [[VECINIT35:%.*]] = shufflevector <4 x bfloat> [[B:%.*]], <4 x bfloat> poison, <8 x i32> zeroinitializer
+// CHECK-NEXT: [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[VECINIT35]])
// CHECK-NEXT: ret <4 x float> [[VBFMLALBQ_F323_I]]
//
float32x4_t test_vbfmlalbq_lane_f32(float32x4_t r, bfloat16x8_t a, bfloat16x4_t b) {
@@ -184,27 +111,8 @@ float32x4_t test_vbfmlalbq_lane_f32(float32x4_t r, bfloat16x8_t a, bfloat16x4_t
// CHECK-LABEL: @test_vbfmlalbq_laneq_f32(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[VGETQ_LANE:%.*]] = extractelement <8 x bfloat> [[B:%.*]], i32 3
-// CHECK-NEXT: [[VECINIT:%.*]] = insertelement <8 x bfloat> poison, bfloat [[VGETQ_LANE]], i32 0
-// CHECK-NEXT: [[VGETQ_LANE3:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT: [[VECINIT5:%.*]] = insertelement <8 x bfloat> [[VECINIT]], bfloat [[VGETQ_LANE3]], i32 1
-// CHECK-NEXT: [[VGETQ_LANE8:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT: [[VECINIT10:%.*]] = insertelement <8 x bfloat> [[VECINIT5]], bfloat [[VGETQ_LANE8]], i32 2
-// CHECK-NEXT: [[VGETQ_LANE13:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT: [[VECINIT15:%.*]] = insertelement <8 x bfloat> [[VECINIT10]], bfloat [[VGETQ_LANE13]], i32 3
-// CHECK-NEXT: [[VGETQ_LANE18:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT: [[VECINIT20:%.*]] = insertelement <8 x bfloat> [[VECINIT15]], bfloat [[VGETQ_LANE18]], i32 4
-// CHECK-NEXT: [[VGETQ_LANE23:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT: [[VECINIT25:%.*]] = insertelement <8 x bfloat> [[VECINIT20]], bfloat [[VGETQ_LANE23]], i32 5
-// CHECK-NEXT: [[VGETQ_LANE28:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT: [[VECINIT30:%.*]] = insertelement <8 x bfloat> [[VECINIT25]], bfloat [[VGETQ_LANE28]], i32 6
-// CHECK-NEXT: [[VGETQ_LANE33:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT: [[VECINIT35:%.*]] = insertelement <8 x bfloat> [[VECINIT30]], bfloat [[VGETQ_LANE33]], i32 7
-// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x bfloat> [[VECINIT35]] to <16 x i8>
-// CHECK-NEXT: [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[VECINIT35]])
-// CHECK-NEXT: [[VBFMLALBQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMLALBQ_F323_I]] to <16 x i8>
+// CHECK-NEXT: [[VECINIT35:%.*]] = shufflevector <8 x bfloat> [[B:%.*]], <8 x bfloat> poison, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
+// CHECK-NEXT: [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[VECINIT35]])
// CHECK-NEXT: ret <4 x float> [[VBFMLALBQ_F323_I]]
//
float32x4_t test_vbfmlalbq_laneq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
@@ -213,27 +121,8 @@ float32x4_t test_vbfmlalbq_laneq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t
// CHECK-LABEL: @test_vbfmlaltq_lane_f32(
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[VGET_LANE:%.*]] = extractelement <4 x bfloat> [[B:%.*]], i32 0
-// CHECK-NEXT: [[VECINIT:%.*]] = insertelement <8 x bfloat> poison, bfloat [[VGET_LANE]], i32 0
-// CHECK-NEXT: [[VGET_LANE3:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT: [[VECINIT5:%.*]] = insertelement <8 x bfloat> [[VECINIT]], bfloat [[VGET_LANE3]], i32 1
-// CHECK-NEXT: [[VGET_LANE8:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT: [[VECINIT10:%.*]] = insertelement <8 x bfloat> [[VECINIT5]], bfloat [[VGET_LANE8]], i32 2
-// CHECK-NEXT: [[VGET_LANE13:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT: [[VECINIT15:%.*]] = insertelement <8 x bfloat> [[VECINIT10]], bfloat [[VGET_LANE13]], i32 3
-// CHECK-NEXT: [[VGET_LANE18:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT: [[VECINIT20:%.*]] = insertelement <8 x bfloat> [[VECINIT15]], bfloat [[VGET_LANE18]], i32 4
-// CHECK-NEXT: [[VGET_LANE23:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT: [[VECINIT25:%.*]] = insertelement <8 x bfloat> [[VECINIT20]], bfloat [[VGET_LANE23]], i32 5
-// CHECK-NEXT: [[VGET_LANE28:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT: [[VECINIT30:%.*]] = insertelement <8 x bfloat> [[VECINIT25]], bfloat [[VGET_LANE28]], i32 6
-// CHECK-NEXT: [[VGET_LANE33:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT: [[VECINIT35:%.*]] = insertelement <8 x bfloat> [[VECINIT30]], bfloat [[VGET_LANE33]], i32 7
-// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x bfloat> [[VECINIT35]] to <16 x i8>
-// CHECK-NEXT: [[VBFMLALTQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalt(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[VECINIT35]])
-// CHECK-NEXT: [[VBFMLALTQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMLALTQ_F323_I]] to <16 x i8>
+// CHECK-NEXT: [[VECINIT35:%.*]] = shufflevector <4 x bfloat> [[B:%.*]], <4 x bfloat> poison, <8 x i32> zeroinitializer
+// CHECK-NEXT: [[VBFMLALTQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalt(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[VECINIT35]])
// CHECK-NEXT: ...
[truncated]
You can test this locally with the following command:
darker --check --diff -r 7cb6e6bced8ca5767c3e609f4826982638fd9543...67ba481f809abdc85a90fed5ff11ca48c6578e3e llvm/utils/UpdateTestChecks/common.py llvm/utils/update_cc_test_checks.py
View the diff from darker here.
--- UpdateTestChecks/common.py	2025-01-06 15:08:27.000000 +0000
+++ UpdateTestChecks/common.py 2025-01-06 17:04:31.173625 +0000
@@ -556,12 +556,11 @@
UTC_ADVERT = "NOTE: Assertions have been autogenerated by "
UTC_AVOID = "NOTE: Do not autogenerate"
UNUSED_NOTE = "NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:"
DATA_LAYOUT_RE = re.compile(
- r"target\sdatalayout\s=\s\"(?P<layout>.+)\"$",
- flags=(re.M | re.S)
+ r"target\sdatalayout\s=\s\"(?P<layout>.+)\"$", flags=(re.M | re.S)
)
OPT_FUNCTION_RE = re.compile(
r"^(\s*;\s*Function\sAttrs:\s(?P<attrs>[\w\s():,]+?))?\s*define\s+(?P<funcdef_attrs_and_ret>[^@]*)@(?P<func>[\w.$-]+?)\s*"
r"(?P<args_and_sig>\((\)|(.*?[\w.-]+?)\))[^{]*\{)\n(?P<body>.*?)^\}$",
@@ -653,20 +652,22 @@
if march.startswith(prefix):
return triple
print("Cannot find a triple. Assume 'x86'", file=sys.stderr)
return "x86"
+
def get_global_underscores(raw_tool_output):
m = DATA_LAYOUT_RE.search(raw_tool_output)
if not m:
return False
data_layout = m.group("layout")
idx = data_layout.find("m:")
if idx < 0:
return False
ch = data_layout[idx + 2]
- return ch == 'o' or ch == 'x'
+ return ch == "o" or ch == "x"
+
def apply_filters(line, filters):
has_filter = False
for f in filters:
if not f.is_filter_out:
--- update_cc_test_checks.py 2025-01-06 15:08:27.000000 +0000
+++ update_cc_test_checks.py 2025-01-06 17:04:31.395847 +0000
@@ -123,11 +123,11 @@
search = spell
mangled = node.get("mangledName", spell)
# Strip leading underscore from globals, so the name matches the LLVM one
if global_underscores:
storage = node.get("storageClass", None)
- if storage != "static" and mangled[0] == '_':
+ if storage != "static" and mangled[0] == "_":
mangled = mangled[1:]
ret[int(line) - 1].append((spell, mangled, search))
ast = json.loads(stdout)
if ast["kind"] != "TranslationUnitDecl":
@@ -252,11 +252,13 @@
args.opt = None
return args, parser
-def get_function_body(builder, args, filename, clang_args, extra_commands, prefixes, raw_tool_output):
+def get_function_body(
+ builder, args, filename, clang_args, extra_commands, prefixes, raw_tool_output
+):
# TODO Clean up duplication of asm/common build_function_body_dictionary
for extra_command in extra_commands:
extra_args = shlex.split(extra_command)
with tempfile.NamedTemporaryFile() as f:
f.write(raw_tool_output.encode())
@@ -387,16 +389,24 @@
common.debug("Extracted FileCheck prefixes: {}".format(prefixes))
# Invoke external tool and extract function bodies.
raw_tool_output = common.invoke_tool(ti.args.clang, clang_args, ti.path)
get_function_body(
- builder, ti.args, ti.path, clang_args, extra_commands, prefixes, raw_tool_output
+ builder,
+ ti.args,
+ ti.path,
+ clang_args,
+ extra_commands,
+ prefixes,
+ raw_tool_output,
)
# Invoke clang -Xclang -ast-dump=json to get mapping from start lines to
# mangled names. Forward all clang args for now.
- for k, v in get_line2func_list(ti.args, clang_args, common.get_global_underscores(raw_tool_output)).items():
+ for k, v in get_line2func_list(
+ ti.args, clang_args, common.get_global_underscores(raw_tool_output)
+ ).items():
line2func_list[k].extend(v)
func_dict = builder.finish_and_get_func_dict()
global_vars_seen_dict = {}
prefix_set = set([prefix for p in filecheck_run_list for prefix in p[0]])
LGTM
This patch sounds good, but the intent of the clang tests is that they should not be dependent on the mid-end optimizations in LLVM, at least for fast-changing parts like instcombine, where you don't want to have to update a thousand clang tests every time instcombine learns a new trick. The clang tests should ideally be testing the clang output. Can the instcombines be replaced with something simpler like dce or maybe instsimplify? It might be OK with just mem2reg.
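As a concrete (hypothetical) sketch of that suggestion, the RUN line in bf16-dotprod-intrinsics.c would become something like:

// RUN: %clang_cc1 -triple aarch64 -target-feature +neon -target-feature +bf16 \
// RUN:   -disable-O0-optnone -emit-llvm %s -o - | opt -S -passes=mem2reg,instsimplify | FileCheck %s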
Unfortunately just
and the differences persist after
which makes the test updates after #121802 very hard to inspect.
sroa would be ideal if it works; I know a number of test cases use it and it shouldn't update too often.
Another thing that "works" is changing the codegen at llvm-project/clang/lib/CodeGen/CGExprScalar.cpp, line 2295 (commit 32bc029).
It stores a value with one type and loads it back with another type, which is not handled by mem2reg.
Instead it could perform a load with the original type and then emit an LLVM IR bitcast, just like it does for "C-style" bitcasts: llvm-project/clang/lib/CodeGen/CGExprScalar.cpp, line 2425 (commit 32bc029).
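As an illustrative, hand-written LLVM IR sketch of the two patterns (value names and vector types are borrowed from the __REINT allocas visible in the diff above, not taken from actual compiler output):

; Current pattern: store as one type, load back as another.
; mem2reg cannot promote the alloca because the load type differs
; from the alloca/store type.
  %__reint = alloca <4 x bfloat>, align 8
  store <4 x bfloat> %b, ptr %__reint, align 8
  %0 = load <2 x float>, ptr %__reint, align 8

; Suggested pattern: load with the original type, then bitcast.
; mem2reg can promote this alloca away entirely.
  %__reint2 = alloca <4 x bfloat>, align 8
  store <4 x bfloat> %b, ptr %__reint2, align 8
  %1 = load <4 x bfloat>, ptr %__reint2, align 8
  %2 = bitcast <4 x bfloat> %1 to <2 x float>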
Except that it does not work when aggregate types are involved.
Abandoning for now; the follow-up patch (#121802) was reduced in scope and does not need to update tests.