
[Arm] Regenerate tests (NFC) #121801


Closed

Conversation

momchil-velikov (Collaborator) commented Jan 6, 2025

This patch adds `instcombine` to some tests that were passing
Clang's output to `opt -S --passes=mem2reg`, and converts some
other tests to use `update_cc_test_checks.py`.

The assembly part of some tests was also split out and moved to LLVM.

This makes it easier to compare test changes in upcoming patches. (#121802)

For some ABIs `update_cc_test_checks.py` is unable to generate
tests because of the mismatch between the mangled function names
reported by Clang's `-ast-dump` and the function names in LLVM IR.

This patch fixes that by stripping the leading underscore from
the mangled name of global functions if the data layout string says
they have one.
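
(For illustration only — a minimal Python sketch of the idea, not the patch's actual helper; the function names below are made up. In an LLVM data layout string, the `m:` component selects the mangling scheme, and the Mach-O (`m:o`) and Windows x86 COFF (`m:x`) schemes prepend `_` to global symbol names.)

import re

# "m:o" (Mach-O) and "m:x" (Windows x86 COFF) prepend '_' to globals;
# "m:e" (ELF) does not.
DATA_LAYOUT_RE = re.compile(r'target datalayout = "(?P<layout>[^"]+)"')

def globals_have_underscore(ir_text):
    """Return True if the module's mangling scheme prepends '_' to globals."""
    m = DATA_LAYOUT_RE.search(ir_text)
    if not m:
        return False
    layout = m.group("layout")
    idx = layout.find("m:")
    return idx >= 0 and layout[idx + 2] in ("o", "x")

def ast_name_to_ir_name(mangled, ir_text):
    """Strip the leading '_' so the AST name matches the IR symbol."""
    if globals_have_underscore(ir_text) and mangled.startswith("_"):
        return mangled[1:]
    return mangled

# On a Mach-O target the AST dump reports "_foo" while the IR defines @foo:
ir = 'target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"'
assert ast_name_to_ir_name("_foo", ir) == "foo"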
llvmbot (Member) commented Jan 6, 2025

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-testing-tools

@llvm/pr-subscribers-clang

Author: Momchil Velikov (momchil-velikov)

Changes

This patch adds instcombine to some tests that were passing
Clang's output to opt -S --passes=mem2reg, and converts some
other tests to using update_cc_test_checks.py.

Assembly part of some tests was also split out and moved to LLVM.

This makes it easier to compare test changes in upcoming patches.


Patch is 6.30 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/121801.diff

48 Files Affected:

  • (modified) clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c (+30-160)
  • (modified) clang/test/CodeGen/AArch64/bf16-getset-intrinsics.c (+17-33)
  • (modified) clang/test/CodeGen/AArch64/bf16-reinterpret-intrinsics.c (+218-163)
  • (modified) clang/test/CodeGen/AArch64/neon-2velem.c (+775-2178)
  • (modified) clang/test/CodeGen/AArch64/neon-extract.c (+143-145)
  • (modified) clang/test/CodeGen/AArch64/neon-fma.c (+33-75)
  • (modified) clang/test/CodeGen/AArch64/neon-fp16fml.c (+41-865)
  • (modified) clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c (+1173-453)
  • (modified) clang/test/CodeGen/AArch64/neon-intrinsics.c (+15106-10053)
  • (modified) clang/test/CodeGen/AArch64/neon-ldst-one-rcpc3.c (+33-65)
  • (modified) clang/test/CodeGen/AArch64/neon-ldst-one.c (+6458-4665)
  • (modified) clang/test/CodeGen/AArch64/neon-misc-constrained.c (+51-33)
  • (modified) clang/test/CodeGen/AArch64/neon-misc.c (+2094-1396)
  • (modified) clang/test/CodeGen/AArch64/neon-perm.c (+1298-1207)
  • (modified) clang/test/CodeGen/AArch64/neon-scalar-x-indexed-elem-constrained.c (+133-90)
  • (modified) clang/test/CodeGen/AArch64/neon-scalar-x-indexed-elem.c (+338-252)
  • (modified) clang/test/CodeGen/AArch64/poly-add.c (+11-26)
  • (modified) clang/test/CodeGen/AArch64/poly128.c (+32-34)
  • (modified) clang/test/CodeGen/AArch64/poly64.c (+518-338)
  • (modified) clang/test/CodeGen/AArch64/v8.1a-neon-intrinsics.c (+33-53)
  • (modified) clang/test/CodeGen/AArch64/v8.2a-neon-intrinsics-constrained.c (+333-233)
  • (modified) clang/test/CodeGen/AArch64/v8.2a-neon-intrinsics-generic.c (+58-150)
  • (modified) clang/test/CodeGen/AArch64/v8.2a-neon-intrinsics.c (+111-426)
  • (modified) clang/test/CodeGen/AArch64/v8.5a-neon-frint3264-intrinsic.c (+98-49)
  • (modified) clang/test/CodeGen/AArch64/v8.6a-neon-intrinsics.c (+104-88)
  • (modified) clang/test/CodeGen/arm-bf16-convert-intrinsics.c (+84-306)
  • (modified) clang/test/CodeGen/arm-bf16-dotprod-intrinsics.c (+31-161)
  • (modified) clang/test/CodeGen/arm-bf16-getset-intrinsics.c (+18-34)
  • (modified) clang/test/CodeGen/arm-neon-directed-rounding-constrained.c (+53-39)
  • (modified) clang/test/CodeGen/arm-neon-directed-rounding.c (+171-62)
  • (modified) clang/test/CodeGen/arm-neon-fma.c (+13-27)
  • (modified) clang/test/CodeGen/arm-neon-numeric-maxmin.c (+3-15)
  • (modified) clang/test/CodeGen/arm-neon-vcvtX.c (+9-25)
  • (modified) clang/test/CodeGen/arm-poly-add.c (+30-35)
  • (modified) clang/test/CodeGen/arm-v8.1a-neon-intrinsics.c (+82-114)
  • (modified) clang/test/CodeGen/arm-v8.2a-neon-intrinsics-generic.c (+119-277)
  • (modified) clang/test/CodeGen/arm-v8.2a-neon-intrinsics.c (+690-371)
  • (modified) clang/test/CodeGen/arm-v8.6a-neon-intrinsics.c (+62-48)
  • (modified) clang/test/CodeGen/arm64_vdupq_n_f64.c (+44-38)
  • (modified) clang/test/CodeGen/arm_neon_intrinsics.c (+15482-12225)
  • (added) llvm/test/CodeGen/AArch64/neon-misc-constrained.ll (+46)
  • (added) llvm/test/CodeGen/AArch64/neon-misc-unconstrained.ll (+45)
  • (added) llvm/test/CodeGen/AArch64/neon-scalar-x-indexed-elem-constrained.ll (+103)
  • (added) llvm/test/CodeGen/AArch64/neon-scalar-x-indexed-elem-unconstrained.ll (+103)
  • (added) llvm/test/CodeGen/AArch64/v8.2a-neon-intrinsics-constrained.ll (+276)
  • (added) llvm/test/CodeGen/AArch64/v8.2a-neon-intrinsics-unconstrained.ll (+265)
  • (modified) llvm/utils/UpdateTestChecks/common.py (+15)
  • (modified) llvm/utils/update_cc_test_checks.py (+11-6)
diff --git a/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c b/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c
index 877d83c0fa3954..caa803ee794603 100644
--- a/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c
+++ b/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c
@@ -1,6 +1,6 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
 // RUN: %clang_cc1 -triple aarch64 -target-feature +neon -target-feature +bf16 \
-// RUN: -disable-O0-optnone -emit-llvm %s -o - | opt -S -passes=mem2reg | FileCheck %s
+// RUN: -disable-O0-optnone -emit-llvm %s -o - | opt -S -passes=mem2reg,instcombine | FileCheck %s
 
 // REQUIRES: aarch64-registered-target || arm-registered-target
 
@@ -8,10 +8,7 @@
 
 // CHECK-LABEL: @test_vbfdot_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <2 x float> [[R:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <4 x bfloat> [[A:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <4 x bfloat> [[B:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R]], <4 x bfloat> [[A]], <4 x bfloat> [[B]])
+// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R:%.*]], <4 x bfloat> [[A:%.*]], <4 x bfloat> [[B:%.*]])
 // CHECK-NEXT:    ret <2 x float> [[VBFDOT3_I]]
 //
 float32x2_t test_vbfdot_f32(float32x2_t r, bfloat16x4_t a, bfloat16x4_t b) {
@@ -20,10 +17,7 @@ float32x2_t test_vbfdot_f32(float32x2_t r, bfloat16x4_t a, bfloat16x4_t b) {
 
 // CHECK-LABEL: @test_vbfdotq_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[B]])
+// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[B:%.*]])
 // CHECK-NEXT:    ret <4 x float> [[VBFDOT3_I]]
 //
 float32x4_t test_vbfdotq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b){
@@ -32,19 +26,10 @@ float32x4_t test_vbfdotq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b){
 
 // CHECK-LABEL: @test_vbfdot_lane_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[__REINT_128:%.*]] = alloca <4 x bfloat>, align 8
-// CHECK-NEXT:    [[__REINT1_128:%.*]] = alloca <2 x float>, align 8
-// CHECK-NEXT:    store <4 x bfloat> [[B:%.*]], ptr [[__REINT_128]], align 8
-// CHECK-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[__REINT_128]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <2 x float> [[TMP0]] to <8 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x float>
-// CHECK-NEXT:    [[LANE:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> [[TMP2]], <2 x i32> zeroinitializer
-// CHECK-NEXT:    store <2 x float> [[LANE]], ptr [[__REINT1_128]], align 8
-// CHECK-NEXT:    [[TMP3:%.*]] = load <4 x bfloat>, ptr [[__REINT1_128]], align 8
-// CHECK-NEXT:    [[TMP4:%.*]] = bitcast <2 x float> [[R:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[TMP5:%.*]] = bitcast <4 x bfloat> [[A:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[TMP6:%.*]] = bitcast <4 x bfloat> [[TMP3]] to <8 x i8>
-// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R]], <4 x bfloat> [[A]], <4 x bfloat> [[TMP3]])
+// CHECK-NEXT:    [[DOTCAST:%.*]] = bitcast <4 x bfloat> [[B:%.*]] to <2 x float>
+// CHECK-NEXT:    [[LANE:%.*]] = shufflevector <2 x float> [[DOTCAST]], <2 x float> poison, <2 x i32> zeroinitializer
+// CHECK-NEXT:    [[DOTCAST1:%.*]] = bitcast <2 x float> [[LANE]] to <4 x bfloat>
+// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R:%.*]], <4 x bfloat> [[A:%.*]], <4 x bfloat> [[DOTCAST1]])
 // CHECK-NEXT:    ret <2 x float> [[VBFDOT3_I]]
 //
 float32x2_t test_vbfdot_lane_f32(float32x2_t r, bfloat16x4_t a, bfloat16x4_t b){
@@ -53,19 +38,10 @@ float32x2_t test_vbfdot_lane_f32(float32x2_t r, bfloat16x4_t a, bfloat16x4_t b){
 
 // CHECK-LABEL: @test_vbfdotq_laneq_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[__REINT_130:%.*]] = alloca <8 x bfloat>, align 16
-// CHECK-NEXT:    [[__REINT1_130:%.*]] = alloca <4 x float>, align 16
-// CHECK-NEXT:    store <8 x bfloat> [[B:%.*]], ptr [[__REINT_130]], align 16
-// CHECK-NEXT:    [[TMP0:%.*]] = load <4 x float>, ptr [[__REINT_130]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <4 x float> [[TMP0]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x float>
-// CHECK-NEXT:    [[LANE:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> [[TMP2]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
-// CHECK-NEXT:    store <4 x float> [[LANE]], ptr [[__REINT1_130]], align 16
-// CHECK-NEXT:    [[TMP3:%.*]] = load <8 x bfloat>, ptr [[__REINT1_130]], align 16
-// CHECK-NEXT:    [[TMP4:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP5:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP6:%.*]] = bitcast <8 x bfloat> [[TMP3]] to <16 x i8>
-// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[TMP3]])
+// CHECK-NEXT:    [[DOTCAST:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <4 x float>
+// CHECK-NEXT:    [[LANE:%.*]] = shufflevector <4 x float> [[DOTCAST]], <4 x float> poison, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
+// CHECK-NEXT:    [[DOTCAST1:%.*]] = bitcast <4 x float> [[LANE]] to <8 x bfloat>
+// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[DOTCAST1]])
 // CHECK-NEXT:    ret <4 x float> [[VBFDOT3_I]]
 //
 float32x4_t test_vbfdotq_laneq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
@@ -74,19 +50,10 @@ float32x4_t test_vbfdotq_laneq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b
 
 // CHECK-LABEL: @test_vbfdot_laneq_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[__REINT_132:%.*]] = alloca <8 x bfloat>, align 16
-// CHECK-NEXT:    [[__REINT1_132:%.*]] = alloca <2 x float>, align 8
-// CHECK-NEXT:    store <8 x bfloat> [[B:%.*]], ptr [[__REINT_132]], align 16
-// CHECK-NEXT:    [[TMP0:%.*]] = load <4 x float>, ptr [[__REINT_132]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <4 x float> [[TMP0]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x float>
-// CHECK-NEXT:    [[LANE:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> [[TMP2]], <2 x i32> <i32 3, i32 3>
-// CHECK-NEXT:    store <2 x float> [[LANE]], ptr [[__REINT1_132]], align 8
-// CHECK-NEXT:    [[TMP3:%.*]] = load <4 x bfloat>, ptr [[__REINT1_132]], align 8
-// CHECK-NEXT:    [[TMP4:%.*]] = bitcast <2 x float> [[R:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[TMP5:%.*]] = bitcast <4 x bfloat> [[A:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[TMP6:%.*]] = bitcast <4 x bfloat> [[TMP3]] to <8 x i8>
-// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R]], <4 x bfloat> [[A]], <4 x bfloat> [[TMP3]])
+// CHECK-NEXT:    [[DOTCAST:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <4 x float>
+// CHECK-NEXT:    [[LANE:%.*]] = shufflevector <4 x float> [[DOTCAST]], <4 x float> poison, <2 x i32> <i32 3, i32 3>
+// CHECK-NEXT:    [[DOTCAST1:%.*]] = bitcast <2 x float> [[LANE]] to <4 x bfloat>
+// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R:%.*]], <4 x bfloat> [[A:%.*]], <4 x bfloat> [[DOTCAST1]])
 // CHECK-NEXT:    ret <2 x float> [[VBFDOT3_I]]
 //
 float32x2_t test_vbfdot_laneq_f32(float32x2_t r, bfloat16x4_t a, bfloat16x8_t b) {
@@ -95,19 +62,10 @@ float32x2_t test_vbfdot_laneq_f32(float32x2_t r, bfloat16x4_t a, bfloat16x8_t b)
 
 // CHECK-LABEL: @test_vbfdotq_lane_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[__REINT_126:%.*]] = alloca <4 x bfloat>, align 8
-// CHECK-NEXT:    [[__REINT1_126:%.*]] = alloca <4 x float>, align 16
-// CHECK-NEXT:    store <4 x bfloat> [[B:%.*]], ptr [[__REINT_126]], align 8
-// CHECK-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[__REINT_126]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <2 x float> [[TMP0]] to <8 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x float>
-// CHECK-NEXT:    [[LANE:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> [[TMP2]], <4 x i32> zeroinitializer
-// CHECK-NEXT:    store <4 x float> [[LANE]], ptr [[__REINT1_126]], align 16
-// CHECK-NEXT:    [[TMP3:%.*]] = load <8 x bfloat>, ptr [[__REINT1_126]], align 16
-// CHECK-NEXT:    [[TMP4:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP5:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP6:%.*]] = bitcast <8 x bfloat> [[TMP3]] to <16 x i8>
-// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[TMP3]])
+// CHECK-NEXT:    [[DOTCAST:%.*]] = bitcast <4 x bfloat> [[B:%.*]] to <2 x float>
+// CHECK-NEXT:    [[LANE:%.*]] = shufflevector <2 x float> [[DOTCAST]], <2 x float> poison, <4 x i32> zeroinitializer
+// CHECK-NEXT:    [[DOTCAST1:%.*]] = bitcast <4 x float> [[LANE]] to <8 x bfloat>
+// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[DOTCAST1]])
 // CHECK-NEXT:    ret <4 x float> [[VBFDOT3_I]]
 //
 float32x4_t test_vbfdotq_lane_f32(float32x4_t r, bfloat16x8_t a, bfloat16x4_t b) {
@@ -116,11 +74,7 @@ float32x4_t test_vbfdotq_lane_f32(float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
 
 // CHECK-LABEL: @test_vbfmmlaq_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[VBFMMLAQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmmla(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[B]])
-// CHECK-NEXT:    [[VBFMMLAQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMMLAQ_F323_I]] to <16 x i8>
+// CHECK-NEXT:    [[VBFMMLAQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmmla(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[B:%.*]])
 // CHECK-NEXT:    ret <4 x float> [[VBFMMLAQ_F323_I]]
 //
 float32x4_t test_vbfmmlaq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
@@ -129,11 +83,7 @@ float32x4_t test_vbfmmlaq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
 
 // CHECK-LABEL: @test_vbfmlalbq_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[B]])
-// CHECK-NEXT:    [[VBFMLALBQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMLALBQ_F323_I]] to <16 x i8>
+// CHECK-NEXT:    [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[B:%.*]])
 // CHECK-NEXT:    ret <4 x float> [[VBFMLALBQ_F323_I]]
 //
 float32x4_t test_vbfmlalbq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
@@ -142,11 +92,7 @@ float32x4_t test_vbfmlalbq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
 
 // CHECK-LABEL: @test_vbfmlaltq_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[VBFMLALTQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalt(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[B]])
-// CHECK-NEXT:    [[VBFMLALTQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMLALTQ_F323_I]] to <16 x i8>
+// CHECK-NEXT:    [[VBFMLALTQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalt(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[B:%.*]])
 // CHECK-NEXT:    ret <4 x float> [[VBFMLALTQ_F323_I]]
 //
 float32x4_t test_vbfmlaltq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
@@ -155,27 +101,8 @@ float32x4_t test_vbfmlaltq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
 
 // CHECK-LABEL: @test_vbfmlalbq_lane_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VGET_LANE:%.*]] = extractelement <4 x bfloat> [[B:%.*]], i32 0
-// CHECK-NEXT:    [[VECINIT:%.*]] = insertelement <8 x bfloat> poison, bfloat [[VGET_LANE]], i32 0
-// CHECK-NEXT:    [[VGET_LANE3:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT5:%.*]] = insertelement <8 x bfloat> [[VECINIT]], bfloat [[VGET_LANE3]], i32 1
-// CHECK-NEXT:    [[VGET_LANE8:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT10:%.*]] = insertelement <8 x bfloat> [[VECINIT5]], bfloat [[VGET_LANE8]], i32 2
-// CHECK-NEXT:    [[VGET_LANE13:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT15:%.*]] = insertelement <8 x bfloat> [[VECINIT10]], bfloat [[VGET_LANE13]], i32 3
-// CHECK-NEXT:    [[VGET_LANE18:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT20:%.*]] = insertelement <8 x bfloat> [[VECINIT15]], bfloat [[VGET_LANE18]], i32 4
-// CHECK-NEXT:    [[VGET_LANE23:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT25:%.*]] = insertelement <8 x bfloat> [[VECINIT20]], bfloat [[VGET_LANE23]], i32 5
-// CHECK-NEXT:    [[VGET_LANE28:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT30:%.*]] = insertelement <8 x bfloat> [[VECINIT25]], bfloat [[VGET_LANE28]], i32 6
-// CHECK-NEXT:    [[VGET_LANE33:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT35:%.*]] = insertelement <8 x bfloat> [[VECINIT30]], bfloat [[VGET_LANE33]], i32 7
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x bfloat> [[VECINIT35]] to <16 x i8>
-// CHECK-NEXT:    [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[VECINIT35]])
-// CHECK-NEXT:    [[VBFMLALBQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMLALBQ_F323_I]] to <16 x i8>
+// CHECK-NEXT:    [[VECINIT35:%.*]] = shufflevector <4 x bfloat> [[B:%.*]], <4 x bfloat> poison, <8 x i32> zeroinitializer
+// CHECK-NEXT:    [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[VECINIT35]])
 // CHECK-NEXT:    ret <4 x float> [[VBFMLALBQ_F323_I]]
 //
 float32x4_t test_vbfmlalbq_lane_f32(float32x4_t r, bfloat16x8_t a, bfloat16x4_t b) {
@@ -184,27 +111,8 @@ float32x4_t test_vbfmlalbq_lane_f32(float32x4_t r, bfloat16x8_t a, bfloat16x4_t
 
 // CHECK-LABEL: @test_vbfmlalbq_laneq_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VGETQ_LANE:%.*]] = extractelement <8 x bfloat> [[B:%.*]], i32 3
-// CHECK-NEXT:    [[VECINIT:%.*]] = insertelement <8 x bfloat> poison, bfloat [[VGETQ_LANE]], i32 0
-// CHECK-NEXT:    [[VGETQ_LANE3:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT:    [[VECINIT5:%.*]] = insertelement <8 x bfloat> [[VECINIT]], bfloat [[VGETQ_LANE3]], i32 1
-// CHECK-NEXT:    [[VGETQ_LANE8:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT:    [[VECINIT10:%.*]] = insertelement <8 x bfloat> [[VECINIT5]], bfloat [[VGETQ_LANE8]], i32 2
-// CHECK-NEXT:    [[VGETQ_LANE13:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT:    [[VECINIT15:%.*]] = insertelement <8 x bfloat> [[VECINIT10]], bfloat [[VGETQ_LANE13]], i32 3
-// CHECK-NEXT:    [[VGETQ_LANE18:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT:    [[VECINIT20:%.*]] = insertelement <8 x bfloat> [[VECINIT15]], bfloat [[VGETQ_LANE18]], i32 4
-// CHECK-NEXT:    [[VGETQ_LANE23:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT:    [[VECINIT25:%.*]] = insertelement <8 x bfloat> [[VECINIT20]], bfloat [[VGETQ_LANE23]], i32 5
-// CHECK-NEXT:    [[VGETQ_LANE28:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT:    [[VECINIT30:%.*]] = insertelement <8 x bfloat> [[VECINIT25]], bfloat [[VGETQ_LANE28]], i32 6
-// CHECK-NEXT:    [[VGETQ_LANE33:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT:    [[VECINIT35:%.*]] = insertelement <8 x bfloat> [[VECINIT30]], bfloat [[VGETQ_LANE33]], i32 7
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x bfloat> [[VECINIT35]] to <16 x i8>
-// CHECK-NEXT:    [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[VECINIT35]])
-// CHECK-NEXT:    [[VBFMLALBQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMLALBQ_F323_I]] to <16 x i8>
+// CHECK-NEXT:    [[VECINIT35:%.*]] = shufflevector <8 x bfloat> [[B:%.*]], <8 x bfloat> poison, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
+// CHECK-NEXT:    [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[VECINIT35]])
 // CHECK-NEXT:    ret <4 x float> [[VBFMLALBQ_F323_I]]
 //
 float32x4_t test_vbfmlalbq_laneq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
@@ -213,27 +121,8 @@ float32x4_t test_vbfmlalbq_laneq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t
 
 // CHECK-LABEL: @test_vbfmlaltq_lane_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VGET_LANE:%.*]] = extractelement <4 x bfloat> [[B:%.*]], i32 0
-// CHECK-NEXT:    [[VECINIT:%.*]] = insertelement <8 x bfloat> poison, bfloat [[VGET_LANE]], i32 0
-// CHECK-NEXT:    [[VGET_LANE3:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT5:%.*]] = insertelement <8 x bfloat> [[VECINIT]], bfloat [[VGET_LANE3]], i32 1
-// CHECK-NEXT:    [[VGET_LANE8:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT10:%.*]] = insertelement <8 x bfloat> [[VECINIT5]], bfloat [[VGET_LANE8]], i32 2
-// CHECK-NEXT:    [[VGET_LANE13:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT15:%.*]] = insertelement <8 x bfloat> [[VECINIT10]], bfloat [[VGET_LANE13]], i32 3
-// CHECK-NEXT:    [[VGET_LANE18:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT20:%.*]] = insertelement <8 x bfloat> [[VECINIT15]], bfloat [[VGET_LANE18]], i32 4
-// CHECK-NEXT:    [[VGET_LANE23:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT25:%.*]] = insertelement <8 x bfloat> [[VECINIT20]], bfloat [[VGET_LANE23]], i32 5
-// CHECK-NEXT:    [[VGET_LANE28:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT30:%.*]] = insertelement <8 x bfloat> [[VECINIT25]], bfloat [[VGET_LANE28]], i32 6
-// CHECK-NEXT:    [[VGET_LANE33:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT35:%.*]] = insertelement <8 x bfloat> [[VECINIT30]], bfloat [[VGET_LANE33]], i32 7
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x bfloat> [[VECINIT35]] to <16 x i8>
-// CHECK-NEXT:    [[VBFMLALTQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalt(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[VECINIT35]])
-// CHECK-NEXT:    [[VBFMLALTQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMLALTQ_F323_I]] to <16 x i8>
+// CHECK-NEXT:    [[VECINIT35:%.*]] = shufflevector <4 x bfloat> [[B:%.*]], <4 x bfloat> poison, <8 x i32> zeroinitializer
+// CHECK-NEXT:    [[VBFMLALTQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalt(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[VECINIT35]])
 // CHECK-NEXT:   ...
[truncated]


github-actions bot commented Jan 6, 2025

⚠️ Python code formatter, darker found issues in your code. ⚠️

You can test this locally with the following command:
darker --check --diff -r 7cb6e6bced8ca5767c3e609f4826982638fd9543...67ba481f809abdc85a90fed5ff11ca48c6578e3e llvm/utils/UpdateTestChecks/common.py llvm/utils/update_cc_test_checks.py
View the diff from darker here.
--- UpdateTestChecks/common.py	2025-01-06 15:08:27.000000 +0000
+++ UpdateTestChecks/common.py	2025-01-06 17:04:31.173625 +0000
@@ -556,12 +556,11 @@
 UTC_ADVERT = "NOTE: Assertions have been autogenerated by "
 UTC_AVOID = "NOTE: Do not autogenerate"
 UNUSED_NOTE = "NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:"
 
 DATA_LAYOUT_RE = re.compile(
-    r"target\sdatalayout\s=\s\"(?P<layout>.+)\"$",
-    flags=(re.M | re.S)
+    r"target\sdatalayout\s=\s\"(?P<layout>.+)\"$", flags=(re.M | re.S)
 )
 
 OPT_FUNCTION_RE = re.compile(
     r"^(\s*;\s*Function\sAttrs:\s(?P<attrs>[\w\s():,]+?))?\s*define\s+(?P<funcdef_attrs_and_ret>[^@]*)@(?P<func>[\w.$-]+?)\s*"
     r"(?P<args_and_sig>\((\)|(.*?[\w.-]+?)\))[^{]*\{)\n(?P<body>.*?)^\}$",
@@ -653,20 +652,22 @@
         if march.startswith(prefix):
             return triple
     print("Cannot find a triple. Assume 'x86'", file=sys.stderr)
     return "x86"
 
+
 def get_global_underscores(raw_tool_output):
     m = DATA_LAYOUT_RE.search(raw_tool_output)
     if not m:
         return False
     data_layout = m.group("layout")
     idx = data_layout.find("m:")
     if idx < 0:
         return False
     ch = data_layout[idx + 2]
-    return ch == 'o' or ch == 'x'
+    return ch == "o" or ch == "x"
+
 
 def apply_filters(line, filters):
     has_filter = False
     for f in filters:
         if not f.is_filter_out:
--- update_cc_test_checks.py	2025-01-06 15:08:27.000000 +0000
+++ update_cc_test_checks.py	2025-01-06 17:04:31.395847 +0000
@@ -123,11 +123,11 @@
             search = spell
         mangled = node.get("mangledName", spell)
         # Strip leading underscore from globals, so the name matches the LLVM one
         if global_underscores:
             storage = node.get("storageClass", None)
-            if storage != "static" and mangled[0] == '_':
+            if storage != "static" and mangled[0] == "_":
                 mangled = mangled[1:]
         ret[int(line) - 1].append((spell, mangled, search))
 
     ast = json.loads(stdout)
     if ast["kind"] != "TranslationUnitDecl":
@@ -252,11 +252,13 @@
         args.opt = None
 
     return args, parser
 
 
-def get_function_body(builder, args, filename, clang_args, extra_commands, prefixes, raw_tool_output):
+def get_function_body(
+    builder, args, filename, clang_args, extra_commands, prefixes, raw_tool_output
+):
     # TODO Clean up duplication of asm/common build_function_body_dictionary
     for extra_command in extra_commands:
         extra_args = shlex.split(extra_command)
         with tempfile.NamedTemporaryFile() as f:
             f.write(raw_tool_output.encode())
@@ -387,16 +389,24 @@
             common.debug("Extracted FileCheck prefixes: {}".format(prefixes))
 
             # Invoke external tool and extract function bodies.
             raw_tool_output = common.invoke_tool(ti.args.clang, clang_args, ti.path)
             get_function_body(
-                builder, ti.args, ti.path, clang_args, extra_commands, prefixes, raw_tool_output
+                builder,
+                ti.args,
+                ti.path,
+                clang_args,
+                extra_commands,
+                prefixes,
+                raw_tool_output,
             )
 
             # Invoke clang -Xclang -ast-dump=json to get mapping from start lines to
             # mangled names. Forward all clang args for now.
-            for k, v in get_line2func_list(ti.args, clang_args, common.get_global_underscores(raw_tool_output)).items():
+            for k, v in get_line2func_list(
+                ti.args, clang_args, common.get_global_underscores(raw_tool_output)
+            ).items():
                 line2func_list[k].extend(v)
 
         func_dict = builder.finish_and_get_func_dict()
         global_vars_seen_dict = {}
         prefix_set = set([prefix for p in filecheck_run_list for prefix in p[0]])

jthackray (Contributor) left a comment

LGTM

davemgreen (Collaborator) commented

> This patch adds instcombine to some tests that were passing

This patch sounds good, but the intent of the clang tests is that they should not be dependent on the mid-end optimizations in LLVM, at least for fast-changing parts like instcombine, where you don't want to have to update a thousand clang tests every time instcombine learns a new trick. The clang tests should ideally be testing the clang output.

Can the instcombines be replaced with something simpler like dce or maybe instsimplify? It might be OK with just mem2reg.

momchil-velikov (Collaborator, Author) commented

> This patch adds instcombine to some tests that were passing
> Can the instcombines be replaced with something simpler like dce or maybe instsimplify? It might be OK with just mem2reg.

Unfortunately, just mem2reg does not cut it. The original issue is that Clang generates very different code for a C-style
cast and for `__builtin_bit_cast`, e.g. for (https://gcc.godbolt.org/z/7rKqhTY5W):

typedef __attribute__((neon_vector_type(4))) unsigned uint32x4_t;
typedef __attribute__((neon_vector_type(4))) float float32x4_t;

uint32x4_t f(float32x4_t v) {
  return __builtin_bit_cast(uint32x4_t, v);
}

uint32x4_t g(float32x4_t v) {
  return (uint32x4_t) v;
}

and the differences persist after mem2reg:

define dso_local <4 x i32> @f(<4 x float> noundef %v) #0 {
entry:
  %v.addr = alloca <4 x float>, align 16
  store <4 x float> %v, ptr %v.addr, align 16
  %0 = load <4 x i32>, ptr %v.addr, align 16
  ret <4 x i32> %0
}

define dso_local <4 x i32> @g(<4 x float> noundef %v) #0 {
entry:
  %0 = bitcast <4 x float> %v to <4 x i32>
  ret <4 x i32> %0
}

which makes the test updates after #121802 very hard to inspect.
I can try with sroa instead of instcombine (it works on this example above, at least).
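
(If anyone wants to eyeball the pipelines side by side, here is a throwaway driver sketch — not part of the patch; the source file name is made up, and the clang/opt flags mirror the RUN lines above.)

import subprocess

SRC = "bitcast.c"  # hypothetical file containing f() and g() from above

def emit_ir(passes):
    """Compile SRC to LLVM IR with optnone disabled, then run the given passes."""
    ir = subprocess.run(
        ["clang", "--target=aarch64-linux-gnu", "-S", "-emit-llvm",
         "-Xclang", "-disable-O0-optnone", SRC, "-o", "-"],
        capture_output=True, text=True, check=True).stdout
    return subprocess.run(
        ["opt", "-S", "-passes=" + passes],
        input=ir, capture_output=True, text=True, check=True).stdout

# Compare how close each pipeline brings f() to g()'s plain bitcast form.
for passes in ("mem2reg", "mem2reg,sroa", "mem2reg,instcombine"):
    print("---", passes, "---")
    print(emit_ir(passes))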

davemgreen (Collaborator) commented

sroa would be ideal if it works; I know a number of test cases use it, and it shouldn't update too often.

momchil-velikov (Collaborator, Author) commented

Another thing that "works" is changing the codegen for `__builtin_bit_cast` (`case CK_LValueToRValueBitCast:`).
It stores a value with one type and loads it back with another type, which is not handled by mem2reg
(`if (LI->isVolatile() || LI->getType() != AI->getAllocatedType())`).
Instead it could perform a load with the original type and then emit an LLVM IR bitcast, just like it does for a "C-style" bitcast:
`llvm::Value *Result = Builder.CreateBitCast(Src, DstTy);`

momchil-velikov (Collaborator, Author) commented Jan 8, 2025

> Instead it could perform a load with the original type and then emit an LLVM IR bitcast, just like it does for "C-style"

Except that it does not work when aggregate types are involved.
Using mem2reg,sroa is a bit better than just mem2reg, even though sroa generates some ridiculous extra bitcasts.

momchil-velikov (Collaborator, Author) commented

Abandoning for now: the follow-up patch (#121802) was reduced in scope and does not need to update tests.

momchil-velikov deleted the neon-bitcast-regen-tests branch on January 29, 2025 at 10:54.
Labels: backend:AArch64, clang, testing-tools