
[Arm] Regenerate tests (NFC) #121801


Closed

Conversation

momchil-velikov (Collaborator) commented Jan 6, 2025

This patch adds `instcombine` to some tests that were passing
Clang's output to `opt -S --passes=mem2reg`, and converts some
other tests to use `update_cc_test_checks.py`.

The assembly part of some tests was also split out and moved to LLVM.

This makes it easier to compare test changes in upcoming patches. (#121802)

For some ABIs `update_cc_test_checks.py` is unable to generate
tests because of the mismatch between the mangled function names
reported by Clang's `-ast-dump` and the function names in LLVM IR.

This patch fixes that by stripping the leading underscore from
the mangled name of global functions if the data layout string says
they have one.
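
(For illustration only — a minimal Python sketch of the idea, not the patch's actual helper; the function names below are made up. In an LLVM data layout string, the `m:` component selects the mangling scheme, and the Mach-O (`m:o`) and Windows x86 COFF (`m:x`) schemes prepend `_` to global symbol names.)

import re

# "m:o" (Mach-O) and "m:x" (Windows x86 COFF) prepend '_' to globals;
# "m:e" (ELF) does not.
DATA_LAYOUT_RE = re.compile(r'target datalayout = "(?P<layout>[^"]+)"')

def globals_have_underscore(ir_text):
    """Return True if the module's mangling scheme prepends '_' to globals."""
    m = DATA_LAYOUT_RE.search(ir_text)
    if not m:
        return False
    layout = m.group("layout")
    idx = layout.find("m:")
    return idx >= 0 and layout[idx + 2] in ("o", "x")

def ast_name_to_ir_name(mangled, ir_text):
    """Strip the leading '_' so the AST name matches the IR symbol."""
    if globals_have_underscore(ir_text) and mangled.startswith("_"):
        return mangled[1:]
    return mangled

# On a Mach-O target the AST dump reports "_foo" while the IR defines @foo:
ir = 'target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"'
assert ast_name_to_ir_name("_foo", ir) == "foo"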
llvmbot (Member) commented Jan 6, 2025

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-testing-tools

@llvm/pr-subscribers-clang

Author: Momchil Velikov (momchil-velikov)

Changes

This patch adds instcombine to some tests that were passing
Clang's output to opt -S --passes=mem2reg, and converts some
other tests to using update_cc_test_checks.py.

Assembly part of some tests was also split out and moved to LLVM.

This makes it easier to compare test changes in upcoming patches.


Patch is 6.30 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/121801.diff

48 Files Affected:

  • (modified) clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c (+30-160)
  • (modified) clang/test/CodeGen/AArch64/bf16-getset-intrinsics.c (+17-33)
  • (modified) clang/test/CodeGen/AArch64/bf16-reinterpret-intrinsics.c (+218-163)
  • (modified) clang/test/CodeGen/AArch64/neon-2velem.c (+775-2178)
  • (modified) clang/test/CodeGen/AArch64/neon-extract.c (+143-145)
  • (modified) clang/test/CodeGen/AArch64/neon-fma.c (+33-75)
  • (modified) clang/test/CodeGen/AArch64/neon-fp16fml.c (+41-865)
  • (modified) clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c (+1173-453)
  • (modified) clang/test/CodeGen/AArch64/neon-intrinsics.c (+15106-10053)
  • (modified) clang/test/CodeGen/AArch64/neon-ldst-one-rcpc3.c (+33-65)
  • (modified) clang/test/CodeGen/AArch64/neon-ldst-one.c (+6458-4665)
  • (modified) clang/test/CodeGen/AArch64/neon-misc-constrained.c (+51-33)
  • (modified) clang/test/CodeGen/AArch64/neon-misc.c (+2094-1396)
  • (modified) clang/test/CodeGen/AArch64/neon-perm.c (+1298-1207)
  • (modified) clang/test/CodeGen/AArch64/neon-scalar-x-indexed-elem-constrained.c (+133-90)
  • (modified) clang/test/CodeGen/AArch64/neon-scalar-x-indexed-elem.c (+338-252)
  • (modified) clang/test/CodeGen/AArch64/poly-add.c (+11-26)
  • (modified) clang/test/CodeGen/AArch64/poly128.c (+32-34)
  • (modified) clang/test/CodeGen/AArch64/poly64.c (+518-338)
  • (modified) clang/test/CodeGen/AArch64/v8.1a-neon-intrinsics.c (+33-53)
  • (modified) clang/test/CodeGen/AArch64/v8.2a-neon-intrinsics-constrained.c (+333-233)
  • (modified) clang/test/CodeGen/AArch64/v8.2a-neon-intrinsics-generic.c (+58-150)
  • (modified) clang/test/CodeGen/AArch64/v8.2a-neon-intrinsics.c (+111-426)
  • (modified) clang/test/CodeGen/AArch64/v8.5a-neon-frint3264-intrinsic.c (+98-49)
  • (modified) clang/test/CodeGen/AArch64/v8.6a-neon-intrinsics.c (+104-88)
  • (modified) clang/test/CodeGen/arm-bf16-convert-intrinsics.c (+84-306)
  • (modified) clang/test/CodeGen/arm-bf16-dotprod-intrinsics.c (+31-161)
  • (modified) clang/test/CodeGen/arm-bf16-getset-intrinsics.c (+18-34)
  • (modified) clang/test/CodeGen/arm-neon-directed-rounding-constrained.c (+53-39)
  • (modified) clang/test/CodeGen/arm-neon-directed-rounding.c (+171-62)
  • (modified) clang/test/CodeGen/arm-neon-fma.c (+13-27)
  • (modified) clang/test/CodeGen/arm-neon-numeric-maxmin.c (+3-15)
  • (modified) clang/test/CodeGen/arm-neon-vcvtX.c (+9-25)
  • (modified) clang/test/CodeGen/arm-poly-add.c (+30-35)
  • (modified) clang/test/CodeGen/arm-v8.1a-neon-intrinsics.c (+82-114)
  • (modified) clang/test/CodeGen/arm-v8.2a-neon-intrinsics-generic.c (+119-277)
  • (modified) clang/test/CodeGen/arm-v8.2a-neon-intrinsics.c (+690-371)
  • (modified) clang/test/CodeGen/arm-v8.6a-neon-intrinsics.c (+62-48)
  • (modified) clang/test/CodeGen/arm64_vdupq_n_f64.c (+44-38)
  • (modified) clang/test/CodeGen/arm_neon_intrinsics.c (+15482-12225)
  • (added) llvm/test/CodeGen/AArch64/neon-misc-constrained.ll (+46)
  • (added) llvm/test/CodeGen/AArch64/neon-misc-unconstrained.ll (+45)
  • (added) llvm/test/CodeGen/AArch64/neon-scalar-x-indexed-elem-constrained.ll (+103)
  • (added) llvm/test/CodeGen/AArch64/neon-scalar-x-indexed-elem-unconstrained.ll (+103)
  • (added) llvm/test/CodeGen/AArch64/v8.2a-neon-intrinsics-constrained.ll (+276)
  • (added) llvm/test/CodeGen/AArch64/v8.2a-neon-intrinsics-unconstrained.ll (+265)
  • (modified) llvm/utils/UpdateTestChecks/common.py (+15)
  • (modified) llvm/utils/update_cc_test_checks.py (+11-6)
diff --git a/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c b/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c
index 877d83c0fa3954..caa803ee794603 100644
--- a/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c
+++ b/clang/test/CodeGen/AArch64/bf16-dotprod-intrinsics.c
@@ -1,6 +1,6 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
 // RUN: %clang_cc1 -triple aarch64 -target-feature +neon -target-feature +bf16 \
-// RUN: -disable-O0-optnone -emit-llvm %s -o - | opt -S -passes=mem2reg | FileCheck %s
+// RUN: -disable-O0-optnone -emit-llvm %s -o - | opt -S -passes=mem2reg,instcombine | FileCheck %s
 
 // REQUIRES: aarch64-registered-target || arm-registered-target
 
@@ -8,10 +8,7 @@
 
 // CHECK-LABEL: @test_vbfdot_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <2 x float> [[R:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <4 x bfloat> [[A:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <4 x bfloat> [[B:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R]], <4 x bfloat> [[A]], <4 x bfloat> [[B]])
+// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R:%.*]], <4 x bfloat> [[A:%.*]], <4 x bfloat> [[B:%.*]])
 // CHECK-NEXT:    ret <2 x float> [[VBFDOT3_I]]
 //
 float32x2_t test_vbfdot_f32(float32x2_t r, bfloat16x4_t a, bfloat16x4_t b) {
@@ -20,10 +17,7 @@ float32x2_t test_vbfdot_f32(float32x2_t r, bfloat16x4_t a, bfloat16x4_t b) {
 
 // CHECK-LABEL: @test_vbfdotq_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[B]])
+// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[B:%.*]])
 // CHECK-NEXT:    ret <4 x float> [[VBFDOT3_I]]
 //
 float32x4_t test_vbfdotq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b){
@@ -32,19 +26,10 @@ float32x4_t test_vbfdotq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b){
 
 // CHECK-LABEL: @test_vbfdot_lane_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[__REINT_128:%.*]] = alloca <4 x bfloat>, align 8
-// CHECK-NEXT:    [[__REINT1_128:%.*]] = alloca <2 x float>, align 8
-// CHECK-NEXT:    store <4 x bfloat> [[B:%.*]], ptr [[__REINT_128]], align 8
-// CHECK-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[__REINT_128]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <2 x float> [[TMP0]] to <8 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x float>
-// CHECK-NEXT:    [[LANE:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> [[TMP2]], <2 x i32> zeroinitializer
-// CHECK-NEXT:    store <2 x float> [[LANE]], ptr [[__REINT1_128]], align 8
-// CHECK-NEXT:    [[TMP3:%.*]] = load <4 x bfloat>, ptr [[__REINT1_128]], align 8
-// CHECK-NEXT:    [[TMP4:%.*]] = bitcast <2 x float> [[R:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[TMP5:%.*]] = bitcast <4 x bfloat> [[A:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[TMP6:%.*]] = bitcast <4 x bfloat> [[TMP3]] to <8 x i8>
-// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R]], <4 x bfloat> [[A]], <4 x bfloat> [[TMP3]])
+// CHECK-NEXT:    [[DOTCAST:%.*]] = bitcast <4 x bfloat> [[B:%.*]] to <2 x float>
+// CHECK-NEXT:    [[LANE:%.*]] = shufflevector <2 x float> [[DOTCAST]], <2 x float> poison, <2 x i32> zeroinitializer
+// CHECK-NEXT:    [[DOTCAST1:%.*]] = bitcast <2 x float> [[LANE]] to <4 x bfloat>
+// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R:%.*]], <4 x bfloat> [[A:%.*]], <4 x bfloat> [[DOTCAST1]])
 // CHECK-NEXT:    ret <2 x float> [[VBFDOT3_I]]
 //
 float32x2_t test_vbfdot_lane_f32(float32x2_t r, bfloat16x4_t a, bfloat16x4_t b){
@@ -53,19 +38,10 @@ float32x2_t test_vbfdot_lane_f32(float32x2_t r, bfloat16x4_t a, bfloat16x4_t b){
 
 // CHECK-LABEL: @test_vbfdotq_laneq_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[__REINT_130:%.*]] = alloca <8 x bfloat>, align 16
-// CHECK-NEXT:    [[__REINT1_130:%.*]] = alloca <4 x float>, align 16
-// CHECK-NEXT:    store <8 x bfloat> [[B:%.*]], ptr [[__REINT_130]], align 16
-// CHECK-NEXT:    [[TMP0:%.*]] = load <4 x float>, ptr [[__REINT_130]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <4 x float> [[TMP0]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x float>
-// CHECK-NEXT:    [[LANE:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> [[TMP2]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
-// CHECK-NEXT:    store <4 x float> [[LANE]], ptr [[__REINT1_130]], align 16
-// CHECK-NEXT:    [[TMP3:%.*]] = load <8 x bfloat>, ptr [[__REINT1_130]], align 16
-// CHECK-NEXT:    [[TMP4:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP5:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP6:%.*]] = bitcast <8 x bfloat> [[TMP3]] to <16 x i8>
-// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[TMP3]])
+// CHECK-NEXT:    [[DOTCAST:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <4 x float>
+// CHECK-NEXT:    [[LANE:%.*]] = shufflevector <4 x float> [[DOTCAST]], <4 x float> poison, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
+// CHECK-NEXT:    [[DOTCAST1:%.*]] = bitcast <4 x float> [[LANE]] to <8 x bfloat>
+// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[DOTCAST1]])
 // CHECK-NEXT:    ret <4 x float> [[VBFDOT3_I]]
 //
 float32x4_t test_vbfdotq_laneq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
@@ -74,19 +50,10 @@ float32x4_t test_vbfdotq_laneq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b
 
 // CHECK-LABEL: @test_vbfdot_laneq_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[__REINT_132:%.*]] = alloca <8 x bfloat>, align 16
-// CHECK-NEXT:    [[__REINT1_132:%.*]] = alloca <2 x float>, align 8
-// CHECK-NEXT:    store <8 x bfloat> [[B:%.*]], ptr [[__REINT_132]], align 16
-// CHECK-NEXT:    [[TMP0:%.*]] = load <4 x float>, ptr [[__REINT_132]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <4 x float> [[TMP0]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x float>
-// CHECK-NEXT:    [[LANE:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> [[TMP2]], <2 x i32> <i32 3, i32 3>
-// CHECK-NEXT:    store <2 x float> [[LANE]], ptr [[__REINT1_132]], align 8
-// CHECK-NEXT:    [[TMP3:%.*]] = load <4 x bfloat>, ptr [[__REINT1_132]], align 8
-// CHECK-NEXT:    [[TMP4:%.*]] = bitcast <2 x float> [[R:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[TMP5:%.*]] = bitcast <4 x bfloat> [[A:%.*]] to <8 x i8>
-// CHECK-NEXT:    [[TMP6:%.*]] = bitcast <4 x bfloat> [[TMP3]] to <8 x i8>
-// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R]], <4 x bfloat> [[A]], <4 x bfloat> [[TMP3]])
+// CHECK-NEXT:    [[DOTCAST:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <4 x float>
+// CHECK-NEXT:    [[LANE:%.*]] = shufflevector <4 x float> [[DOTCAST]], <4 x float> poison, <2 x i32> <i32 3, i32 3>
+// CHECK-NEXT:    [[DOTCAST1:%.*]] = bitcast <2 x float> [[LANE]] to <4 x bfloat>
+// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16(<2 x float> [[R:%.*]], <4 x bfloat> [[A:%.*]], <4 x bfloat> [[DOTCAST1]])
 // CHECK-NEXT:    ret <2 x float> [[VBFDOT3_I]]
 //
 float32x2_t test_vbfdot_laneq_f32(float32x2_t r, bfloat16x4_t a, bfloat16x8_t b) {
@@ -95,19 +62,10 @@ float32x2_t test_vbfdot_laneq_f32(float32x2_t r, bfloat16x4_t a, bfloat16x8_t b)
 
 // CHECK-LABEL: @test_vbfdotq_lane_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[__REINT_126:%.*]] = alloca <4 x bfloat>, align 8
-// CHECK-NEXT:    [[__REINT1_126:%.*]] = alloca <4 x float>, align 16
-// CHECK-NEXT:    store <4 x bfloat> [[B:%.*]], ptr [[__REINT_126]], align 8
-// CHECK-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[__REINT_126]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <2 x float> [[TMP0]] to <8 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x float>
-// CHECK-NEXT:    [[LANE:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> [[TMP2]], <4 x i32> zeroinitializer
-// CHECK-NEXT:    store <4 x float> [[LANE]], ptr [[__REINT1_126]], align 16
-// CHECK-NEXT:    [[TMP3:%.*]] = load <8 x bfloat>, ptr [[__REINT1_126]], align 16
-// CHECK-NEXT:    [[TMP4:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP5:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP6:%.*]] = bitcast <8 x bfloat> [[TMP3]] to <16 x i8>
-// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[TMP3]])
+// CHECK-NEXT:    [[DOTCAST:%.*]] = bitcast <4 x bfloat> [[B:%.*]] to <2 x float>
+// CHECK-NEXT:    [[LANE:%.*]] = shufflevector <2 x float> [[DOTCAST]], <2 x float> poison, <4 x i32> zeroinitializer
+// CHECK-NEXT:    [[DOTCAST1:%.*]] = bitcast <4 x float> [[LANE]] to <8 x bfloat>
+// CHECK-NEXT:    [[VBFDOT3_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[DOTCAST1]])
 // CHECK-NEXT:    ret <4 x float> [[VBFDOT3_I]]
 //
 float32x4_t test_vbfdotq_lane_f32(float32x4_t r, bfloat16x8_t a, bfloat16x4_t b) {
@@ -116,11 +74,7 @@ float32x4_t test_vbfdotq_lane_f32(float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
 
 // CHECK-LABEL: @test_vbfmmlaq_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[VBFMMLAQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmmla(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[B]])
-// CHECK-NEXT:    [[VBFMMLAQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMMLAQ_F323_I]] to <16 x i8>
+// CHECK-NEXT:    [[VBFMMLAQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmmla(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[B:%.*]])
 // CHECK-NEXT:    ret <4 x float> [[VBFMMLAQ_F323_I]]
 //
 float32x4_t test_vbfmmlaq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
@@ -129,11 +83,7 @@ float32x4_t test_vbfmmlaq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
 
 // CHECK-LABEL: @test_vbfmlalbq_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[B]])
-// CHECK-NEXT:    [[VBFMLALBQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMLALBQ_F323_I]] to <16 x i8>
+// CHECK-NEXT:    [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[B:%.*]])
 // CHECK-NEXT:    ret <4 x float> [[VBFMLALBQ_F323_I]]
 //
 float32x4_t test_vbfmlalbq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
@@ -142,11 +92,7 @@ float32x4_t test_vbfmlalbq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
 
 // CHECK-LABEL: @test_vbfmlaltq_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x bfloat> [[B:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[VBFMLALTQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalt(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[B]])
-// CHECK-NEXT:    [[VBFMLALTQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMLALTQ_F323_I]] to <16 x i8>
+// CHECK-NEXT:    [[VBFMLALTQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalt(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[B:%.*]])
 // CHECK-NEXT:    ret <4 x float> [[VBFMLALTQ_F323_I]]
 //
 float32x4_t test_vbfmlaltq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
@@ -155,27 +101,8 @@ float32x4_t test_vbfmlaltq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
 
 // CHECK-LABEL: @test_vbfmlalbq_lane_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VGET_LANE:%.*]] = extractelement <4 x bfloat> [[B:%.*]], i32 0
-// CHECK-NEXT:    [[VECINIT:%.*]] = insertelement <8 x bfloat> poison, bfloat [[VGET_LANE]], i32 0
-// CHECK-NEXT:    [[VGET_LANE3:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT5:%.*]] = insertelement <8 x bfloat> [[VECINIT]], bfloat [[VGET_LANE3]], i32 1
-// CHECK-NEXT:    [[VGET_LANE8:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT10:%.*]] = insertelement <8 x bfloat> [[VECINIT5]], bfloat [[VGET_LANE8]], i32 2
-// CHECK-NEXT:    [[VGET_LANE13:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT15:%.*]] = insertelement <8 x bfloat> [[VECINIT10]], bfloat [[VGET_LANE13]], i32 3
-// CHECK-NEXT:    [[VGET_LANE18:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT20:%.*]] = insertelement <8 x bfloat> [[VECINIT15]], bfloat [[VGET_LANE18]], i32 4
-// CHECK-NEXT:    [[VGET_LANE23:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT25:%.*]] = insertelement <8 x bfloat> [[VECINIT20]], bfloat [[VGET_LANE23]], i32 5
-// CHECK-NEXT:    [[VGET_LANE28:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT30:%.*]] = insertelement <8 x bfloat> [[VECINIT25]], bfloat [[VGET_LANE28]], i32 6
-// CHECK-NEXT:    [[VGET_LANE33:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT35:%.*]] = insertelement <8 x bfloat> [[VECINIT30]], bfloat [[VGET_LANE33]], i32 7
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x bfloat> [[VECINIT35]] to <16 x i8>
-// CHECK-NEXT:    [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[VECINIT35]])
-// CHECK-NEXT:    [[VBFMLALBQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMLALBQ_F323_I]] to <16 x i8>
+// CHECK-NEXT:    [[VECINIT35:%.*]] = shufflevector <4 x bfloat> [[B:%.*]], <4 x bfloat> poison, <8 x i32> zeroinitializer
+// CHECK-NEXT:    [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[VECINIT35]])
 // CHECK-NEXT:    ret <4 x float> [[VBFMLALBQ_F323_I]]
 //
 float32x4_t test_vbfmlalbq_lane_f32(float32x4_t r, bfloat16x8_t a, bfloat16x4_t b) {
@@ -184,27 +111,8 @@ float32x4_t test_vbfmlalbq_lane_f32(float32x4_t r, bfloat16x8_t a, bfloat16x4_t
 
 // CHECK-LABEL: @test_vbfmlalbq_laneq_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VGETQ_LANE:%.*]] = extractelement <8 x bfloat> [[B:%.*]], i32 3
-// CHECK-NEXT:    [[VECINIT:%.*]] = insertelement <8 x bfloat> poison, bfloat [[VGETQ_LANE]], i32 0
-// CHECK-NEXT:    [[VGETQ_LANE3:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT:    [[VECINIT5:%.*]] = insertelement <8 x bfloat> [[VECINIT]], bfloat [[VGETQ_LANE3]], i32 1
-// CHECK-NEXT:    [[VGETQ_LANE8:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT:    [[VECINIT10:%.*]] = insertelement <8 x bfloat> [[VECINIT5]], bfloat [[VGETQ_LANE8]], i32 2
-// CHECK-NEXT:    [[VGETQ_LANE13:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT:    [[VECINIT15:%.*]] = insertelement <8 x bfloat> [[VECINIT10]], bfloat [[VGETQ_LANE13]], i32 3
-// CHECK-NEXT:    [[VGETQ_LANE18:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT:    [[VECINIT20:%.*]] = insertelement <8 x bfloat> [[VECINIT15]], bfloat [[VGETQ_LANE18]], i32 4
-// CHECK-NEXT:    [[VGETQ_LANE23:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT:    [[VECINIT25:%.*]] = insertelement <8 x bfloat> [[VECINIT20]], bfloat [[VGETQ_LANE23]], i32 5
-// CHECK-NEXT:    [[VGETQ_LANE28:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT:    [[VECINIT30:%.*]] = insertelement <8 x bfloat> [[VECINIT25]], bfloat [[VGETQ_LANE28]], i32 6
-// CHECK-NEXT:    [[VGETQ_LANE33:%.*]] = extractelement <8 x bfloat> [[B]], i32 3
-// CHECK-NEXT:    [[VECINIT35:%.*]] = insertelement <8 x bfloat> [[VECINIT30]], bfloat [[VGETQ_LANE33]], i32 7
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x bfloat> [[VECINIT35]] to <16 x i8>
-// CHECK-NEXT:    [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[VECINIT35]])
-// CHECK-NEXT:    [[VBFMLALBQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMLALBQ_F323_I]] to <16 x i8>
+// CHECK-NEXT:    [[VECINIT35:%.*]] = shufflevector <8 x bfloat> [[B:%.*]], <8 x bfloat> poison, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
+// CHECK-NEXT:    [[VBFMLALBQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalb(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[VECINIT35]])
 // CHECK-NEXT:    ret <4 x float> [[VBFMLALBQ_F323_I]]
 //
 float32x4_t test_vbfmlalbq_laneq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) {
@@ -213,27 +121,8 @@ float32x4_t test_vbfmlalbq_laneq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t
 
 // CHECK-LABEL: @test_vbfmlaltq_lane_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[VGET_LANE:%.*]] = extractelement <4 x bfloat> [[B:%.*]], i32 0
-// CHECK-NEXT:    [[VECINIT:%.*]] = insertelement <8 x bfloat> poison, bfloat [[VGET_LANE]], i32 0
-// CHECK-NEXT:    [[VGET_LANE3:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT5:%.*]] = insertelement <8 x bfloat> [[VECINIT]], bfloat [[VGET_LANE3]], i32 1
-// CHECK-NEXT:    [[VGET_LANE8:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT10:%.*]] = insertelement <8 x bfloat> [[VECINIT5]], bfloat [[VGET_LANE8]], i32 2
-// CHECK-NEXT:    [[VGET_LANE13:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT15:%.*]] = insertelement <8 x bfloat> [[VECINIT10]], bfloat [[VGET_LANE13]], i32 3
-// CHECK-NEXT:    [[VGET_LANE18:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT20:%.*]] = insertelement <8 x bfloat> [[VECINIT15]], bfloat [[VGET_LANE18]], i32 4
-// CHECK-NEXT:    [[VGET_LANE23:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT25:%.*]] = insertelement <8 x bfloat> [[VECINIT20]], bfloat [[VGET_LANE23]], i32 5
-// CHECK-NEXT:    [[VGET_LANE28:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT30:%.*]] = insertelement <8 x bfloat> [[VECINIT25]], bfloat [[VGET_LANE28]], i32 6
-// CHECK-NEXT:    [[VGET_LANE33:%.*]] = extractelement <4 x bfloat> [[B]], i32 0
-// CHECK-NEXT:    [[VECINIT35:%.*]] = insertelement <8 x bfloat> [[VECINIT30]], bfloat [[VGET_LANE33]], i32 7
-// CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[R:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP1:%.*]] = bitcast <8 x bfloat> [[A:%.*]] to <16 x i8>
-// CHECK-NEXT:    [[TMP2:%.*]] = bitcast <8 x bfloat> [[VECINIT35]] to <16 x i8>
-// CHECK-NEXT:    [[VBFMLALTQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalt(<4 x float> [[R]], <8 x bfloat> [[A]], <8 x bfloat> [[VECINIT35]])
-// CHECK-NEXT:    [[VBFMLALTQ_F324_I:%.*]] = bitcast <4 x float> [[VBFMLALTQ_F323_I]] to <16 x i8>
+// CHECK-NEXT:    [[VECINIT35:%.*]] = shufflevector <4 x bfloat> [[B:%.*]], <4 x bfloat> poison, <8 x i32> zeroinitializer
+// CHECK-NEXT:    [[VBFMLALTQ_F323_I:%.*]] = call <4 x float> @llvm.aarch64.neon.bfmlalt(<4 x float> [[R:%.*]], <8 x bfloat> [[A:%.*]], <8 x bfloat> [[VECINIT35]])
 // CHECK-NEXT:   ...
[truncated]


github-actions bot commented Jan 6, 2025

⚠️ Python code formatter, darker found issues in your code. ⚠️

You can test this locally with the following command:
darker --check --diff -r 7cb6e6bced8ca5767c3e609f4826982638fd9543...67ba481f809abdc85a90fed5ff11ca48c6578e3e llvm/utils/UpdateTestChecks/common.py llvm/utils/update_cc_test_checks.py
View the diff from darker here.
--- UpdateTestChecks/common.py	2025-01-06 15:08:27.000000 +0000
+++ UpdateTestChecks/common.py	2025-01-06 17:04:31.173625 +0000
@@ -556,12 +556,11 @@
 UTC_ADVERT = "NOTE: Assertions have been autogenerated by "
 UTC_AVOID = "NOTE: Do not autogenerate"
 UNUSED_NOTE = "NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:"
 
 DATA_LAYOUT_RE = re.compile(
-    r"target\sdatalayout\s=\s\"(?P<layout>.+)\"$",
-    flags=(re.M | re.S)
+    r"target\sdatalayout\s=\s\"(?P<layout>.+)\"$", flags=(re.M | re.S)
 )
 
 OPT_FUNCTION_RE = re.compile(
     r"^(\s*;\s*Function\sAttrs:\s(?P<attrs>[\w\s():,]+?))?\s*define\s+(?P<funcdef_attrs_and_ret>[^@]*)@(?P<func>[\w.$-]+?)\s*"
     r"(?P<args_and_sig>\((\)|(.*?[\w.-]+?)\))[^{]*\{)\n(?P<body>.*?)^\}$",
@@ -653,20 +652,22 @@
         if march.startswith(prefix):
             return triple
     print("Cannot find a triple. Assume 'x86'", file=sys.stderr)
     return "x86"
 
+
 def get_global_underscores(raw_tool_output):
     m = DATA_LAYOUT_RE.search(raw_tool_output)
     if not m:
         return False
     data_layout = m.group("layout")
     idx = data_layout.find("m:")
     if idx < 0:
         return False
     ch = data_layout[idx + 2]
-    return ch == 'o' or ch == 'x'
+    return ch == "o" or ch == "x"
+
 
 def apply_filters(line, filters):
     has_filter = False
     for f in filters:
         if not f.is_filter_out:
--- update_cc_test_checks.py	2025-01-06 15:08:27.000000 +0000
+++ update_cc_test_checks.py	2025-01-06 17:04:31.395847 +0000
@@ -123,11 +123,11 @@
             search = spell
         mangled = node.get("mangledName", spell)
         # Strip leading underscore from globals, so the name matches the LLVM one
         if global_underscores:
             storage = node.get("storageClass", None)
-            if storage != "static" and mangled[0] == '_':
+            if storage != "static" and mangled[0] == "_":
                 mangled = mangled[1:]
         ret[int(line) - 1].append((spell, mangled, search))
 
     ast = json.loads(stdout)
     if ast["kind"] != "TranslationUnitDecl":
@@ -252,11 +252,13 @@
         args.opt = None
 
     return args, parser
 
 
-def get_function_body(builder, args, filename, clang_args, extra_commands, prefixes, raw_tool_output):
+def get_function_body(
+    builder, args, filename, clang_args, extra_commands, prefixes, raw_tool_output
+):
     # TODO Clean up duplication of asm/common build_function_body_dictionary
     for extra_command in extra_commands:
         extra_args = shlex.split(extra_command)
         with tempfile.NamedTemporaryFile() as f:
             f.write(raw_tool_output.encode())
@@ -387,16 +389,24 @@
             common.debug("Extracted FileCheck prefixes: {}".format(prefixes))
 
             # Invoke external tool and extract function bodies.
             raw_tool_output = common.invoke_tool(ti.args.clang, clang_args, ti.path)
             get_function_body(
-                builder, ti.args, ti.path, clang_args, extra_commands, prefixes, raw_tool_output
+                builder,
+                ti.args,
+                ti.path,
+                clang_args,
+                extra_commands,
+                prefixes,
+                raw_tool_output,
             )
 
             # Invoke clang -Xclang -ast-dump=json to get mapping from start lines to
             # mangled names. Forward all clang args for now.
-            for k, v in get_line2func_list(ti.args, clang_args, common.get_global_underscores(raw_tool_output)).items():
+            for k, v in get_line2func_list(
+                ti.args, clang_args, common.get_global_underscores(raw_tool_output)
+            ).items():
                 line2func_list[k].extend(v)
 
         func_dict = builder.finish_and_get_func_dict()
         global_vars_seen_dict = {}
         prefix_set = set([prefix for p in filecheck_run_list for prefix in p[0]])

jthackray (Contributor) left a comment

LGTM

davemgreen (Collaborator) commented

> This patch adds instcombine to some tests that were passing

This patch sounds good, but the intent of the clang tests is that they should not be dependent on the mid-end optimizations in LLVM, at least for fast-changing parts like instcombine, where you don't want to have to update a thousand clang tests every time instcombine learns a new trick. The clang tests should ideally be testing the clang output.

Can the instcombines be replaced with something simpler like dce or maybe instsimplify? It might be OK with just mem2reg.

momchil-velikov (Collaborator, Author) commented

> This patch adds instcombine to some tests that were passing
> Can the instcombines be replaced with something simpler like dce or maybe instsimplify? It might be OK with just mem2reg.

Unfortunately, just mem2reg does not cut it. The original issue is that Clang generates very different code for a C-style
cast and for `__builtin_bit_cast`, e.g. for (https://gcc.godbolt.org/z/7rKqhTY5W):

typedef __attribute__((neon_vector_type(4))) unsigned uint32x4_t;
typedef __attribute__((neon_vector_type(4))) float float32x4_t;

uint32x4_t f(float32x4_t v) {
  return __builtin_bit_cast(uint32x4_t, v);
}

uint32x4_t g(float32x4_t v) {
  return (uint32x4_t) v;
}

and the differences persist after mem2reg:

define dso_local <4 x i32> @f(<4 x float> noundef %v) #0 {
entry:
  %v.addr = alloca <4 x float>, align 16
  store <4 x float> %v, ptr %v.addr, align 16
  %0 = load <4 x i32>, ptr %v.addr, align 16
  ret <4 x i32> %0
}

define dso_local <4 x i32> @g(<4 x float> noundef %v) #0 {
entry:
  %0 = bitcast <4 x float> %v to <4 x i32>
  ret <4 x i32> %0
}

which makes the test updates after #121802 very hard to inspect.
I can try with sroa instead of instcombine (it works on this example above, at least).
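
(If anyone wants to eyeball the pipelines side by side, here is a throwaway driver sketch — not part of the patch; the source file name is made up, and the clang/opt flags mirror the RUN lines above.)

import subprocess

SRC = "bitcast.c"  # hypothetical file containing f() and g() from above

def emit_ir(passes):
    """Compile SRC to LLVM IR with optnone disabled, then run the given passes."""
    ir = subprocess.run(
        ["clang", "--target=aarch64-linux-gnu", "-S", "-emit-llvm",
         "-Xclang", "-disable-O0-optnone", SRC, "-o", "-"],
        capture_output=True, text=True, check=True).stdout
    return subprocess.run(
        ["opt", "-S", "-passes=" + passes],
        input=ir, capture_output=True, text=True, check=True).stdout

# Compare how close each pipeline brings f() to g()'s plain bitcast form.
for passes in ("mem2reg", "mem2reg,sroa", "mem2reg,instcombine"):
    print("---", passes, "---")
    print(emit_ir(passes))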

davemgreen (Collaborator) commented

sroa would be ideal if it works; I know a number of test cases use it, and it shouldn't update too often.

momchil-velikov (Collaborator, Author) commented

Another thing that "works" is changing the codegen for `__builtin_bit_cast` (`case CK_LValueToRValueBitCast:`).
It stores a value with one type and loads it back with another type, which is not handled by mem2reg
(`if (LI->isVolatile() || LI->getType() != AI->getAllocatedType())`).
Instead it could perform a load with the original type and then emit an LLVM IR bitcast, just like it does for a "C-style" bitcast:
`llvm::Value *Result = Builder.CreateBitCast(Src, DstTy);`

momchil-velikov (Collaborator, Author) commented Jan 8, 2025

> Instead it could perform a load with the original type and then emit an LLVM IR bitcast, just like it does for "C-style"

Except that it does not work when aggregate types are involved.
Using mem2reg,sroa is a bit better than just mem2reg, even though sroa generates some ridiculous extra bitcasts.

momchil-velikov (Collaborator, Author) commented

Abandoning for now: the follow-up patch (#121802) was reduced in scope and does not need to update tests.

momchil-velikov deleted the neon-bitcast-regen-tests branch on January 29, 2025 at 10:54.
Labels: backend:AArch64, clang, testing-tools