Skip to content

[InlineCost] Consider the default branch when calculating cost #77856

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 11, 2024

Conversation

dianqk
Copy link
Member

@dianqk dianqk commented Jan 12, 2024

First step in fixing #76772.

This PR considers the default branch as a case branch. This will give the unreachable default branch fair consideration.

I changed getEstimatedNumberOfCaseClusters, although that didn't affect the rest. Maybe a better approach would be to add a new method?

@llvmbot
Copy link
Member

llvmbot commented Jan 12, 2024

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Quentin Dian (DianQK)

Changes

First step in fixing #76772.

This PR considers the default branch as a case branch. This will give the unreachable default branch fair consideration.


Patch is 23.41 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/77856.diff

5 Files Affected:

  • (modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+3-1)
  • (modified) llvm/include/llvm/CodeGen/BasicTTIImpl.h (+7-5)
  • (modified) llvm/lib/Analysis/InlineCost.cpp (+5-1)
  • (added) llvm/test/Transforms/Inline/inline-switch-default-2.ll (+317)
  • (added) llvm/test/Transforms/Inline/inline-switch-default.ll (+216)
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 60eab53fa2f601..1bf9cec8a2c40b 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -65,7 +65,9 @@ class TargetTransformInfoImplBase {
     (void)PSI;
     (void)BFI;
     JTSize = 0;
-    return SI.getNumCases();
+    bool HasDefault =
+        !isa<UnreachableInst>(SI.getDefaultDest()->getFirstNonPHIOrDbg());
+    return SI.getNumCases() + HasDefault;
   }
 
   unsigned getInliningThresholdMultiplier() const { return 1; }
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 5e7bdcdf72a49f..45849869be4a65 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -446,6 +446,8 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
     /// inline cost heuristic, but it's a generic cost model to be used in other
     /// places (e.g., in loop unrolling).
     unsigned N = SI.getNumCases();
+    bool HasDefault =
+        !isa<UnreachableInst>(SI.getDefaultDest()->getFirstNonPHIOrDbg());
     const TargetLoweringBase *TLI = getTLI();
     const DataLayout &DL = this->getDataLayout();
 
@@ -454,7 +456,7 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
 
     // Early exit if both a jump table and bit test are not allowed.
     if (N < 1 || (!IsJTAllowed && DL.getIndexSizeInBits(0u) < N))
-      return N;
+      return N + HasDefault;
 
     APInt MaxCaseVal = SI.case_begin()->getCaseValue()->getValue();
     APInt MinCaseVal = MaxCaseVal;
@@ -474,23 +476,23 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
 
       if (TLI->isSuitableForBitTests(Dests.size(), N, MinCaseVal, MaxCaseVal,
                                      DL))
-        return 1;
+        return 1 + HasDefault;
     }
 
     // Check if suitable for a jump table.
     if (IsJTAllowed) {
       if (N < 2 || N < TLI->getMinimumJumpTableEntries())
-        return N;
+        return N + HasDefault;
       uint64_t Range =
           (MaxCaseVal - MinCaseVal)
               .getLimitedValue(std::numeric_limits<uint64_t>::max() - 1) + 1;
       // Check whether a range of clusters is dense enough for a jump table
       if (TLI->isSuitableForJumpTable(&SI, N, Range, PSI, BFI)) {
         JumpTableSize = Range;
-        return 1;
+        return 1 + HasDefault;
       }
     }
-    return N;
+    return N + HasDefault;
   }
 
   bool shouldBuildLookupTables() {
diff --git a/llvm/lib/Analysis/InlineCost.cpp b/llvm/lib/Analysis/InlineCost.cpp
index 1fa7badaa4fa01..7b9e6e02f7b093 100644
--- a/llvm/lib/Analysis/InlineCost.cpp
+++ b/llvm/lib/Analysis/InlineCost.cpp
@@ -707,8 +707,9 @@ class InlineCostCallAnalyzer final : public CallAnalyzer {
     if (JumpTableSize) {
       int64_t JTCost =
           static_cast<int64_t>(JumpTableSize) * InstrCost + 4 * InstrCost;
-
       addCost(JTCost);
+      if (NumCaseCluster > 1)
+        addCost((NumCaseCluster - 1) * 2 * InstrCost);
       return;
     }
 
@@ -1238,6 +1239,9 @@ class InlineCostFeaturesAnalyzer final : public CallAnalyzer {
       int64_t JTCost = static_cast<int64_t>(JumpTableSize) * InstrCost +
                        JTCostMultiplier * InstrCost;
       increment(InlineCostFeatureIndex::jump_table_penalty, JTCost);
+      if (NumCaseCluster > 1)
+        increment(InlineCostFeatureIndex::case_cluster_penalty,
+                  (NumCaseCluster - 1) * CaseClusterCostMultiplier * InstrCost);
       return;
     }
 
diff --git a/llvm/test/Transforms/Inline/inline-switch-default-2.ll b/llvm/test/Transforms/Inline/inline-switch-default-2.ll
new file mode 100644
index 00000000000000..a784bd8d8613ab
--- /dev/null
+++ b/llvm/test/Transforms/Inline/inline-switch-default-2.ll
@@ -0,0 +1,317 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt %s -S -passes=inline -inline-threshold=21 | FileCheck %s
+
+; Check for scenarios without TTI.
+
+define i64 @foo1(i64 %0) {
+; LOOKUPTABLE-LABEL: define i64 @foo1(
+; LOOKUPTABLE-SAME: i64 [[TMP0:%.*]]) {
+; LOOKUPTABLE-NEXT:    switch i64 [[TMP0]], label [[DEFAULT_BRANCH_I:%.*]] [
+; LOOKUPTABLE-NEXT:      i64 0, label [[BRANCH_0_I:%.*]]
+; LOOKUPTABLE-NEXT:      i64 2, label [[BRANCH_2_I:%.*]]
+; LOOKUPTABLE-NEXT:      i64 4, label [[BRANCH_4_I:%.*]]
+; LOOKUPTABLE-NEXT:      i64 6, label [[BRANCH_6_I:%.*]]
+; LOOKUPTABLE-NEXT:    ]
+; LOOKUPTABLE:       branch_0.i:
+; LOOKUPTABLE-NEXT:    br label [[BAR1_EXIT:%.*]]
+; LOOKUPTABLE:       branch_2.i:
+; LOOKUPTABLE-NEXT:    br label [[BAR1_EXIT]]
+; LOOKUPTABLE:       branch_4.i:
+; LOOKUPTABLE-NEXT:    br label [[BAR1_EXIT]]
+; LOOKUPTABLE:       branch_6.i:
+; LOOKUPTABLE-NEXT:    br label [[BAR1_EXIT]]
+; LOOKUPTABLE:       default_branch.i:
+; LOOKUPTABLE-NEXT:    br label [[BAR1_EXIT]]
+; LOOKUPTABLE:       bar1.exit:
+; LOOKUPTABLE-NEXT:    [[TMP2:%.*]] = phi i64 [ 5, [[BRANCH_0_I]] ], [ 9, [[BRANCH_2_I]] ], [ 2, [[BRANCH_4_I]] ], [ 7, [[BRANCH_6_I]] ], [ 3, [[DEFAULT_BRANCH_I]] ]
+; LOOKUPTABLE-NEXT:    ret i64 [[TMP2]]
+;
+; SWITCH-LABEL: define i64 @foo1(
+; SWITCH-SAME: i64 [[TMP0:%.*]]) {
+; SWITCH-NEXT:    switch i64 [[TMP0]], label [[DEFAULT_BRANCH_I:%.*]] [
+; SWITCH-NEXT:      i64 0, label [[BRANCH_0_I:%.*]]
+; SWITCH-NEXT:      i64 2, label [[BRANCH_2_I:%.*]]
+; SWITCH-NEXT:      i64 4, label [[BRANCH_4_I:%.*]]
+; SWITCH-NEXT:      i64 6, label [[BRANCH_6_I:%.*]]
+; SWITCH-NEXT:    ]
+; SWITCH:       branch_0.i:
+; SWITCH-NEXT:    br label [[BAR1_EXIT:%.*]]
+; SWITCH:       branch_2.i:
+; SWITCH-NEXT:    br label [[BAR1_EXIT]]
+; SWITCH:       branch_4.i:
+; SWITCH-NEXT:    br label [[BAR1_EXIT]]
+; SWITCH:       branch_6.i:
+; SWITCH-NEXT:    br label [[BAR1_EXIT]]
+; SWITCH:       default_branch.i:
+; SWITCH-NEXT:    br label [[BAR1_EXIT]]
+; SWITCH:       bar1.exit:
+; SWITCH-NEXT:    [[TMP2:%.*]] = phi i64 [ 5, [[BRANCH_0_I]] ], [ 9, [[BRANCH_2_I]] ], [ 2, [[BRANCH_4_I]] ], [ 7, [[BRANCH_6_I]] ], [ 3, [[DEFAULT_BRANCH_I]] ]
+; SWITCH-NEXT:    ret i64 [[TMP2]]
+;
+; CHECK-LABEL: define i64 @foo1(
+; CHECK-SAME: i64 [[TMP0:%.*]]) {
+; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @bar1(i64 [[TMP0]])
+; CHECK-NEXT:    ret i64 [[TMP2]]
+;
+  %2 = call i64 @bar1(i64 %0)
+  ret i64 %2
+}
+
+define i64 @foo2(i64 %0) {
+; LOOKUPTABLE-LABEL: define i64 @foo2(
+; LOOKUPTABLE-SAME: i64 [[TMP0:%.*]]) {
+; LOOKUPTABLE-NEXT:    switch i64 [[TMP0]], label [[UNREACHABLEDEFAULT_I:%.*]] [
+; LOOKUPTABLE-NEXT:      i64 0, label [[BRANCH_0_I:%.*]]
+; LOOKUPTABLE-NEXT:      i64 2, label [[BRANCH_2_I:%.*]]
+; LOOKUPTABLE-NEXT:      i64 4, label [[BRANCH_4_I:%.*]]
+; LOOKUPTABLE-NEXT:      i64 6, label [[BRANCH_6_I:%.*]]
+; LOOKUPTABLE-NEXT:    ]
+; LOOKUPTABLE:       branch_0.i:
+; LOOKUPTABLE-NEXT:    br label [[BAR2_EXIT:%.*]]
+; LOOKUPTABLE:       branch_2.i:
+; LOOKUPTABLE-NEXT:    br label [[BAR2_EXIT]]
+; LOOKUPTABLE:       branch_4.i:
+; LOOKUPTABLE-NEXT:    br label [[BAR2_EXIT]]
+; LOOKUPTABLE:       branch_6.i:
+; LOOKUPTABLE-NEXT:    br label [[BAR2_EXIT]]
+; LOOKUPTABLE:       unreachabledefault.i:
+; LOOKUPTABLE-NEXT:    unreachable
+; LOOKUPTABLE:       bar2.exit:
+; LOOKUPTABLE-NEXT:    [[TMP2:%.*]] = phi i64 [ 5, [[BRANCH_0_I]] ], [ 9, [[BRANCH_2_I]] ], [ 2, [[BRANCH_4_I]] ], [ 7, [[BRANCH_6_I]] ]
+; LOOKUPTABLE-NEXT:    ret i64 [[TMP2]]
+;
+; SWITCH-LABEL: define i64 @foo2(
+; SWITCH-SAME: i64 [[TMP0:%.*]]) {
+; SWITCH-NEXT:    switch i64 [[TMP0]], label [[UNREACHABLEDEFAULT_I:%.*]] [
+; SWITCH-NEXT:      i64 0, label [[BRANCH_0_I:%.*]]
+; SWITCH-NEXT:      i64 2, label [[BRANCH_2_I:%.*]]
+; SWITCH-NEXT:      i64 4, label [[BRANCH_4_I:%.*]]
+; SWITCH-NEXT:      i64 6, label [[BRANCH_6_I:%.*]]
+; SWITCH-NEXT:    ]
+; SWITCH:       branch_0.i:
+; SWITCH-NEXT:    br label [[BAR2_EXIT:%.*]]
+; SWITCH:       branch_2.i:
+; SWITCH-NEXT:    br label [[BAR2_EXIT]]
+; SWITCH:       branch_4.i:
+; SWITCH-NEXT:    br label [[BAR2_EXIT]]
+; SWITCH:       branch_6.i:
+; SWITCH-NEXT:    br label [[BAR2_EXIT]]
+; SWITCH:       unreachabledefault.i:
+; SWITCH-NEXT:    unreachable
+; SWITCH:       bar2.exit:
+; SWITCH-NEXT:    [[TMP2:%.*]] = phi i64 [ 5, [[BRANCH_0_I]] ], [ 9, [[BRANCH_2_I]] ], [ 2, [[BRANCH_4_I]] ], [ 7, [[BRANCH_6_I]] ]
+; SWITCH-NEXT:    ret i64 [[TMP2]]
+;
+; CHECK-LABEL: define i64 @foo2(
+; CHECK-SAME: i64 [[TMP0:%.*]]) {
+; CHECK-NEXT:    switch i64 [[TMP0]], label [[UNREACHABLEDEFAULT_I:%.*]] [
+; CHECK-NEXT:      i64 0, label [[BRANCH_0_I:%.*]]
+; CHECK-NEXT:      i64 2, label [[BRANCH_2_I:%.*]]
+; CHECK-NEXT:      i64 4, label [[BRANCH_4_I:%.*]]
+; CHECK-NEXT:      i64 6, label [[BRANCH_6_I:%.*]]
+; CHECK-NEXT:    ]
+; CHECK:       branch_0.i:
+; CHECK-NEXT:    br label [[BAR2_EXIT:%.*]]
+; CHECK:       branch_2.i:
+; CHECK-NEXT:    br label [[BAR2_EXIT]]
+; CHECK:       branch_4.i:
+; CHECK-NEXT:    br label [[BAR2_EXIT]]
+; CHECK:       branch_6.i:
+; CHECK-NEXT:    br label [[BAR2_EXIT]]
+; CHECK:       unreachabledefault.i:
+; CHECK-NEXT:    unreachable
+; CHECK:       bar2.exit:
+; CHECK-NEXT:    [[TMP2:%.*]] = phi i64 [ 5, [[BRANCH_0_I]] ], [ 9, [[BRANCH_2_I]] ], [ 2, [[BRANCH_4_I]] ], [ 7, [[BRANCH_6_I]] ]
+; CHECK-NEXT:    ret i64 [[TMP2]]
+;
+  %2 = call i64 @bar2(i64 %0)
+  ret i64 %2
+}
+
+define i64 @bar1(i64 %0) {
+; LOOKUPTABLE-LABEL: define i64 @bar1(
+; LOOKUPTABLE-SAME: i64 [[TMP0:%.*]]) {
+; LOOKUPTABLE-NEXT:    switch i64 [[TMP0]], label [[DEFAULT_BRANCH:%.*]] [
+; LOOKUPTABLE-NEXT:      i64 0, label [[BRANCH_0:%.*]]
+; LOOKUPTABLE-NEXT:      i64 2, label [[BRANCH_2:%.*]]
+; LOOKUPTABLE-NEXT:      i64 4, label [[BRANCH_4:%.*]]
+; LOOKUPTABLE-NEXT:      i64 6, label [[BRANCH_6:%.*]]
+; LOOKUPTABLE-NEXT:    ]
+; LOOKUPTABLE:       branch_0:
+; LOOKUPTABLE-NEXT:    br label [[EXIT:%.*]]
+; LOOKUPTABLE:       branch_2:
+; LOOKUPTABLE-NEXT:    br label [[EXIT]]
+; LOOKUPTABLE:       branch_4:
+; LOOKUPTABLE-NEXT:    br label [[EXIT]]
+; LOOKUPTABLE:       branch_6:
+; LOOKUPTABLE-NEXT:    br label [[EXIT]]
+; LOOKUPTABLE:       default_branch:
+; LOOKUPTABLE-NEXT:    br label [[EXIT]]
+; LOOKUPTABLE:       exit:
+; LOOKUPTABLE-NEXT:    [[TMP2:%.*]] = phi i64 [ 5, [[BRANCH_0]] ], [ 9, [[BRANCH_2]] ], [ 2, [[BRANCH_4]] ], [ 7, [[BRANCH_6]] ], [ 3, [[DEFAULT_BRANCH]] ]
+; LOOKUPTABLE-NEXT:    ret i64 [[TMP2]]
+;
+; SWITCH-LABEL: define i64 @bar1(
+; SWITCH-SAME: i64 [[TMP0:%.*]]) {
+; SWITCH-NEXT:    switch i64 [[TMP0]], label [[DEFAULT_BRANCH:%.*]] [
+; SWITCH-NEXT:      i64 0, label [[BRANCH_0:%.*]]
+; SWITCH-NEXT:      i64 2, label [[BRANCH_2:%.*]]
+; SWITCH-NEXT:      i64 4, label [[BRANCH_4:%.*]]
+; SWITCH-NEXT:      i64 6, label [[BRANCH_6:%.*]]
+; SWITCH-NEXT:    ]
+; SWITCH:       branch_0:
+; SWITCH-NEXT:    br label [[EXIT:%.*]]
+; SWITCH:       branch_2:
+; SWITCH-NEXT:    br label [[EXIT]]
+; SWITCH:       branch_4:
+; SWITCH-NEXT:    br label [[EXIT]]
+; SWITCH:       branch_6:
+; SWITCH-NEXT:    br label [[EXIT]]
+; SWITCH:       default_branch:
+; SWITCH-NEXT:    br label [[EXIT]]
+; SWITCH:       exit:
+; SWITCH-NEXT:    [[TMP2:%.*]] = phi i64 [ 5, [[BRANCH_0]] ], [ 9, [[BRANCH_2]] ], [ 2, [[BRANCH_4]] ], [ 7, [[BRANCH_6]] ], [ 3, [[DEFAULT_BRANCH]] ]
+; SWITCH-NEXT:    ret i64 [[TMP2]]
+;
+; CHECK-LABEL: define i64 @bar1(
+; CHECK-SAME: i64 [[TMP0:%.*]]) {
+; CHECK-NEXT:    switch i64 [[TMP0]], label [[DEFAULT_BRANCH:%.*]] [
+; CHECK-NEXT:      i64 0, label [[BRANCH_0:%.*]]
+; CHECK-NEXT:      i64 2, label [[BRANCH_2:%.*]]
+; CHECK-NEXT:      i64 4, label [[BRANCH_4:%.*]]
+; CHECK-NEXT:      i64 6, label [[BRANCH_6:%.*]]
+; CHECK-NEXT:    ]
+; CHECK:       branch_0:
+; CHECK-NEXT:    br label [[EXIT:%.*]]
+; CHECK:       branch_2:
+; CHECK-NEXT:    br label [[EXIT]]
+; CHECK:       branch_4:
+; CHECK-NEXT:    br label [[EXIT]]
+; CHECK:       branch_6:
+; CHECK-NEXT:    br label [[EXIT]]
+; CHECK:       default_branch:
+; CHECK-NEXT:    br label [[EXIT]]
+; CHECK:       exit:
+; CHECK-NEXT:    [[TMP2:%.*]] = phi i64 [ 5, [[BRANCH_0]] ], [ 9, [[BRANCH_2]] ], [ 2, [[BRANCH_4]] ], [ 7, [[BRANCH_6]] ], [ 3, [[DEFAULT_BRANCH]] ]
+; CHECK-NEXT:    ret i64 [[TMP2]]
+;
+  switch i64 %0, label %default_branch [
+  i64 0, label %branch_0
+  i64 2, label %branch_2
+  i64 4, label %branch_4
+  i64 6, label %branch_6
+  ]
+
+branch_0:
+  br label %exit
+
+branch_2:
+  br label %exit
+
+branch_4:
+  br label %exit
+
+branch_6:
+  br label %exit
+
+default_branch:
+  br label %exit
+
+exit:
+  %2 = phi i64 [ 5, %branch_0 ], [ 9, %branch_2 ], [ 2, %branch_4 ], [ 7, %branch_6 ], [ 3, %default_branch ]
+  ret i64 %2
+}
+
+define i64 @bar2(i64 %0) {
+; LOOKUPTABLE-LABEL: define i64 @bar2(
+; LOOKUPTABLE-SAME: i64 [[TMP0:%.*]]) {
+; LOOKUPTABLE-NEXT:    switch i64 [[TMP0]], label [[UNREACHABLEDEFAULT:%.*]] [
+; LOOKUPTABLE-NEXT:      i64 0, label [[BRANCH_0:%.*]]
+; LOOKUPTABLE-NEXT:      i64 2, label [[BRANCH_2:%.*]]
+; LOOKUPTABLE-NEXT:      i64 4, label [[BRANCH_4:%.*]]
+; LOOKUPTABLE-NEXT:      i64 6, label [[BRANCH_6:%.*]]
+; LOOKUPTABLE-NEXT:    ]
+; LOOKUPTABLE:       branch_0:
+; LOOKUPTABLE-NEXT:    br label [[EXIT:%.*]]
+; LOOKUPTABLE:       branch_2:
+; LOOKUPTABLE-NEXT:    br label [[EXIT]]
+; LOOKUPTABLE:       branch_4:
+; LOOKUPTABLE-NEXT:    br label [[EXIT]]
+; LOOKUPTABLE:       branch_6:
+; LOOKUPTABLE-NEXT:    br label [[EXIT]]
+; LOOKUPTABLE:       unreachabledefault:
+; LOOKUPTABLE-NEXT:    unreachable
+; LOOKUPTABLE:       exit:
+; LOOKUPTABLE-NEXT:    [[TMP2:%.*]] = phi i64 [ 5, [[BRANCH_0]] ], [ 9, [[BRANCH_2]] ], [ 2, [[BRANCH_4]] ], [ 7, [[BRANCH_6]] ]
+; LOOKUPTABLE-NEXT:    ret i64 [[TMP2]]
+;
+; SWITCH-LABEL: define i64 @bar2(
+; SWITCH-SAME: i64 [[TMP0:%.*]]) {
+; SWITCH-NEXT:    switch i64 [[TMP0]], label [[UNREACHABLEDEFAULT:%.*]] [
+; SWITCH-NEXT:      i64 0, label [[BRANCH_0:%.*]]
+; SWITCH-NEXT:      i64 2, label [[BRANCH_2:%.*]]
+; SWITCH-NEXT:      i64 4, label [[BRANCH_4:%.*]]
+; SWITCH-NEXT:      i64 6, label [[BRANCH_6:%.*]]
+; SWITCH-NEXT:    ]
+; SWITCH:       branch_0:
+; SWITCH-NEXT:    br label [[EXIT:%.*]]
+; SWITCH:       branch_2:
+; SWITCH-NEXT:    br label [[EXIT]]
+; SWITCH:       branch_4:
+; SWITCH-NEXT:    br label [[EXIT]]
+; SWITCH:       branch_6:
+; SWITCH-NEXT:    br label [[EXIT]]
+; SWITCH:       unreachabledefault:
+; SWITCH-NEXT:    unreachable
+; SWITCH:       exit:
+; SWITCH-NEXT:    [[TMP2:%.*]] = phi i64 [ 5, [[BRANCH_0]] ], [ 9, [[BRANCH_2]] ], [ 2, [[BRANCH_4]] ], [ 7, [[BRANCH_6]] ]
+; SWITCH-NEXT:    ret i64 [[TMP2]]
+;
+; CHECK-LABEL: define i64 @bar2(
+; CHECK-SAME: i64 [[TMP0:%.*]]) {
+; CHECK-NEXT:    switch i64 [[TMP0]], label [[UNREACHABLEDEFAULT:%.*]] [
+; CHECK-NEXT:      i64 0, label [[BRANCH_0:%.*]]
+; CHECK-NEXT:      i64 2, label [[BRANCH_2:%.*]]
+; CHECK-NEXT:      i64 4, label [[BRANCH_4:%.*]]
+; CHECK-NEXT:      i64 6, label [[BRANCH_6:%.*]]
+; CHECK-NEXT:    ]
+; CHECK:       branch_0:
+; CHECK-NEXT:    br label [[EXIT:%.*]]
+; CHECK:       branch_2:
+; CHECK-NEXT:    br label [[EXIT]]
+; CHECK:       branch_4:
+; CHECK-NEXT:    br label [[EXIT]]
+; CHECK:       branch_6:
+; CHECK-NEXT:    br label [[EXIT]]
+; CHECK:       unreachabledefault:
+; CHECK-NEXT:    unreachable
+; CHECK:       exit:
+; CHECK-NEXT:    [[TMP2:%.*]] = phi i64 [ 5, [[BRANCH_0]] ], [ 9, [[BRANCH_2]] ], [ 2, [[BRANCH_4]] ], [ 7, [[BRANCH_6]] ]
+; CHECK-NEXT:    ret i64 [[TMP2]]
+;
+  switch i64 %0, label %unreachabledefault [
+  i64 0, label %branch_0
+  i64 2, label %branch_2
+  i64 4, label %branch_4
+  i64 6, label %branch_6
+  ]
+
+branch_0:
+  br label %exit
+
+branch_2:
+  br label %exit
+
+branch_4:
+  br label %exit
+
+branch_6:
+  br label %exit
+
+unreachabledefault:
+  unreachable
+
+exit:
+  %2 = phi i64 [ 5, %branch_0 ], [ 9, %branch_2 ], [ 2, %branch_4 ], [ 7, %branch_6 ]
+  ret i64 %2
+}
diff --git a/llvm/test/Transforms/Inline/inline-switch-default.ll b/llvm/test/Transforms/Inline/inline-switch-default.ll
new file mode 100644
index 00000000000000..6271d6c06dcc6f
--- /dev/null
+++ b/llvm/test/Transforms/Inline/inline-switch-default.ll
@@ -0,0 +1,216 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt %s -S -passes=inline -inline-threshold=26 -min-jump-table-entries=4 | FileCheck %s -check-prefix=LOOKUPTABLE
+; RUN: opt %s -S -passes=inline -inline-threshold=21 -min-jump-table-entries=5 | FileCheck %s -check-prefix=SWITCH
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; The `bar1` should not be inlined since there is a default branch.
+
+define i64 @foo1(i64 %0) {
+; LOOKUPTABLE-LABEL: define i64 @foo1(
+; LOOKUPTABLE-SAME: i64 [[TMP0:%.*]]) {
+; LOOKUPTABLE-NEXT:    [[TMP2:%.*]] = call i64 @bar1(i64 [[TMP0]])
+; LOOKUPTABLE-NEXT:    ret i64 [[TMP2]]
+;
+; SWITCH-LABEL: define i64 @foo1(
+; SWITCH-SAME: i64 [[TMP0:%.*]]) {
+; SWITCH-NEXT:    [[TMP2:%.*]] = call i64 @bar1(i64 [[TMP0]])
+; SWITCH-NEXT:    ret i64 [[TMP2]]
+;
+  %2 = call i64 @bar1(i64 %0)
+  ret i64 %2
+}
+
+define i64 @foo2(i64 %0) {
+; LOOKUPTABLE-LABEL: define i64 @foo2(
+; LOOKUPTABLE-SAME: i64 [[TMP0:%.*]]) {
+; LOOKUPTABLE-NEXT:    switch i64 [[TMP0]], label [[UNREACHABLEDEFAULT_I:%.*]] [
+; LOOKUPTABLE-NEXT:      i64 0, label [[BRANCH_0_I:%.*]]
+; LOOKUPTABLE-NEXT:      i64 2, label [[BRANCH_2_I:%.*]]
+; LOOKUPTABLE-NEXT:      i64 4, label [[BRANCH_4_I:%.*]]
+; LOOKUPTABLE-NEXT:      i64 6, label [[BRANCH_6_I:%.*]]
+; LOOKUPTABLE-NEXT:    ]
+; LOOKUPTABLE:       branch_0.i:
+; LOOKUPTABLE-NEXT:    br label [[BAR2_EXIT:%.*]]
+; LOOKUPTABLE:       branch_2.i:
+; LOOKUPTABLE-NEXT:    br label [[BAR2_EXIT]]
+; LOOKUPTABLE:       branch_4.i:
+; LOOKUPTABLE-NEXT:    br label [[BAR2_EXIT]]
+; LOOKUPTABLE:       branch_6.i:
+; LOOKUPTABLE-NEXT:    br label [[BAR2_EXIT]]
+; LOOKUPTABLE:       unreachabledefault.i:
+; LOOKUPTABLE-NEXT:    unreachable
+; LOOKUPTABLE:       bar2.exit:
+; LOOKUPTABLE-NEXT:    [[TMP2:%.*]] = phi i64 [ 5, [[BRANCH_0_I]] ], [ 9, [[BRANCH_2_I]] ], [ 2, [[BRANCH_4_I]] ], [ 7, [[BRANCH_6_I]] ]
+; LOOKUPTABLE-NEXT:    ret i64 [[TMP2]]
+;
+; SWITCH-LABEL: define i64 @foo2(
+; SWITCH-SAME: i64 [[TMP0:%.*]]) {
+; SWITCH-NEXT:    switch i64 [[TMP0]], label [[UNREACHABLEDEFAULT_I:%.*]] [
+; SWITCH-NEXT:      i64 0, label [[BRANCH_0_I:%.*]]
+; SWITCH-NEXT:      i64 2, label [[BRANCH_2_I:%.*]]
+; SWITCH-NEXT:      i64 4, label [[BRANCH_4_I:%.*]]
+; SWITCH-NEXT:      i64 6, label [[BRANCH_6_I:%.*]]
+; SWITCH-NEXT:    ]
+; SWITCH:       branch_0.i:
+; SWITCH-NEXT:    br label [[BAR2_EXIT:%.*]]
+; SWITCH:       branch_2.i:
+; SWITCH-NEXT:    br label [[BAR2_EXIT]]
+; SWITCH:       branch_4.i:
+; SWITCH-NEXT:    br label [[BAR2_EXIT]]
+; SWITCH:       branch_6.i:
+; SWITCH-NEXT:    br label [[BAR2_EXIT]]
+; SWITCH:       unreachabledefault.i:
+; SWITCH-NEXT:    unreachable
+; SWITCH:       bar2.exit:
+; SWITCH-NEXT:    [[TMP2:%.*]] = phi i64 [ 5, [[BRANCH_0_I]] ], [ 9, [[BRANCH_2_I]] ], [ 2, [[BRANCH_4_I]] ], [ 7, [[BRANCH_6_I]] ]
+; SWITCH-NEXT:    ret i64 [[TMP2]]
+;
+  %2 = call i64 @bar2(i64 %0)
+  ret i64 %2
+}
+
+define i64 @bar1(i64 %0) {
+; LOOKUPTABLE-LABEL: define i64 @bar1(
+; LOOKUPTABLE-SAME: i64 [[TMP0:%.*]]) {
+; LOOKUPTABLE-NEXT:    switch i64 [[TMP0]], label [[DEFAULT_BRANCH:%.*]] [
+; LOOKUPTABLE-NEXT:      i64 0, label [[BRANCH_0:%.*]]
+; LOOKUPTABLE-NEXT:      i64 2, label [[BRANCH_2:%.*]]
+; LOOKUPTABLE-NEXT:      i64 4, label [[BRANCH_4:%.*]]
+; LOOKUPTABLE-NEXT:      i64 6, label [[BRANCH_6:%.*]]
+; LOOKUPTABLE-NEXT:    ]
+; LOOKUPTABLE:       branch_0:
+; LOOKUPTABLE-NEXT:    br label [[EXIT:%.*]]
+; LOOKUPTABLE:       branch_2:
+; LOOKUPTABLE-NEXT:    br label [[EXIT]]
+; LOOKUPTABLE:       branch_4:
+; LOOKUPTABLE-NEXT:    br label [[EXIT]]
+; LOOKUPTABLE:       branch_6:
+; LOOKUPTABLE-NEXT:    br label [[EXIT]]
+; LOOKUPTABLE:       default_branch:
+; LOOKUPTABLE-NEXT:    br label [[EXIT]]
+; LOOKUPTABLE:       exit:
+; LOOKUPTABLE-NEXT:    [[TMP2:%.*]] = phi i64 [ 5, [[BRANCH_0]] ], [ 9, [[BRANCH_2]] ], [ 2, [[BRANCH_4]] ], [ 7, [[BRANCH_6]] ], [ 3, [[DEFAULT_BRANCH]] ]
+; LOOKUPTABLE-NEXT:    ret i64 [[TMP2]]
+;
+; SWITCH-LABEL: define i64 @bar1(
+; SWITCH-SAME: i64 [[TMP0:%.*]]) {
+; SWITCH-NEXT:    switch i64 [[TMP0]], label [[DEFAULT_BRANCH:%.*]] [
+; SWITCH-NEXT:      i64 0, label [[BRANCH_0:%.*]]
+; SWITCH-NEX...
[truncated]

@dianqk dianqk requested review from arsenm, nikic and dtcxzyw January 12, 2024 00:54
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm except for the naming documentation of the helper

@@ -3505,6 +3506,10 @@ class SwitchInst : public Instruction {
return cast<BasicBlock>(getOperand(1));
}

bool defaultDestIsUnreachable() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add documentation comment. I think the name is misleading too. The code isn't unreachable, and it only accepts empty blocks. How about something like defaultDestIsEmpty? defaultDestUndefined?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder. I prefer defaultDestUndefined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Jan 17, 2024
@dtcxzyw
Copy link
Member

dtcxzyw commented Jan 17, 2024

Emm, looks like a lot of callees cannot be inlined after this patch.
I'd like to decrease the inline cost of switch w/ unreachable default branch cases, instead of increasing the cost of other common cases.

@dianqk
Copy link
Member Author

dianqk commented Jan 17, 2024

Emm, looks like a lot of callees cannot be inlined after this patch. I'd like to decrease the inline cost of switch w/ unreachable default branch cases, instead of increasing the cost of other common cases.

Indeed, this is different from what was initially assumed. But from the code, I think there is a real omission of default branch handling here. I was just considering adding a bias experiment, but this would cause the consumption of small no-default branches to go negative.

@@ -1153,6 +1154,7 @@ class InlineCostFeaturesAnalyzer final : public CallAnalyzer {
// heuristics in the ML inliner.
static constexpr int JTCostMultiplier = 4;
static constexpr int CaseClusterCostMultiplier = 2;
static constexpr int SwitchDefaultDestCostMultiplier = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defined this constant but hardcoded 2 above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copied it from CaseClusterCostMultiplier.

@dianqk
Copy link
Member Author

dianqk commented Feb 8, 2024

Emm, looks like a lot of callees cannot be inlined after this patch. I'd like to decrease the inline cost of switch w/ unreachable default branch cases, instead of increasing the cost of other common cases.

Indeed, this is different from what was initially assumed. But from the code, I think there is a real omission of default branch handling here. I was just considering adding a bias experiment, but this would cause the consumption of small no-default branches to go negative.

@dtcxzyw
Any more comments? At least for now, this is because we forgot about the default branching situation earlier.

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@dianqk dianqk merged commit 5932fcc into llvm:main Feb 11, 2024
@dianqk dianqk deleted the inline-cost-switch-default branch February 11, 2024 10:25
@nikic
Copy link
Contributor

nikic commented Feb 11, 2024

FWIW, it seems like this causes a minor performance regression in stage2 clang builds (https://llvm-compile-time-tracker.com/compare.php?from=5aec9392674572fa5a06283173a6a739742d261d&to=5932fcc47855fdd209784f38820422d2369b84b2&stat=instructions:u). Though code size does reduce as well.

@dianqk
Copy link
Member Author

dianqk commented Feb 11, 2024

FWIW, it seems like this causes a minor performance regression in stage2 clang builds (https://llvm-compile-time-tracker.com/compare.php?from=5aec9392674572fa5a06283173a6a739742d261d&to=5932fcc47855fdd209784f38820422d2369b84b2&stat=instructions:u). Though code size does reduce as well.

Do you think we need to revert? If not, I can create an issue to track it.

@dtcxzyw
Copy link
Member

dtcxzyw commented Feb 12, 2024

Can we slightly increase the inline threshold to fix the regression?

@nikic
Copy link
Contributor

nikic commented Feb 13, 2024

We need to understand what is no longer getting inlined and see whether there is either some optimization of cost modelling improvement we can make to compensate. (E.g. I think doing switch-to-arithmetic conversion pre-inline would be worthwhile.)

@dianqk
Copy link
Member Author

dianqk commented Feb 14, 2024

Can we slightly increase the inline threshold to fix the regression?

I will consider trying it.

We need to understand what is no longer getting inlined and see whether there is either some optimization of cost modelling improvement we can make to compensate. (E.g. I think doing switch-to-arithmetic conversion pre-inline would be worthwhile.)

Yes, the example makes sense. I have created an issue for this and I will try to investigate at the end of the traveling.

@topperc
Copy link
Collaborator

topperc commented Mar 12, 2024

Someone just reported to me internally that this caused a benchmark regression in our downstream.

I wonder if the default case was originally missing because it should just be a fallthrough after all the other cases are picked off. It shouldn't have any code. Given N destinations for a switch you only need N-1 compares and branches. Maybe an unreachable default should reduce the number of case clusters since one case cluster check wouldn't be needed? Though I'm not sure exactly what the backend generates.

@dianqk
Copy link
Member Author

dianqk commented Mar 13, 2024

Someone just reported to me internally that this caused a benchmark regression in our downstream.

I have created #81723 to track some regressions. I haven't started my research yet because this case is too big.
Can you provide a reproducible example? (It's okay if you can't.)

I wonder if the default case was originally missing because it should just be a fallthrough after all the other cases are picked off. It shouldn't have any code. Given N destinations for a switch you only need N-1 compares and branches. Maybe an unreachable default should reduce the number of case clusters since one case cluster check wouldn't be needed? Though I'm not sure exactly what the backend generates.

I believe that we will create extra instructions if the default branch is reachable. See https://llvm.godbolt.org/z/x7azoP7eY.

bar1: # @bar1
  cmp rdi, 6
  ja .LBB0_6
  ...

This regression is likely coming from the select lookup vs compare instructions if I guess it right. See https://llvm.godbolt.org/z/cYdoef4cT.

I will prioritize the issue because it's becoming important. :) I'll report back my progress this weekend.

@topperc
Copy link
Collaborator

topperc commented Mar 13, 2024

Someone just reported to me internally that this caused a benchmark regression in our downstream.

I have created #81723 to track some regressions. I haven't started my research yet because this case is too big. Can you provide a reproducible example? (It's okay if you can't.)

I wonder if the default case was originally missing because it should just be a fallthrough after all the other cases are picked off. It shouldn't have any code. Given N destinations for a switch you only need N-1 compares and branches. Maybe an unreachable default should reduce the number of case clusters since one case cluster check wouldn't be needed? Though I'm not sure exactly what the backend generates.

I believe that we will create extra instructions if the default branch is reachable. See https://llvm.godbolt.org/z/x7azoP7eY.

bar1: # @bar1
  cmp rdi, 6
  ja .LBB0_6
  ...

This regression is likely coming from the select lookup vs compare instructions if I guess it right. See https://llvm.godbolt.org/z/cYdoef4cT.

I will prioritize the issue because it's becoming important. :) I'll report back my progress this weekend.

You're right that cmp/ja does come from the default when we use jump table lowering. There is 4*InstrCost on this line that may been intended to account for the cmp+ja+load+jmp? Otherwise I'm not sure why 4.

      int64_t JTCost =
          static_cast<int64_t>(JumpTableSize) * InstrCost + 4 * InstrCost;

@dianqk
Copy link
Member Author

dianqk commented Mar 14, 2024

I think that's probably it. I believe this code is all based on the existence of default branching considerations. Could you try with b74b7b1 and 0a52558 in #85160? Though I haven't gotten around to add comments yet.

dianqk added a commit that referenced this pull request May 5, 2024
…#85160)

Fixes #81723.

The earliest commit of the related code is:
919f9e8.
I tried to understand the following code with
#77856 (comment).

https://github.com/llvm/llvm-project/blob/5932fcc47855fdd209784f38820422d2369b84b2/llvm/lib/Analysis/InlineCost.cpp#L709-L720

I think only scenarios where there is a default branch were considered.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants