Commit 651e644

[BPF] Replace BPFMIPeepholeTruncElim by custom logic in isZExtFree()
Replace `BPFMIPeepholeTruncElim` by adding an overload of
`TargetLowering::isZExtFree()` aware that zero extension is free for
`ISD::LOAD`.

Short description
=================

The `BPFMIPeepholeTruncElim` pass handles two patterns:

Pattern #1:

    %1 = LDB %0, ...                    %1 = LDB %0, ...
    %2 = AND_ri %1, 0xff           ->   %2 = MOV_ri %1        <-- (!)

Pattern #2:

    bb.1:                               bb.1:
      %a = LDB %0, ...                    %a = LDB %0, ...
      br %bb3                             br %bb3
    bb.2:                               bb.2:
      %b = LDB %0, ...               ->   %b = LDB %0, ...
      br %bb3                             br %bb3
    bb.3:                               bb.3:
      %1 = PHI %a, %b                     %1 = PHI %a, %b
      %2 = AND_ri %1, 0xff                %2 = MOV_ri %1      <-- (!)

Plus variations:
- AND_ri_32 instead of AND_ri
- SLL/SRL pair instead of AND_ri
- LDH, LDW, LDB32, LDH32, LDW32

Both patterns can be handled by built-in transformations at the
instruction selection phase if a suitable `isZExtFree()` implementation
is provided. The idea is borrowed from `ARMTargetLowering::isZExtFree`.

When evaluated on BPF kernel selftests and the remove_truncate_*.ll LLVM
test cases, this revision performs slightly better than
`BPFMIPeepholeTruncElim`, see the "Impact" section below for details.

The commit also adds a few test cases to make sure that the patterns in
question are handled.

Long description
================

Why this works: Pattern #1
--------------------------

Consider the following example:

    define i1 @foo(ptr %p) {
    entry:
      %a = load i8, ptr %p, align 1
      %cond = icmp eq i8 %a, 0
      ret i1 %cond
    }

Log for the `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command:

    ...
    Type-legalized selection DAG: %bb.0 'foo:entry'
    SelectionDAG has 13 nodes:
      t0: ch,glue = EntryToken
            t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64
        t19: i64 = and t16, Constant:i64<255>
      t17: i64 = setcc t19, Constant:i64<0>, seteq:ch
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...
    Replacing.1 t19: i64 = and t16, Constant:i64<255>
    With: t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64
     and 0 other values
    ...
    Optimized type-legalized selection DAG: %bb.0 'foo:entry'
    SelectionDAG has 11 nodes:
      t0: ch,glue = EntryToken
          t2: i64,ch = CopyFromReg t0, Register:i64 %0
        t20: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
      t17: i64 = setcc t20, Constant:i64<0>, seteq:ch
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...

Note:
- In the optimized type-legalized selection DAG `t19 = and t16, 255`
  has been replaced by `t16` (the load).
- Patterns like `(and (load ... i8), 255)` are replaced by `load` in
  `DAGCombiner::BackwardsPropagateMask` called from `DAGCombiner::visitAND`.
- Similarly, patterns like `(shl (srl ..., 56), 56)` are replaced by
  `(and ..., 255)` in `DAGCombiner::visitSRL` (this function is huge,
  look for the `TLI.shouldFoldConstantShiftPairToMask()` call).

Why this works: Pattern #2
--------------------------

Consider the following example:

    define i1 @foo(ptr %p) {
    entry:
      %a = load i8, ptr %p, align 1
      br label %next

    next:
      %cond = icmp eq i8 %a, 0
      ret i1 %cond
    }

Consider the log for the same `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel`
command. Log for the first basic block:

    Initial selection DAG: %bb.0 'foo:entry'
    SelectionDAG has 9 nodes:
      t0: ch,glue = EntryToken
      t3: i64 = Constant<0>
          t2: i64,ch = CopyFromReg t0, Register:i64 %1
        t5: i8,ch = load<(load (s8) from %ir.p)> t0, t2, undef:i64
      t6: i64 = zero_extend t5
      t8: ch = CopyToReg t0, Register:i64 %0, t6
    ...
    Replacing.1 t6: i64 = zero_extend t5
    With: t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
     and 0 other values
    ...
    Optimized lowered selection DAG: %bb.0 'foo:entry'
    SelectionDAG has 7 nodes:
      t0: ch,glue = EntryToken
        t2: i64,ch = CopyFromReg t0, Register:i64 %1
      t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
      t8: ch = CopyToReg t0, Register:i64 %0, t9

Note:
- Initial selection DAG:
  - `%a = load ...` is lowered as `t6 = (zero_extend (load ...))`;
    without the special `isZExtFree()` overload added by this commit it
    would instead be lowered as `t6 = (any_extend (load ...))`.
  - The decision to generate `zero_extend` or `any_extend` is made in
    `RegsForValue::getCopyToRegs` called from
    `SelectionDAGBuilder::CopyValueToVirtualRegister`:
    - if `isZExtFree()` for the load returns true, `zero_extend` is used;
    - `any_extend` is used otherwise.
- Optimized lowered selection DAG:
  - `t6 = (zero_extend (load ...))` is replaced by
    `t9 = load ..., zext from i8`. This is done by
    `DAGCombiner.cpp:tryToFoldExtOfLoad()` called from
    `DAGCombiner::visitZERO_EXTEND`.

Log for the second basic block:

    Initial selection DAG: %bb.1 'foo:next'
    SelectionDAG has 13 nodes:
      t0: ch,glue = EntryToken
            t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t4: i64 = AssertZext t2, ValueType:ch:i8
        t5: i8 = truncate t4
      t8: i1 = setcc t5, Constant:i8<0>, seteq:ch
      t9: i64 = any_extend t8
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t9
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...
    Replacing.2 t18: i64 = and t4, Constant:i64<255>
    With: t4: i64 = AssertZext t2, ValueType:ch:i8
    ...
    Type-legalized selection DAG: %bb.1 'foo:next'
    SelectionDAG has 13 nodes:
      t0: ch,glue = EntryToken
            t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t4: i64 = AssertZext t2, ValueType:ch:i8
        t18: i64 = and t4, Constant:i64<255>
      t16: i64 = setcc t18, Constant:i64<0>, seteq:ch
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...
    Optimized type-legalized selection DAG: %bb.1 'foo:next'
    SelectionDAG has 11 nodes:
      t0: ch,glue = EntryToken
          t2: i64,ch = CopyFromReg t0, Register:i64 %0
        t4: i64 = AssertZext t2, ValueType:ch:i8
      t16: i64 = setcc t4, Constant:i64<0>, seteq:ch
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...

Note:
- Initial selection DAG:
  - `t0` is an input value for this basic block; it corresponds to the
    load instruction (`t9`) from the first basic block.
  - It is accessed within the basic block via `t4`
    (`AssertZext (CopyFromReg t0, ...)`).
  - The `AssertZext` is generated by `RegsForValue::getCopyFromRegs`
    called from `SelectionDAGBuilder::getCopyFromRegs`; it is generated
    only when `LiveOutInfo` with a known number of leading zeros is
    present for `t0`.
  - Known register bits in `LiveOutInfo` are computed by
    `SelectionDAG::computeKnownBits` called from
    `SelectionDAGISel::ComputeLiveOutVRegInfo`.
  - `computeKnownBits()` generates leading zeros information for
    `(load ..., zext from ...)` but *does not* generate leading zeros
    information for `(load ..., anyext from ...)`. This is why the
    `isZExtFree()` overload added in this commit is important.
- Type-legalized selection DAG:
  - `t5 = truncate t4` is replaced by `t18 = and t4, 255`.
- Optimized type-legalized selection DAG:
  - `t18 = and t4, 255` is replaced by `t4`; this is done by
    `DAGCombiner::SimplifyDemandedBits` called from
    `DAGCombiner::visitAND`, which simplifies patterns like
    `(and (assertzext ...))`.

Impact
------

This change covers all remove_truncate_*.ll test cases:
- for -mcpu=v4 there are no changes in the generated code;
- for -mcpu=v2 the code generated for remove_truncate_7 and
  remove_truncate_8 improved slightly, for the other tests it is
  unchanged.

For remove_truncate_7:

    Before this revision                 After this revision
    --------------------                 -------------------
    r1 <<= 0x20                          r1 <<= 0x20
    r1 >>= 0x20                          r1 >>= 0x20
    if r1 == 0x0 goto +0x2 <LBB0_2>      if r1 == 0x0 goto +0x2 <LBB0_2>
    r1 = *(u32 *)(r2 + 0x0)              r0 = *(u32 *)(r2 + 0x0)
    goto +0x1 <LBB0_3>                   goto +0x1 <LBB0_3>
    <LBB0_2>:                            <LBB0_2>:
    r1 = *(u32 *)(r2 + 0x4)              r0 = *(u32 *)(r2 + 0x4)
    <LBB0_3>:                            <LBB0_3>:
    r0 = r1
    exit                                 exit

For remove_truncate_8:

    Before this revision                 After this revision
    --------------------                 -------------------
    r2 = *(u32 *)(r1 + 0x0)              r2 = *(u32 *)(r1 + 0x0)
    r3 = r2                              r3 = r2
    r3 <<= 0x20                          r3 <<= 0x20
    r4 = r3                              r3 s>>= 0x20
    r4 s>>= 0x20
    if r4 s> 0x2 goto +0x5 <LBB0_3>      if r3 s> 0x2 goto +0x4 <LBB0_3>
    r4 = *(u32 *)(r1 + 0x4)              r3 = *(u32 *)(r1 + 0x4)
    r3 >>= 0x20
    if r3 >= r4 goto +0x2 <LBB0_3>       if r2 >= r3 goto +0x2 <LBB0_3>
    r2 += 0x2                            r2 += 0x2
    *(u32 *)(r1 + 0x0) = r2              *(u32 *)(r1 + 0x0) = r2
    <LBB0_3>:                            <LBB0_3>:
    r0 = 0x3                             r0 = 0x3
    exit                                 exit

For the kernel BPF selftests the statistics are as follows:
- For -mcpu=v4: 9 out of 655 object files have differences; in all
  cases the total number of instructions marginally decreased
  (-27 instructions).
- For -mcpu=v2: 9 out of 655 object files have differences:
  - For 19 object files the number of instructions decreased
    (-129 instructions in total): some redundant `rX &= 0xffff` and
    register-to-register assignments were removed;
  - For 2 object files the number of instructions increased by
    +2 instructions in each file.

Both -mcpu=v2 instruction increases could be reduced to the same example:

    define void @foo(ptr %p) {
    entry:
      %a = load i32, ptr %p, align 4
      %b = sext i32 %a to i64
      %c = icmp ult i64 1, %b
      br i1 %c, label %next, label %end

    next:
      call void inttoptr (i64 62 to ptr)(i32 %a)
      br label %end

    end:
      ret void
    }

Note that this example uses the value loaded into `%a` both as sign
extended (`%b`) and as zero extended (`%a` passed as a parameter).

Here is the difference in the final assembly code:

    Before this revision          After this revision
    --------------------          -------------------
    r1 = *(u32 *)(r1 + 0)         r1 = *(u32 *)(r1 + 0)
    r1 <<= 32                     r1 <<= 32
    r1 s>>= 32                    r1 s>>= 32
    if r1 < 2 goto <LBB0_2>       if r1 < 2 goto <LBB0_2>
                                  r1 <<= 32
                                  r1 >>= 32
    call 62                       call 62
    <LBB0_2>:                     <LBB0_2>:
    exit                          exit

Before this commit `%a` is passed to the call as a sign extended value;
after this commit `%a` is passed to the call as a zero extended value.
Both are correct, as the 32-bit sub-register is the same.

The difference comes from the `DAGCombiner` operation on the initial DAG.

Initial selection DAG before this commit:

      t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
    t6: i64 = any_extend t5          <--------------------- (1)
    t8: ch = CopyToReg t0, Register:i64 %0, t6
      t9: i64 = sign_extend t5
    t12: i1 = setcc Constant:i64<1>, t9, setult:ch

Initial selection DAG after this commit:

      t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
    t6: i64 = zero_extend t5         <--------------------- (2)
    t8: ch = CopyToReg t0, Register:i64 %0, t6
      t9: i64 = sign_extend t5
    t12: i1 = setcc Constant:i64<1>, t9, setult:ch

The node `t9` is processed before node `t6`, and the `load` instruction
is combined to a load with sign extension:

    Replacing.1 t9: i64 = sign_extend t5
    With: t30: i64,ch = load<(load (s32) from %ir.p), sext from i32> t0, t2, undef:i64
     and 0 other values
    Replacing.1 t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
    With: t31: i32 = truncate t30
     and 1 other values

This is done by `DAGCombiner.cpp:tryToFoldExtOfLoad` called from
`DAGCombiner::visitSIGN_EXTEND`. Note that `t5` is used by `t6`, which
is `any_extend` in (1) and `zero_extend` in (2). `tryToFoldExtOfLoad()`
rewrites such uses of `t5` differently:
- `any_extend` is simply removed;
- `zero_extend` is replaced by `and t30, 0xffffffff`, which is later
  converted to a pair of shifts. This pair of shifts survives till the
  end of translation.

Differential Revision: https://reviews.llvm.org/D157870
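
For reference while reading the description above, this is the entire decision
the new hook makes, copied from the BPFISelLowering.cpp hunk further below (it
is not a separate implementation). BPF load instructions already zero the upper
bits of the destination register, which is why the zero extension is reported
as free:

    // New overload added by this commit (see the BPFISelLowering.cpp diff).
    bool BPFTargetLowering::isZExtFree(SDValue Val, EVT VT2) const {
      EVT VT1 = Val.getValueType();
      // Zero-extending an i8/i16/i32 load to i32/i64 is free: BPF loads
      // already clear the upper bits of the destination register.
      if (Val.getOpcode() == ISD::LOAD && VT1.isSimple() && VT2.isSimple()) {
        MVT MT1 = VT1.getSimpleVT().SimpleTy;
        MVT MT2 = VT2.getSimpleVT().SimpleTy;
        if ((MT1 == MVT::i8 || MT1 == MVT::i16 || MT1 == MVT::i32) &&
            (MT2 == MVT::i32 || MT2 == MVT::i64))
          return true;
      }
      return TargetLoweringBase::isZExtFree(Val, VT2);
    }
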
1 parent 48e0a6f commit 651e644

File tree: 6 files changed, +95 −182 lines

llvm/lib/Target/BPF/BPF.h

+1 −3

@@ -23,14 +23,12 @@ ModulePass *createBPFCheckAndAdjustIR();
 FunctionPass *createBPFISelDag(BPFTargetMachine &TM);
 FunctionPass *createBPFMISimplifyPatchablePass();
 FunctionPass *createBPFMIPeepholePass();
-FunctionPass *createBPFMIPeepholeTruncElimPass();
 FunctionPass *createBPFMIPreEmitPeepholePass();
 FunctionPass *createBPFMIPreEmitCheckingPass();

 void initializeBPFCheckAndAdjustIRPass(PassRegistry&);
 void initializeBPFDAGToDAGISelPass(PassRegistry &);
-void initializeBPFMIPeepholePass(PassRegistry&);
-void initializeBPFMIPeepholeTruncElimPass(PassRegistry &);
+void initializeBPFMIPeepholePass(PassRegistry &);
 void initializeBPFMIPreEmitCheckingPass(PassRegistry&);
 void initializeBPFMIPreEmitPeepholePass(PassRegistry &);
 void initializeBPFMISimplifyPatchablePass(PassRegistry &);

llvm/lib/Target/BPF/BPFISelLowering.cpp

+12

@@ -224,6 +224,18 @@ bool BPFTargetLowering::isZExtFree(EVT VT1, EVT VT2) const {
   return NumBits1 == 32 && NumBits2 == 64;
 }

+bool BPFTargetLowering::isZExtFree(SDValue Val, EVT VT2) const {
+  EVT VT1 = Val.getValueType();
+  if (Val.getOpcode() == ISD::LOAD && VT1.isSimple() && VT2.isSimple()) {
+    MVT MT1 = VT1.getSimpleVT().SimpleTy;
+    MVT MT2 = VT2.getSimpleVT().SimpleTy;
+    if ((MT1 == MVT::i8 || MT1 == MVT::i16 || MT1 == MVT::i32) &&
+        (MT2 == MVT::i32 || MT2 == MVT::i64))
+      return true;
+  }
+  return TargetLoweringBase::isZExtFree(Val, VT2);
+}
+
 BPFTargetLowering::ConstraintType
 BPFTargetLowering::getConstraintType(StringRef Constraint) const {
   if (Constraint.size() == 1) {
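
The commit message explains where this hook takes effect: when SelectionDAG
copies a value that is live across basic blocks into a virtual register, it
chooses between zero_extend and any_extend based on isZExtFree(). The sketch
below is a simplified paraphrase of that decision for orientation only; the
actual logic lives in RegsForValue::getCopyToRegs (called from
SelectionDAGBuilder::CopyValueToVirtualRegister) and is more involved:

    #include "llvm/CodeGen/ISDOpcodes.h"
    #include "llvm/CodeGen/TargetLowering.h"
    using namespace llvm;

    // Simplified paraphrase, not the actual LLVM source: pick the extension
    // used when promoting a small value into a wider virtual register.
    static unsigned chooseExtendOpcode(const TargetLowering &TLI, SDValue Val,
                                       EVT DestVT) {
      if (TLI.isZExtFree(Val, DestVT))
        return ISD::ZERO_EXTEND; // upper bits known zero; computeKnownBits can
                                 // then justify an AssertZext in later blocks
      return ISD::ANY_EXTEND;    // upper bits left unspecified
    }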

llvm/lib/Target/BPF/BPFISelLowering.h

+1

@@ -144,6 +144,7 @@ class BPFTargetLowering : public TargetLowering {
   // For 32bit ALU result zext to 64bit is free.
   bool isZExtFree(Type *Ty1, Type *Ty2) const override;
   bool isZExtFree(EVT VT1, EVT VT2) const override;
+  bool isZExtFree(SDValue Val, EVT VT2) const override;

   unsigned EmitSubregExt(MachineInstr &MI, MachineBasicBlock *BB, unsigned Reg,
                          bool isSigned) const;
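
Taken together with the pre-existing declarations shown as context above,
BPFTargetLowering now answers the "is zero extension free?" question in three
forms. The comments below summarize the behavior described by the existing code
and by this commit; they are an editorial summary, not part of the diff:

    bool isZExtFree(Type *Ty1, Type *Ty2) const override; // IR types: i32 -> i64 is free
    bool isZExtFree(EVT VT1, EVT VT2) const override;     // DAG types: i32 -> i64 is free
    bool isZExtFree(SDValue Val, EVT VT2) const override; // new: i8/i16/i32 loads -> i32/i64 are free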

llvm/lib/Target/BPF/BPFMIPeephole.cpp

−177

@@ -606,180 +606,3 @@ FunctionPass* llvm::createBPFMIPreEmitPeepholePass()
 {
   return new BPFMIPreEmitPeephole();
 }
-
-STATISTIC(TruncElemNum, "Number of truncation eliminated");
-
-namespace {
-
-struct BPFMIPeepholeTruncElim : public MachineFunctionPass {
-
-  static char ID;
-  const BPFInstrInfo *TII;
-  MachineFunction *MF;
-  MachineRegisterInfo *MRI;
-
-  BPFMIPeepholeTruncElim() : MachineFunctionPass(ID) {
-    initializeBPFMIPeepholeTruncElimPass(*PassRegistry::getPassRegistry());
-  }
-
-private:
-  // Initialize class variables.
-  void initialize(MachineFunction &MFParm);
-
-  bool eliminateTruncSeq();
-
-public:
-
-  // Main entry point for this pass.
-  bool runOnMachineFunction(MachineFunction &MF) override {
-    if (skipFunction(MF.getFunction()))
-      return false;
-
-    initialize(MF);
-
-    return eliminateTruncSeq();
-  }
-};
-
-static bool TruncSizeCompatible(int TruncSize, unsigned opcode)
-{
-  if (TruncSize == 1)
-    return opcode == BPF::LDB || opcode == BPF::LDB32;
-
-  if (TruncSize == 2)
-    return opcode == BPF::LDH || opcode == BPF::LDH32;
-
-  if (TruncSize == 4)
-    return opcode == BPF::LDW || opcode == BPF::LDW32;
-
-  return false;
-}
-
-// Initialize class variables.
-void BPFMIPeepholeTruncElim::initialize(MachineFunction &MFParm) {
-  MF = &MFParm;
-  MRI = &MF->getRegInfo();
-  TII = MF->getSubtarget<BPFSubtarget>().getInstrInfo();
-  LLVM_DEBUG(dbgs() << "*** BPF MachineSSA TRUNC Elim peephole pass ***\n\n");
-}
-
-// Reg truncating is often the result of 8/16/32bit->64bit or
-// 8/16bit->32bit conversion. If the reg value is loaded with
-// masked byte width, the AND operation can be removed since
-// BPF LOAD already has zero extension.
-//
-// This also solved a correctness issue.
-// In BPF socket-related program, e.g., __sk_buff->{data, data_end}
-// are 32-bit registers, but later on, kernel verifier will rewrite
-// it with 64-bit value. Therefore, truncating the value after the
-// load will result in incorrect code.
-bool BPFMIPeepholeTruncElim::eliminateTruncSeq() {
-  MachineInstr* ToErase = nullptr;
-  bool Eliminated = false;
-
-  for (MachineBasicBlock &MBB : *MF) {
-    for (MachineInstr &MI : MBB) {
-      // The second insn to remove if the eliminate candidate is a pair.
-      MachineInstr *MI2 = nullptr;
-      Register DstReg, SrcReg;
-      MachineInstr *DefMI;
-      int TruncSize = -1;
-
-      // If the previous instruction was marked for elimination, remove it now.
-      if (ToErase) {
-        ToErase->eraseFromParent();
-        ToErase = nullptr;
-      }
-
-      // AND A, 0xFFFFFFFF will be turned into SLL/SRL pair due to immediate
-      // for BPF ANDI is i32, and this case only happens on ALU64.
-      if (MI.getOpcode() == BPF::SRL_ri &&
-          MI.getOperand(2).getImm() == 32) {
-        SrcReg = MI.getOperand(1).getReg();
-        if (!MRI->hasOneNonDBGUse(SrcReg))
-          continue;
-
-        MI2 = MRI->getVRegDef(SrcReg);
-        DstReg = MI.getOperand(0).getReg();
-
-        if (!MI2 ||
-            MI2->getOpcode() != BPF::SLL_ri ||
-            MI2->getOperand(2).getImm() != 32)
-          continue;
-
-        // Update SrcReg.
-        SrcReg = MI2->getOperand(1).getReg();
-        DefMI = MRI->getVRegDef(SrcReg);
-        if (DefMI)
-          TruncSize = 4;
-      } else if (MI.getOpcode() == BPF::AND_ri ||
-                 MI.getOpcode() == BPF::AND_ri_32) {
-        SrcReg = MI.getOperand(1).getReg();
-        DstReg = MI.getOperand(0).getReg();
-        DefMI = MRI->getVRegDef(SrcReg);
-
-        if (!DefMI)
-          continue;
-
-        int64_t imm = MI.getOperand(2).getImm();
-        if (imm == 0xff)
-          TruncSize = 1;
-        else if (imm == 0xffff)
-          TruncSize = 2;
-      }
-
-      if (TruncSize == -1)
-        continue;
-
-      // The definition is PHI node, check all inputs.
-      if (DefMI->isPHI()) {
-        bool CheckFail = false;
-
-        for (unsigned i = 1, e = DefMI->getNumOperands(); i < e; i += 2) {
-          MachineOperand &opnd = DefMI->getOperand(i);
-          if (!opnd.isReg()) {
-            CheckFail = true;
-            break;
-          }
-
-          MachineInstr *PhiDef = MRI->getVRegDef(opnd.getReg());
-          if (!PhiDef || PhiDef->isPHI() ||
-              !TruncSizeCompatible(TruncSize, PhiDef->getOpcode())) {
-            CheckFail = true;
-            break;
-          }
-        }
-
-        if (CheckFail)
-          continue;
-      } else if (!TruncSizeCompatible(TruncSize, DefMI->getOpcode())) {
-        continue;
-      }
-
-      BuildMI(MBB, MI, MI.getDebugLoc(), TII->get(BPF::MOV_rr), DstReg)
-        .addReg(SrcReg);
-
-      if (MI2)
-        MI2->eraseFromParent();
-
-      // Mark it to ToErase, and erase in the next iteration.
-      ToErase = &MI;
-      TruncElemNum++;
-      Eliminated = true;
-    }
-  }
-
-  return Eliminated;
-}
-
-} // end default namespace
-
-INITIALIZE_PASS(BPFMIPeepholeTruncElim, "bpf-mi-trunc-elim",
-                "BPF MachineSSA Peephole Optimization For TRUNC Eliminate",
-                false, false)
-
-char BPFMIPeepholeTruncElim::ID = 0;
-FunctionPass* llvm::createBPFMIPeepholeTruncElimPass()
-{
-  return new BPFMIPeepholeTruncElim();
-}

llvm/lib/Target/BPF/BPFTargetMachine.cpp

−2

@@ -42,7 +42,6 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeBPFTarget() {
   PassRegistry &PR = *PassRegistry::getPassRegistry();
   initializeBPFCheckAndAdjustIRPass(PR);
   initializeBPFMIPeepholePass(PR);
-  initializeBPFMIPeepholeTruncElimPass(PR);
   initializeBPFDAGToDAGISelPass(PR);
 }

@@ -155,7 +154,6 @@ void BPFPassConfig::addMachineSSAOptimization() {
   if (!DisableMIPeephole) {
    if (Subtarget->getHasAlu32())
       addPass(createBPFMIPeepholePass());
-    addPass(createBPFMIPeepholeTruncElimPass());
   }
 }

+81

@@ -0,0 +1,81 @@
+; RUN: llc -mcpu=v2 -march=bpf < %s | FileCheck %s
+; RUN: llc -mcpu=v4 -march=bpf < %s | FileCheck %s
+
+; Zero extension instructions should be eliminated at instruction
+; selection phase for all test cases below.
+
+; In BPF zero extension is implemented as &= or a pair of <<=/>>=
+; instructions, hence simply check that &= and >>= do not exist in
+; generated code (<<= remains because %c is used by both call and
+; lshr in a few test cases).
+
+; CHECK-NOT: &=
+; CHECK-NOT: >>=
+
+define void @shl_lshr_same_bb(ptr %p) {
+entry:
+  %a = load i8, ptr %p, align 1
+  %b = zext i8 %a to i64
+  %c = shl i64 %b, 56
+  %d = lshr i64 %c, 56
+  %e = icmp eq i64 %d, 0
+  ; hasOneUse() is a common requirement for many DAGCombiner
+  ; transformations, make sure that it does not matter in this case.
+  call void @sink1(i8 %a, i64 %b, i64 %c, i64 %d, i1 %e)
+  ret void
+}
+
+define void @shl_lshr_diff_bb(ptr %p) {
+entry:
+  %a = load i16, ptr %p, align 2
+  %b = zext i16 %a to i64
+  %c = shl i64 %b, 48
+  %d = lshr i64 %c, 48
+  br label %next
+
+; Jump to the new basic block creates a COPY instruction for %d, which
+; might be materialized as noop or as AND_ri (zero extension) at the
+; start of the basic block. The decision depends on TLI.isZExtFree()
+; results, see RegsForValue::getCopyToRegs(). Check below verifies
+; that COPY is materialized as noop.
+next:
+  %e = icmp eq i64 %d, 0
+  call void @sink2(i16 %a, i64 %b, i64 %c, i64 %d, i1 %e)
+  ret void
+}
+
+define void @load_zext_same_bb(ptr %p) {
+entry:
+  %a = load i8, ptr %p, align 1
+  ; zext is implicit in this context
+  %b = icmp eq i8 %a, 0
+  call void @sink3(i8 %a, i1 %b)
+  ret void
+}
+
+define void @load_zext_diff_bb(ptr %p) {
+entry:
+  %a = load i8, ptr %p, align 1
+  br label %next
+
+next:
+  %b = icmp eq i8 %a, 0
+  call void @sink3(i8 %a, i1 %b)
+  ret void
+}
+
+define void @load_zext_diff_bb_2(ptr %p) {
+entry:
+  %a = load i32, ptr %p, align 4
+  br label %next
+
+next:
+  %b = icmp eq i32 %a, 0
+  call void @sink4(i32 %a, i1 %b)
+  ret void
+}
+
+declare void @sink1(i8, i64, i64, i64, i1);
+declare void @sink2(i16, i64, i64, i64, i1);
+declare void @sink3(i8, i1);
+declare void @sink4(i32, i1);
