[X86] Missed combining of usubo+cmp into sbb #53432
Comments
https://reviews.llvm.org/D125642 improves the codegen to:
@chfast where did this code snippet originally come from? I'm not sure whether it's better to try to recognise that this is an i128 comparison in InstCombine or to just continue with backend combines (the remaining pattern is very specific). @rotateright I'm not sure if we do much to reconstruct wider arithmetic in IR? Although there does appear to be this: https://reviews.llvm.org/D101232
Alive2: https://alive2.llvm.org/ce/z/7gtizV
The increase in instructions suggests that trying to widen the icmp in InstCombine might not be a good idea :(
I have a library implementing
I once tried to widen multiplication, with mixed results: https://reviews.llvm.org/D56277.
We're actually missing a reduction of the IR in the example with the subtract, so that makes it even less likely that we'd try widening in InstCombine:
I'm not sure if that makes it easier to widen in the backend, but that fold in IR does seem to slightly improve the x86 asm:
This is the specific pattern seen in #53432, but it can be extended in multiple ways:
1. The 'zext' could be an 'and'.
2. The 'sub' could be some other binop with a similar ==0 property (udiv).
There might be some way to generalize using knownbits, but that would require checking that the 'bool' value is created with some instruction that can be replaced with a new icmp+logic.
https://alive2.llvm.org/ce/z/-KCfpa
I updated the output, which is a bit better now but still not ideal.
I recompiled the C++ code, and the IR is now different because of the InstCombine addition:

```llvm
define i1 @subcarry_ult_2x64_2(i64 %0, i64 %1, i64 %2, i64 %3) nounwind {
  %5 = icmp ult i64 %0, %2
  %6 = icmp ult i64 %1, %3
  %7 = icmp eq i64 %1, %3
  %8 = and i1 %5, %7
  %9 = or i1 %6, %8
  ret i1 %9
}
```

Output:

```asm
subcarry_ult_2x64_2:
        cmp     rdi, rdx
        setb    dl
        cmp     rsi, rcx
        setb    cl
        sete    al
        and     al, dl
        or      al, cl
        ret
```
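For reference, the IR above computes an unsigned less-than of two 128-bit values split into 64-bit halves. `%0`/`%2` are the low words and `%1`/`%3` are the high words. Below is a minimal C++ sketch of the same logic. It assumes a GCC/Clang compiler, because `unsigned __int128` is a compiler extension, used here only as a cross-check. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors @subcarry_ult_2x64_2 (arguments in the same order as %0..%3).
// Result: (a_hi < b_hi) || (a_hi == b_hi && a_lo < b_lo)
bool ult_2x64_logic(uint64_t a_lo, uint64_t a_hi, uint64_t b_lo, uint64_t b_hi) {
    bool lo_lt = a_lo < b_lo;          // %5 = icmp ult i64 %0, %2
    bool hi_lt = a_hi < b_hi;          // %6 = icmp ult i64 %1, %3
    bool hi_eq = a_hi == b_hi;         // %7 = icmp eq i64 %1, %3
    return hi_lt || (lo_lt && hi_eq);  // %9 = or %6, (and %5, %7)
}

// Reference: the same comparison on a native 128-bit type
// (unsigned __int128 is a GCC/Clang extension, not standard C++).
bool ult_2x64_wide(uint64_t a_lo, uint64_t a_hi, uint64_t b_lo, uint64_t b_hi) {
    unsigned __int128 a = (static_cast<unsigned __int128>(a_hi) << 64) | a_lo;
    unsigned __int128 b = (static_cast<unsigned __int128>(b_hi) << 64) | b_lo;
    return a < b;
}
```

Comparing the two functions on a few inputs gives a cheap sanity check that the three-compare form is really a 128-bit less-than. This matters most for inputs where the low-word comparison decides the result.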
After #56926 and 926e731 the codegen result remains the same. I also found a similar example:

```asm
src:                                    # @src
        cmp     rdi, rdx
        setb    cl
        test    rsi, rsi
        sete    al
        and     al, cl
        ret
tgt:                                    # @tgt
        cmp     rdi, rdx
        sbb     rsi, 0
        setb    al
        ret
```

https://godbolt.org/z/9qoM3Tqe1

Do you think I should try to add this optimization in the X86 backend only? If so, any hints on where to start?
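Reading the registers under the SysV calling convention, `src` evaluates `(rdi < rdx) && (rsi == 0)`. `tgt` computes the same predicate through the borrow chain:

- `cmp rdi, rdx` sets CF to the borrow of `rdi - rdx`.
- `sbb rsi, 0` computes `rsi - 0 - CF`, which borrows exactly when `rsi == 0` and CF is set.

In other words, this is the two-word less-than with the high word of the second operand known to be zero. The C++ sketch below illustrates the equivalence. It assumes GCC/Clang for `__builtin_sub_overflow`, and the names `src_pred`/`tgt_pred` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// src: two separate flag reads combined with 'and' (setb + sete + and).
bool src_pred(uint64_t x, uint64_t h, uint64_t y) {
    return x < y && h == 0;
}

// tgt: the first subtraction models 'cmp rdi, rdx' (its borrow is CF),
// the second models 'sbb rsi, 0', whose borrow is what 'setb' reads.
bool tgt_pred(uint64_t x, uint64_t h, uint64_t y) {
    uint64_t lo_diff, hi_diff;
    bool borrow = __builtin_sub_overflow(x, y, &lo_diff);
    return __builtin_sub_overflow(h, static_cast<uint64_t>(borrow), &hi_diff);
}
```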
Looking at the debug output for the target case shows that we transform to target-independent nodes before producing the x86-specific ones:
So there's a possible generic (DAGCombiner) solution (guarded by appropriate legality checks). We need to pattern match something like this:
...into the above sequence that uses usubo + setcccarry.
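As a rough semantic model (plain C++, not LLVM code):

- `USUBO` yields a difference plus a borrow bit.
- `SETCCCARRY` compares its operands based on `LHS - RHS - Carry`. For an unsigned less-than, that means asking whether that subtraction borrows.

The sketch below models the node sequence such a combine would produce for the two-word comparison. The helpers `usubo` and `setcccarry_ult` are hypothetical stand-ins for the DAG nodes, and it assumes GCC/Clang builtins.

```cpp
#include <cassert>
#include <cstdint>

struct SubResult {
    uint64_t value;  // wrapped difference
    bool borrow;     // set when the unsigned subtraction wrapped
};

// Models USUBO: difference plus borrow-out.
SubResult usubo(uint64_t lhs, uint64_t rhs) {
    SubResult r;
    r.borrow = __builtin_sub_overflow(lhs, rhs, &r.value);
    return r;
}

// Models SETCCCARRY with an unsigned less-than condition:
// true when lhs - rhs - carry_in borrows, i.e. lhs < rhs + carry_in.
bool setcccarry_ult(uint64_t lhs, uint64_t rhs, bool carry_in) {
    uint64_t d1, d2;
    bool b1 = __builtin_sub_overflow(lhs, rhs, &d1);
    bool b2 = __builtin_sub_overflow(d1, static_cast<uint64_t>(carry_in), &d2);
    return b1 || b2;  // at most one of the two can be set
}

// Two-word unsigned less-than: the low-word borrow feeds the high-word compare.
bool ult_2x64_carry(uint64_t a_lo, uint64_t a_hi, uint64_t b_lo, uint64_t b_hi) {
    SubResult lo = usubo(a_lo, b_lo);
    return setcccarry_ult(a_hi, b_hi, lo.borrow);
}
```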
Wasn't something similar proposed in https://reviews.llvm.org/D118037?
I added two test cases in https://reviews.llvm.org/D132463. C source: https://godbolt.org/z/W1qqvqGbr.
This one seems to be obsoleted by https://reviews.llvm.org/D127115 (topological order) + https://reviews.llvm.org/D57317.
This one is different in that we want to combine two
This looks like a very specific pattern to match. Do you think there is a smaller subpattern that would, e.g., produce only
Also, is there a way to dump the DAG in this text form without using a debugger?
I haven't found a smaller target-independent fold for something like this, but it's possible that we could generalize it. Usually, there's some inverse logic pattern that should also be handled (DeMorgan), so that would end with an 'or' instruction and inverted setcc predicates. The only ways to dump the DAG that I know of are with the "-debug" option or:
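To illustrate the inverse (DeMorgan) form, take the `src`/`tgt` example from earlier in the thread. Negating `(x < y) && (h == 0)` gives `(x >= y) || (h != 0)`, which is an 'or' with inverted predicates. That is simply the complement of the borrow out of the `cmp`/`sbb` sequence, so it would read CF with `setae` instead of `setb`. Below is a minimal C++ sketch, assuming GCC/Clang builtins; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Inverted pattern: 'or' of the inverted setcc predicates.
bool inverse_or_form(uint64_t x, uint64_t h, uint64_t y) {
    return x >= y || h != 0;
}

// Same value as the negated borrow-out of the cmp/sbb chain.
bool inverse_borrow_form(uint64_t x, uint64_t h, uint64_t y) {
    uint64_t lo, hi;
    bool borrow = __builtin_sub_overflow(x, y, &lo);                       // CF after cmp
    return !__builtin_sub_overflow(h, static_cast<uint64_t>(borrow), &hi); // !CF after sbb
}
```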
There are 2 more patterns already in tests (transcribed manually by me from view-dag):
The one currently produced by InstCombine:
The older one, from before some changes to InstCombine:
Both should be converted to:
This is an implementation of less-than for a 2x i64 multi-precision integer, done by checking the carry flag from subtraction: https://godbolt.org/z/Kbhnq8c91
Output:
Expected:
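The linked Compiler Explorer source is not reproduced here. As a hypothetical illustration only, the sketch below shows a 2x i64 less-than written in terms of the subtraction carry. It performs the full multi-precision subtraction and keeps only the final borrow. It assumes GCC/Clang for `__builtin_sub_overflow`, the `U128` struct and the function names are invented for this example, and the code in the report may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 2x64-bit unsigned integer: w[0] is the low word, w[1] the high word.
struct U128 {
    uint64_t w[2];
};

// Multi-precision subtraction: stores a - b in out and returns the final borrow.
bool sub_borrow(const U128& a, const U128& b, U128& out) {
    bool borrow = false;
    for (int i = 0; i < 2; ++i) {
        uint64_t d;
        bool b1 = __builtin_sub_overflow(a.w[i], b.w[i], &d);
        bool b2 = __builtin_sub_overflow(d, static_cast<uint64_t>(borrow), &out.w[i]);
        borrow = b1 || b2;
    }
    return borrow;
}

// a < b exactly when a - b borrows out of the most significant word;
// the difference itself is discarded.
bool less_than(const U128& a, const U128& b) {
    U128 diff;
    return sub_borrow(a, b, diff);
}
```

Because the difference is unused, only the borrow chain needs to survive. On x86 that chain is a `cmp` followed by `sbb` and a single `setb`, instead of the separate compares shown earlier in the thread.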