Speedup heapsort by 1.8x by making it branchless #107894
`slice::sort_unstable` will fall back to heapsort if it repeatedly fails to find a good pivot. Making the core child-update code branchless makes it much faster. On Zen 3, sorting 10k `u64` values while forcing the sort to pick heapsort results in: 455us -> 278us
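For readers following along, here is a minimal sketch of what a branchless child update in a heapsort sift-down can look like. It is an illustration under assumptions, not the exact diff from this PR: the function name `heapsort_branchless` is hypothetical, and the code only mirrors the general shape of a max-heap heapsort driven by an `is_less` comparator.

```rust
/// Illustrative heapsort (hypothetical name, not the std implementation).
/// The sift-down picks the greater child by adding the comparison result
/// to the index instead of branching on it.
pub fn heapsort_branchless<T, F>(v: &mut [T], mut is_less: F)
where
    F: FnMut(&T, &T) -> bool,
{
    // Restore the max-heap property for the subtree rooted at `node`.
    let mut sift_down = |v: &mut [T], mut node: usize| {
        loop {
            // Index of the left child.
            let mut child = 2 * node + 1;
            if child >= v.len() {
                break;
            }

            // The bounds check remains a branch, but it is highly predictable.
            if child + 1 < v.len() {
                // Branchless child update: the bool converts to 0 or 1, so
                // `child` moves to the right child only if it is greater.
                child += is_less(&v[child], &v[child + 1]) as usize;
            }

            // Stop once the parent is not less than its greater child.
            if !is_less(&v[node], &v[child]) {
                break;
            }

            v.swap(node, child);
            node = child;
        }
    };

    // Build the heap in linear time, starting from the last parent node.
    for i in (0..v.len() / 2).rev() {
        sift_down(v, i);
    }

    // Repeatedly move the maximum to the end and shrink the heap.
    for i in (1..v.len()).rev() {
        v.swap(0, i);
        sift_down(&mut v[..i], 0);
    }
}
```

Calling `heapsort_branchless(&mut data, |a, b| a < b)` sorts `data` in ascending order. The branchy equivalent would be `if child + 1 < v.len() && is_less(..) { child += 1; }`; the arithmetic form avoids a hard-to-predict branch on the comparison result, which matters most for cheap comparisons such as those on primitive integers.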
r? @scottmcm (rustbot has picked a reviewer for you, use r? to override) |
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
r? thomcc |
This allows even better code-gen (cmp + adc), while also communicating the intent more clearly.
r? @scottmcm |
Thanks for benching it! @bors r+ (Ok to rollup because this is in the fallback sorting/selecting path, so is unlikely to matter for compiler perf.) |
@bors rollup=never |
@scottmcm Not sure I agree — that's true but compiler is huge and it's hard to say what matters. Better safe than sorry IMO. |
@thomcc Sure, no objections, especially since the queue isn't too long right now. |
☀️ Test successful - checks-actions |
Finished benchmarking commit (b7089e0): comparison URL.
Overall result: no relevant changes - no action needed
@rustbot label: -perf-regression
Instruction count: This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This benchmark run did not return any relevant results for this metric. |
`slice::sort_unstable` will fall back to heapsort if it repeatedly fails to find a good pivot. Making the core child-update code branchless makes it much faster. On Zen 3, sorting 10k `u64` values while forcing the sort to pick heapsort results in: 455us -> 278us; 455us -> 249us