Speedup heapsort by 1.8x by making it branchless #107894

Voultapher · 2023-02-10T17:07:14Z

`slice::sort_unstable` falls back to heapsort if it repeatedly fails to find a good pivot. Making the core child-update code branchless makes that fallback much faster. On Zen 3, sorting 10k `u64` values while forcing the sort to pick heapsort gives:

~~455us -> 278us~~

455us -> 249us
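For readers who want to see the idea in isolation, here is a minimal sketch of a heapsort whose child selection is branchless. It is an illustration under stated assumptions, not the code from this PR: the function names `heapsort` and `sift_down` and the `T: Ord` bound are chosen for the example, whereas the standard library version works with a caller-supplied comparison.

```rust
/// Sorts `v` in ascending order using a binary max-heap.
/// Hypothetical standalone sketch; not the implementation in `library/core`.
fn heapsort<T: Ord>(v: &mut [T]) {
    /// Restores the max-heap property for the subtree rooted at `node`.
    fn sift_down<T: Ord>(v: &mut [T], mut node: usize) {
        loop {
            // Left child of `node`; stop when `node` is a leaf.
            let mut child = 2 * node + 1;
            if child >= v.len() {
                break;
            }
            // The bounds check stays a branch, but it is highly predictable.
            if child + 1 < v.len() {
                // Branchless selection: add the comparison result (0 or 1)
                // instead of conditionally incrementing `child`.
                child += (v[child] < v[child + 1]) as usize;
            }
            // Stop once the parent is not smaller than its greater child.
            if v[node] >= v[child] {
                break;
            }
            v.swap(node, child);
            node = child;
        }
    }

    // Build the heap bottom-up, starting from the last parent node.
    for i in (0..v.len() / 2).rev() {
        sift_down(v, i);
    }
    // Repeatedly move the maximum to the end and shrink the heap.
    for i in (1..v.len()).rev() {
        v.swap(0, i);
        sift_down(&mut v[..i], 0);
    }
}
```

Calling `heapsort(&mut data)` on `vec![5u64, 3, 9, 1, 7, 3, 0, 8]` leaves `data` as `[0, 1, 3, 3, 5, 7, 8, 9]`. The comparison between the two children is essentially a coin flip on random input, which is why removing that branch is the part expected to help.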

rustbot · 2023-02-10T17:07:22Z

r? @scottmcm

(rustbot has picked a reviewer for you, use r? to override)

rustbot · 2023-02-10T17:07:25Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

* Stabilizing library features
* Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
* Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
* Changing public documentation in ways that create new stability guarantees
* Changing observable runtime behavior of library APIs

Voultapher · 2023-02-10T17:07:26Z

r? thomcc

library/core/src/slice/sort.rs

This allows even better code-gen (cmp + adc), while also communicating the intent more clearly.
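One plausible reading of "splitting branches" is sketched below: the bounds check and the element comparison are separated, so that only the bounds check remains a conditional jump and the comparison result is added as an integer. Both function names are hypothetical and `u64` is used only to keep the example short. Whether the compiler actually emits `cmp` + `adc` for the split form depends on the target and optimization level, so treat that as the author's reported observation rather than something this sketch guarantees.

```rust
/// Combined form: the bounds check and the comparison share one
/// short-circuiting condition that guards a conditional increment.
/// Returns `None` when `node` has no children.
fn greater_child_combined(v: &[u64], node: usize) -> Option<usize> {
    let mut child = 2 * node + 1;
    if child >= v.len() {
        return None;
    }
    if child + 1 < v.len() && v[child] < v[child + 1] {
        child += 1;
    }
    Some(child)
}

/// Split form: the bounds check is its own predictable branch, and the
/// comparison result is converted to 0 or 1 and added to the index.
fn greater_child_split(v: &[u64], node: usize) -> Option<usize> {
    let mut child = 2 * node + 1;
    if child >= v.len() {
        return None;
    }
    if child + 1 < v.len() {
        child += (v[child] < v[child + 1]) as usize;
    }
    Some(child)
}
```

Both functions return the same index for every input; the difference is only in how the intent is expressed to the compiler. For the array `[9, 4, 7, 1, 3, 6]`, node 0 yields index 2, node 1 yields index 4, node 2 yields index 5 (it has a single child), and node 3 yields `None`.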

Voultapher · 2023-02-11T08:34:16Z

r? @scottmcm

scottmcm · 2023-02-11T22:20:45Z

Thanks for benching it!

@bors r+

(Ok to rollup because this is in the fallback sorting/selecting path, so is unlikely to matter for compiler perf.)

bors · 2023-02-11T22:20:48Z

📌 Commit ee0376c has been approved by scottmcm

It is now in the queue for this repository.

thomcc · 2023-02-11T22:22:24Z

@bors rollup=never

thomcc · 2023-02-11T22:23:15Z

@scottmcm Not sure I agree — that's true but compiler is huge and it's hard to say what matters. Better safe than sorry IMO.

scottmcm · 2023-02-11T23:04:00Z

@thomcc Sure, no objections, especially since the queue isn't too long right now.

bors · 2023-02-12T03:30:13Z

⌛ Testing commit ee0376c with merge b7089e0...

bors · 2023-02-12T07:00:39Z

☀️ Test successful - checks-actions
Approved by: scottmcm
Pushing b7089e0 to master...

rust-timer · 2023-02-12T08:30:14Z

Finished benchmarking commit (b7089e0): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.6% | [1.6%, 1.6%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.3% | [-3.3%, -3.3%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

This benchmark run did not return any relevant results for this metric.

rustbot assigned scottmcm Feb 10, 2023

rustbot added the S-waiting-on-review and T-libs labels Feb 10, 2023

rustbot assigned thomcc and unassigned scottmcm Feb 10, 2023

scottmcm reviewed Feb 10, 2023

library/core/src/slice/sort.rs (outdated, resolved)

Split branches in heapsort child selection

ee0376c

This allows even better code-gen (cmp + adc), while also communicating the intent more clearly.

rustbot assigned scottmcm and unassigned thomcc Feb 11, 2023

bors added the S-waiting-on-bors label and removed the S-waiting-on-review label Feb 11, 2023

bors added the merged-by-bors label Feb 12, 2023

bors merged commit b7089e0 into rust-lang:master Feb 12, 2023

rustbot added this to the 1.69.0 milestone Feb 12, 2023

Voultapher deleted the improve-heapsort-fallback branch February 12, 2023 09:53

niklasf mentioned this pull request Feb 19, 2023

Avoid an unpredictable branch in adjust_heap sundy-li/partial_sort#9

Merged

Voultapher changed the title ~~Speedup heapsort by 1.5x by making it branchless~~ Speedup heapsort by 1.8x by making it branchless May 14, 2023
