-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Add to_boxed_slice()
to clone slice into boxed slice
#80460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
Is that code actually executed? It might be dead code that just doesn't get elided by the compiler. |
That's true, it internally calls |
There already exists a |
76e0473
to
509280c
Compare
Perhaps because it was implemented before specialization? There is an |
If you extend the |
Well there already exist |
r? @m-ou-se |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good addition to me, to add unstably so we can try it out. Can you open a tracking issue? Thanks!
509280c
to
f50a592
Compare
Thanks for the feedback! I think I've addressed it, but we should probably do a perf test to make sure that implementing |
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit f50a5923f16d92d2afaafe0b10bb68a67100aacd with merge 678aab286813c93b8a5589c664cc3534f33e46e6... |
… r=m-ou-se Avoid unnecessary Vec construction in BufReader As mentioned in rust-lang#80460, creating a `Vec` and calling `Vec::into_boxed_slice()` emits unnecessary calls to `realloc()` and `free()`. Updated the code to use `Box::new_uninit_slice()` to create a boxed slice directly. I think this also makes it more explicit that the initial contents of the buffer are uninitialized. r? `@m-ou-se`
… r=m-ou-se Avoid unnecessary Vec construction in BufReader As mentioned in rust-lang#80460, creating a `Vec` and calling `Vec::into_boxed_slice()` emits unnecessary calls to `realloc()` and `free()`. Updated the code to use `Box::new_uninit_slice()` to create a boxed slice directly. I think this also makes it more explicit that the initial contents of the buffer are uninitialized. r? ``@m-ou-se``
It looks like the perf run was cancelled when I updated the PR to factor out the second commit into #82728. Would it be possible to re-run it? |
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 47b1dbd with merge 15dafdb539a06e106a54773f45ddf0b7f750c6c2... |
Hmm, in the interest of reducing code duplication, it looks like copying and cloning into a boxed slice can be easily implemented in terms of the recently-added functions on |
☀️ Try build successful - checks-actions |
Queued 15dafdb539a06e106a54773f45ddf0b7f750c6c2 with parent 1d6b0f6, future comparison URL. |
Finished benchmarking try commit (15dafdb539a06e106a54773f45ddf0b7f750c6c2): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
Those results don't look great. :( |
Hmm, there seem to be some significant improvements and some significant regressions for each of the performance measures. I'm not sure what is causing this, since the generated assembly for
example::copy_slice_old:
push r15
push r14
push r13
push r12
push rbx
; ...
example::copy_slice_new:
push rbp
push r15
push r14
push r13
push r12
push rbx
sub rsp, 8
; ...
example::copy_slice_old:
; ...
mov rdi, rax
mov rax, r13
shr rax, 2
test rdi, rdi
cmove rax, rdi
test rdi, rdi
je .LBB0_7
; ...
example::copy_slice_new:
; ...
mov rbp, rax
test rbp, rbp
je .LBB1_7
; ...
example::clone_slice_old:
; ...
mov qword ptr [rsp], r14
; ...
mov rax, qword ptr [rsp]
; ...
example::clone_slice_old:
; ...
.LBB0_7:
cmp r12, rbx
je .LBB0_10
mov edi, 4
mov esi, 4
call r14
test rax, rax
je .LBB0_13
mov rcx, qword ptr [r15 + 8*rbx]
mov ecx, dword ptr [rcx]
mov dword ptr [rax], ecx
mov qword ptr [r13 + 8*rbx], rax
lea rax, [rbx + 1]
mov rbx, rax
cmp rbp, rax
jne .LBB0_7
; ...
example::clone_slice_new:
; ...
.LBB1_7:
mov edi, 4
mov esi, 4
call r13
test rax, rax
je .LBB1_12
mov rcx, qword ptr [r15 + 8*rbp]
mov ecx, dword ptr [rcx]
mov dword ptr [rax], ecx
mov qword ptr [rbx + 8*rbp], rax
add rbp, 1
cmp r12, rbp
jne .LBB1_7
; ... I would expect the code involved in making the allocation or performing a |
@calebsander Ping from triage, what's next steps on this? |
The concern @m-ou-se brought up was that the perf run showed some performance regressions (and some improvements) from implementing
|
The generated assembly may not be the problem. perf rlo mostly measures compile time, not runtime. So the results aren't necessarily due to box/vec use, they may simply be due to code being being more difficult for the compiler even when it ultimately results in similar assembly. E.g. this one spends more time in LLVM: https://perf.rust-lang.org/detailed-query.html?commit=15dafdb539a06e106a54773f45ddf0b7f750c6c2&base_commit=1d6b0f626aad4ee9f2eaec4d5582f45620ccab80&benchmark=ripgrep-opt&run_name=full |
// allocated above with the capacity of `s`, and initialize to `s.len()` in | ||
// ptr::copy_to_non_overlapping below. | ||
fn to_boxed_slice<A: Allocator>(s: &[Self], alloc: A) -> Box<[Self], A> { | ||
let mut boxed = Box::new_uninit_slice_in(s.len(), alloc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of effort went into the Vec allocation path to reduce the amount of IR generated, perhaps the Box equivalent simply isn't as fine-tuned and that causes the regressions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're identical except for the additional RawVec::into_box()
call, which just moves some pointers and lengths around, so it seems unlikely that this would be the culprit:
impl<T, A: Allocator> Box<[T], A> {
pub fn new_uninit_slice_in(len: usize, alloc: A) -> Box<[mem::MaybeUninit<T>], A> {
unsafe { RawVec::with_capacity_in(len, alloc).into_box(len) }
}
}
impl<T, A: Allocator> Vec<T, A> {
pub fn with_capacity_in(capacity: usize, alloc: A) -> Self {
Vec { buf: RawVec::with_capacity_in(capacity, alloc), len: 0 }
}
}
impl<T, A: Allocator> RawVec<T, A> {
pub unsafe fn into_box(self, len: usize) -> Box<[MaybeUninit<T>], A> {
// Sanity-check one half of the safety requirement (we cannot check the other half).
debug_assert!(
len <= self.capacity(),
"`len` must be smaller than or equal to `self.capacity()`"
);
let me = ManuallyDrop::new(self);
unsafe {
let slice = slice::from_raw_parts_mut(me.ptr() as *mut MaybeUninit<T>, len);
Box::from_raw_in(slice, ptr::read(&me.alloc))
}
}
}
Okay, that's good to know. It's definitely possible that more IR is being generated, although the Rust playground shows pretty similar LLVM IR generated (in release mode): Old IR for <[i32]>::to_vec(); ModuleID = 'playground.5mgwo5pt-cgu.0'
source_filename = "playground.5mgwo5pt-cgu.0"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
%"std::vec::Vec<i32>" = type { [0 x i64], { i32*, i64 }, [0 x i64], i64, [0 x i64] }
%"unwind::libunwind::_Unwind_Exception" = type { [0 x i64], i64, [0 x i64], void (i32, %"unwind::libunwind::_Unwind_Exception"*)*, [0 x i64], [6 x i64], [0 x i64] }
%"unwind::libunwind::_Unwind_Context" = type { [0 x i8] }
; playground::copy_slice_old
; Function Attrs: nonlazybind uwtable
define void @_ZN10playground14copy_slice_old17h6d0009fe79b83180E(%"std::vec::Vec<i32>"* noalias nocapture sret(%"std::vec::Vec<i32>") dereferenceable(24) %0, [0 x i32]* noalias nocapture nonnull readonly align 4 %slice.0, i64 %slice.1) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
tail call void @llvm.experimental.noalias.scope.decl(metadata !2)
tail call void @llvm.experimental.noalias.scope.decl(metadata !5), !noalias !8
tail call void @llvm.experimental.noalias.scope.decl(metadata !10), !noalias !13
tail call void @llvm.experimental.noalias.scope.decl(metadata !15), !noalias !18
tail call void @llvm.experimental.noalias.scope.decl(metadata !20), !noalias !23
%1 = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %slice.1, i64 4) #7
%2 = extractvalue { i64, i1 } %1, 1
%3 = extractvalue { i64, i1 } %1, 0
%.sroa.3.0.i.i.i.i.i.i.i.i = select i1 %2, i64 0, i64 4
%.sroa.0.0.i.i.i.i.i.i.i.i = select i1 %2, i64 undef, i64 %3
br i1 %2, label %bb6.i.i.i.i.i.i.i, label %bb13.i.i.i.i.i.i.i
bb6.i.i.i.i.i.i.i: ; preds = %start
; call alloc::raw_vec::capacity_overflow
tail call void @_ZN5alloc7raw_vec17capacity_overflow17h97153d40f6e2cd3bE(), !noalias !25
unreachable
bb13.i.i.i.i.i.i.i: ; preds = %start
%4 = icmp eq i64 %.sroa.0.0.i.i.i.i.i.i.i.i, 0
br i1 %4, label %bb3.i.i.i.i.i.i.i.i.i, label %bb7.i.i.i.i.i.i.i.i.i
bb3.i.i.i.i.i.i.i.i.i: ; preds = %bb13.i.i.i.i.i.i.i
%_2.i.i.i.i.i.i.i.i.i.i = inttoptr i64 %.sroa.3.0.i.i.i.i.i.i.i.i to i8*
br label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i.i"
bb7.i.i.i.i.i.i.i.i.i: ; preds = %bb13.i.i.i.i.i.i.i
%5 = tail call i8* @__rust_alloc(i64 %.sroa.0.0.i.i.i.i.i.i.i.i, i64 %.sroa.3.0.i.i.i.i.i.i.i.i) #7, !noalias !25
%6 = icmp eq i8* %5, null
%.sroa.0.0.i.op.i.i.i.i.i.i.i = lshr i64 %.sroa.0.0.i.i.i.i.i.i.i.i, 2
%phi.bo.i.i.i.i.i.i.i = select i1 %6, i64 0, i64 %.sroa.0.0.i.op.i.i.i.i.i.i.i
br label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i.i"
"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i.i": ; preds = %bb7.i.i.i.i.i.i.i.i.i, %bb3.i.i.i.i.i.i.i.i.i
%.sroa.4.0.i.i.i.i.i.i.i.i.i = phi i64 [ 0, %bb3.i.i.i.i.i.i.i.i.i ], [ %phi.bo.i.i.i.i.i.i.i, %bb7.i.i.i.i.i.i.i.i.i ]
%.sroa.0.0.i.i.i.i.i.i.i.i.i = phi i8* [ %_2.i.i.i.i.i.i.i.i.i.i, %bb3.i.i.i.i.i.i.i.i.i ], [ %5, %bb7.i.i.i.i.i.i.i.i.i ]
%7 = icmp eq i8* %.sroa.0.0.i.i.i.i.i.i.i.i.i, null
br i1 %7, label %bb20.i.i.i.i.i.i.i, label %"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$6to_vec17h82a8a1f74f9ea9ddE.exit"
bb20.i.i.i.i.i.i.i: ; preds = %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i.i"
; call alloc::alloc::handle_alloc_error
tail call void @_ZN5alloc5alloc18handle_alloc_error17hbbcab09c85c3ddd4E(i64 %.sroa.0.0.i.i.i.i.i.i.i.i, i64 %.sroa.3.0.i.i.i.i.i.i.i.i), !noalias !25
unreachable
"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$6to_vec17h82a8a1f74f9ea9ddE.exit": ; preds = %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i.i"
%8 = bitcast %"std::vec::Vec<i32>"* %0 to i8**
store i8* %.sroa.0.0.i.i.i.i.i.i.i.i.i, i8** %8, align 8, !alias.scope !26, !noalias !27
%9 = getelementptr inbounds %"std::vec::Vec<i32>", %"std::vec::Vec<i32>"* %0, i64 0, i32 1, i32 1
store i64 %.sroa.4.0.i.i.i.i.i.i.i.i.i, i64* %9, align 8, !alias.scope !26, !noalias !27
%10 = getelementptr inbounds %"std::vec::Vec<i32>", %"std::vec::Vec<i32>"* %0, i64 0, i32 3
%11 = shl nuw i64 %slice.1, 2
%12 = bitcast [0 x i32]* %slice.0 to i8*
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 %.sroa.0.0.i.i.i.i.i.i.i.i.i, i8* nonnull align 4 %12, i64 %11, i1 false) #7, !noalias !28
store i64 %slice.1, i64* %10, align 8, !alias.scope !29, !noalias !27
ret void
}
; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare { i64, i1 } @llvm.umul.with.overflow.i64(i64, i64) #1
; Function Attrs: nounwind nonlazybind uwtable
declare i32 @rust_eh_personality(i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*) unnamed_addr #2
; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #3
; Function Attrs: nounwind nonlazybind uwtable
declare noalias i8* @__rust_alloc(i64, i64) unnamed_addr #2
; alloc::raw_vec::capacity_overflow
; Function Attrs: noreturn nonlazybind uwtable
declare void @_ZN5alloc7raw_vec17capacity_overflow17h97153d40f6e2cd3bE() unnamed_addr #4
; alloc::alloc::handle_alloc_error
; Function Attrs: cold noreturn nounwind nonlazybind uwtable
declare void @_ZN5alloc5alloc18handle_alloc_error17hbbcab09c85c3ddd4E(i64, i64) unnamed_addr #5
; Function Attrs: inaccessiblememonly nofree nosync nounwind willreturn
declare void @llvm.experimental.noalias.scope.decl(metadata) #6
attributes #0 = { nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" }
attributes #1 = { nofree nosync nounwind readnone speculatable willreturn }
attributes #2 = { nounwind nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" }
attributes #3 = { argmemonly nofree nosync nounwind willreturn }
attributes #4 = { noreturn nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" }
attributes #5 = { cold noreturn nounwind nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" }
attributes #6 = { inaccessiblememonly nofree nosync nounwind willreturn }
attributes #7 = { nounwind }
!llvm.module.flags = !{!0, !1}
!0 = !{i32 7, !"PIC Level", i32 2}
!1 = !{i32 2, !"RtLibUseGOT", i32 1}
!2 = !{!3}
!3 = distinct !{!3, !4, !"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$6to_vec17h82a8a1f74f9ea9ddE: argument 0"}
!4 = distinct !{!4, !"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$6to_vec17h82a8a1f74f9ea9ddE"}
!5 = !{!6}
!6 = distinct !{!6, !7, !"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$9to_vec_in17hf0efe825ef187949E: argument 0"}
!7 = distinct !{!7, !"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$9to_vec_in17hf0efe825ef187949E"}
!8 = !{!3, !9}
!9 = distinct !{!9, !4, !"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$6to_vec17h82a8a1f74f9ea9ddE: %self.0"}
!10 = !{!11}
!11 = distinct !{!11, !12, !"_ZN5alloc5slice4hack6to_vec17h4e5b681653d8de3eE: argument 0"}
!12 = distinct !{!12, !"_ZN5alloc5slice4hack6to_vec17h4e5b681653d8de3eE"}
!13 = !{!6, !14, !3, !9}
!14 = distinct !{!14, !7, !"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$9to_vec_in17hf0efe825ef187949E: %self.0"}
!15 = !{!16}
!16 = distinct !{!16, !17, !"_ZN52_$LT$T$u20$as$u20$alloc..slice..hack..ConvertVec$GT$6to_vec17h153ad63a55993150E: %v"}
!17 = distinct !{!17, !"_ZN52_$LT$T$u20$as$u20$alloc..slice..hack..ConvertVec$GT$6to_vec17h153ad63a55993150E"}
!18 = !{!11, !19, !6, !14, !3, !9}
!19 = distinct !{!19, !12, !"_ZN5alloc5slice4hack6to_vec17h4e5b681653d8de3eE: %s.0"}
!20 = !{!21}
!21 = distinct !{!21, !22, !"_ZN5alloc3vec16Vec$LT$T$C$A$GT$16with_capacity_in17hce1a244487fd289dE: argument 0"}
!22 = distinct !{!22, !"_ZN5alloc3vec16Vec$LT$T$C$A$GT$16with_capacity_in17hce1a244487fd289dE"}
!23 = !{!16, !24, !11, !19, !6, !14, !3, !9}
!24 = distinct !{!24, !17, !"_ZN52_$LT$T$u20$as$u20$alloc..slice..hack..ConvertVec$GT$6to_vec17h153ad63a55993150E: %s.0"}
!25 = !{!21, !16, !24, !11, !19, !6, !14, !3, !9}
!26 = !{!21, !16, !11, !6, !3}
!27 = !{!24, !19, !14, !9}
!28 = !{!16, !11, !6, !3}
!29 = !{!30, !16, !11, !6, !3}
!30 = distinct !{!30, !31, !"_ZN5alloc3vec16Vec$LT$T$C$A$GT$7set_len17hcd51c7b209261a71E: %self"}
!31 = distinct !{!31, !"_ZN5alloc3vec16Vec$LT$T$C$A$GT$7set_len17hcd51c7b209261a71E"} New IR; ModuleID = 'playground.5mgwo5pt-cgu.0'
source_filename = "playground.5mgwo5pt-cgu.0"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
%"std::vec::Vec<i32>" = type { [0 x i64], { i32*, i64 }, [0 x i64], i64, [0 x i64] }
%"unwind::libunwind::_Unwind_Exception" = type { [0 x i64], i64, [0 x i64], void (i32, %"unwind::libunwind::_Unwind_Exception"*)*, [0 x i64], [6 x i64], [0 x i64] }
%"unwind::libunwind::_Unwind_Context" = type { [0 x i8] }
; playground::copy_slice_new
; Function Attrs: nonlazybind uwtable
define void @_ZN10playground14copy_slice_new17hbe14d17b96488007E(%"std::vec::Vec<i32>"* noalias nocapture sret(%"std::vec::Vec<i32>") dereferenceable(24) %0, [0 x i32]* noalias nocapture nonnull readonly align 4 %slice.0, i64 %slice.1) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
tail call void @llvm.experimental.noalias.scope.decl(metadata !2)
%1 = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %slice.1, i64 4) #7
%2 = extractvalue { i64, i1 } %1, 1
%3 = extractvalue { i64, i1 } %1, 0
%.sroa.3.0.i.i.i.i.i.i.i = select i1 %2, i64 0, i64 4
%.sroa.0.0.i.i.i.i.i.i.i = select i1 %2, i64 undef, i64 %3
br i1 %2, label %bb6.i.i.i.i.i.i, label %bb13.i.i.i.i.i.i
bb6.i.i.i.i.i.i: ; preds = %start
; call alloc::raw_vec::capacity_overflow
tail call void @_ZN5alloc7raw_vec17capacity_overflow17h97153d40f6e2cd3bE(), !noalias !5
unreachable
bb13.i.i.i.i.i.i: ; preds = %start
%4 = icmp eq i64 %.sroa.0.0.i.i.i.i.i.i.i, 0
br i1 %4, label %bb3.i.i.i.i.i.i.i.i, label %bb7.i.i.i.i.i.i.i.i
bb3.i.i.i.i.i.i.i.i: ; preds = %bb13.i.i.i.i.i.i
%_2.i.i.i.i.i.i.i.i.i = inttoptr i64 %.sroa.3.0.i.i.i.i.i.i.i to i8*
br label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i"
bb7.i.i.i.i.i.i.i.i: ; preds = %bb13.i.i.i.i.i.i
%5 = tail call i8* @__rust_alloc(i64 %.sroa.0.0.i.i.i.i.i.i.i, i64 %.sroa.3.0.i.i.i.i.i.i.i) #7, !noalias !5
br label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i"
"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i": ; preds = %bb7.i.i.i.i.i.i.i.i, %bb3.i.i.i.i.i.i.i.i
%.sroa.0.0.i.i.i.i.i.i.i.i = phi i8* [ %_2.i.i.i.i.i.i.i.i.i, %bb3.i.i.i.i.i.i.i.i ], [ %5, %bb7.i.i.i.i.i.i.i.i ]
%6 = icmp eq i8* %.sroa.0.0.i.i.i.i.i.i.i.i, null
br i1 %6, label %bb20.i.i.i.i.i.i, label %_ZN10playground4hack6to_vec17h302c903ca456a4daE.exit
bb20.i.i.i.i.i.i: ; preds = %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i"
; call alloc::alloc::handle_alloc_error
tail call void @_ZN5alloc5alloc18handle_alloc_error17hbbcab09c85c3ddd4E(i64 %.sroa.0.0.i.i.i.i.i.i.i, i64 %.sroa.3.0.i.i.i.i.i.i.i), !noalias !5
unreachable
_ZN10playground4hack6to_vec17h302c903ca456a4daE.exit: ; preds = %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i"
%7 = shl nuw i64 %slice.1, 2
%8 = bitcast [0 x i32]* %slice.0 to i8*
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 %.sroa.0.0.i.i.i.i.i.i.i.i, i8* nonnull align 4 %8, i64 %7, i1 false) #7, !noalias !2
%9 = bitcast %"std::vec::Vec<i32>"* %0 to i8**
store i8* %.sroa.0.0.i.i.i.i.i.i.i.i, i8** %9, align 8, !alias.scope !11, !noalias !16
%10 = getelementptr inbounds %"std::vec::Vec<i32>", %"std::vec::Vec<i32>"* %0, i64 0, i32 1, i32 1
store i64 %slice.1, i64* %10, align 8, !alias.scope !11, !noalias !16
%11 = getelementptr inbounds %"std::vec::Vec<i32>", %"std::vec::Vec<i32>"* %0, i64 0, i32 3
store i64 %slice.1, i64* %11, align 8, !alias.scope !11, !noalias !16
ret void
}
; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare { i64, i1 } @llvm.umul.with.overflow.i64(i64, i64) #1
; Function Attrs: nounwind nonlazybind uwtable
declare i32 @rust_eh_personality(i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*) unnamed_addr #2
; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #3
; Function Attrs: nounwind nonlazybind uwtable
declare noalias i8* @__rust_alloc(i64, i64) unnamed_addr #2
; alloc::raw_vec::capacity_overflow
; Function Attrs: noreturn nonlazybind uwtable
declare void @_ZN5alloc7raw_vec17capacity_overflow17h97153d40f6e2cd3bE() unnamed_addr #4
; alloc::alloc::handle_alloc_error
; Function Attrs: cold noreturn nounwind nonlazybind uwtable
declare void @_ZN5alloc5alloc18handle_alloc_error17hbbcab09c85c3ddd4E(i64, i64) unnamed_addr #5
; Function Attrs: inaccessiblememonly nofree nosync nounwind willreturn
declare void @llvm.experimental.noalias.scope.decl(metadata) #6
attributes #0 = { nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" }
attributes #1 = { nofree nosync nounwind readnone speculatable willreturn }
attributes #2 = { nounwind nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" }
attributes #3 = { argmemonly nofree nosync nounwind willreturn }
attributes #4 = { noreturn nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" }
attributes #5 = { cold noreturn nounwind nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" }
attributes #6 = { inaccessiblememonly nofree nosync nounwind willreturn }
attributes #7 = { nounwind }
!llvm.module.flags = !{!0, !1}
!0 = !{i32 7, !"PIC Level", i32 2}
!1 = !{i32 2, !"RtLibUseGOT", i32 1}
!2 = !{!3}
!3 = distinct !{!3, !4, !"_ZN10playground4hack6to_vec17h302c903ca456a4daE: argument 0"}
!4 = distinct !{!4, !"_ZN10playground4hack6to_vec17h302c903ca456a4daE"}
!5 = !{!6, !8, !3, !10}
!6 = distinct !{!6, !7, !"_ZN52_$LT$T$u20$as$u20$playground..hack..ConvertBoxed$GT$14to_boxed_slice17hf040d5891f80ff7fE: %s.0"}
!7 = distinct !{!7, !"_ZN52_$LT$T$u20$as$u20$playground..hack..ConvertBoxed$GT$14to_boxed_slice17hf040d5891f80ff7fE"}
!8 = distinct !{!8, !9, !"_ZN10playground4hack14to_boxed_slice17hee7857f846790a13E: %s.0"}
!9 = distinct !{!9, !"_ZN10playground4hack14to_boxed_slice17hee7857f846790a13E"}
!10 = distinct !{!10, !4, !"_ZN10playground4hack6to_vec17h302c903ca456a4daE: %s.0"}
!11 = !{!12, !14, !3}
!12 = distinct !{!12, !13, !"_ZN5alloc3vec16Vec$LT$T$C$A$GT$17from_raw_parts_in17h851da6b596063aefE: argument 0"}
!13 = distinct !{!13, !"_ZN5alloc3vec16Vec$LT$T$C$A$GT$17from_raw_parts_in17h851da6b596063aefE"}
!14 = distinct !{!14, !15, !"_ZN10playground4hack8into_vec17hf50de4bbad45c759E: argument 0"}
!15 = distinct !{!15, !"_ZN10playground4hack8into_vec17hf50de4bbad45c759E"}
!16 = !{!17, !10}
!17 = distinct !{!17, !15, !"_ZN10playground4hack8into_vec17hf50de4bbad45c759E: argument 1"} |
Well, then I am out of guesses. If you want to dig further you can download (there's a tool for that) the try builds that were used for the perf run and then get additional profiling outputs for individual benchmarks that regressed to figure out what's causing it. |
ping from triage: |
I don't think I have enough experience with the internals of the compilation process to debug why compilation time has increased for some of the perf test cases. If someone with more expertise wants to take a look at it, they are welcome to. |
Since there's a tracking issue to keep this topic from being totally dropped, and in lieu of any immediate solutions to the perf problems, I'd recommend closing for now unless a libs team member disagrees. |
I'll close for now, this is tracked by #82725 . |
The current recommended approach seems to be to use
.to_vec().into_boxed_slice()
, but this misses some opportunities for optimization. A Godbolt comparison shows thatVec::into_boxed_slice()
emits unnecessary calls torealloc()
andfree()
and uses some extra registers for local variables: https://godbolt.org/z/eq7Pc7. I was able to remove 3 calls toVec::into_boxed_slice()
from the standard library.If this addition looks reasonable, I can create a tracking issue for the
slice_to_boxed
feature.