Add `to_boxed_slice()` to clone slice into boxed slice #80460

calebsander · 2020-12-28T23:57:32Z

The current recommended approach seems to be to use .to_vec().into_boxed_slice(), but this misses some opportunities for optimization. A Godbolt comparison shows that Vec::into_boxed_slice() emits unnecessary calls to realloc() and free() and uses some extra registers for local variables: https://godbolt.org/z/eq7Pc7. I was able to remove 3 calls to Vec::into_boxed_slice() from the standard library.
If this addition looks reasonable, I can create a tracking issue for the slice_to_boxed feature.

rust-highfive · 2020-12-28T23:57:35Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

the8472 · 2020-12-29T02:19:58Z

A Godbolt comparison shows that Vec::into_boxed_slice() performs an unnecessary realloc()

Is that code actually executed? It might be dead code that just doesn't get elided by the compiler.

calebsander · 2020-12-29T02:28:46Z

That's true, it internally calls Vec::shrink_to_fit(), which will skip calling realloc(). But this code doesn't need to be emitted, and it still performs an unnecessary comparison of capacity to len.

the8472 · 2020-12-29T15:58:59Z

There already exists a impl<T: Copy> From<&[T]> for Box<[T]>. Not sure why that's not T: Clone

calebsander · 2020-12-29T18:04:47Z

There already exists a impl<T: Copy> From<&[T]> for Box<[T]>. Not sure why that's not T: Clone

Perhaps because it was implemented before specialization? There is an impl<T: Clone> From<&[T]> for Vec<T>, so I've changed the bounds on the Box<[T]> implementation to match, and made it use `to_boxed_slice().

the8472 · 2020-12-30T15:11:11Z

If you extend the From implementation then I think the named method is redundant.

calebsander · 2021-01-03T16:46:42Z

If you extend the From implementation then I think the named method is redundant.

Well there already exist Vec::from(), <[T]>::to_vec(), and <[T]>::to_owned() so I added <[T]>::to_boxed_slice() for consistency. I think some people also prefer postfix conversions and some prefer prefix methods, but it is true that both methods are not strictly necessary.

Dylan-DPC-zz · 2021-02-09T01:40:25Z

r? @m-ou-se

m-ou-se

Looks like a good addition to me, to add unstably so we can try it out. Can you open a tracking issue? Thanks!

library/alloc/src/slice.rs

library/alloc/src/boxed.rs

calebsander · 2021-03-03T17:05:24Z

Thanks for the feedback! I think I've addressed it, but we should probably do a perf test to make sure that implementing <[T]>::to_vec() in terms of <[T]>::to_boxed_slice() didn't cause performance regressions.
I created a tracking issue: #82725.

m-ou-se · 2021-03-03T17:15:59Z

@bors try @rust-timer queue

rust-timer · 2021-03-03T17:16:00Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-03-03T17:16:12Z

⌛ Trying commit f50a5923f16d92d2afaafe0b10bb68a67100aacd with merge 678aab286813c93b8a5589c664cc3534f33e46e6...

… r=m-ou-se Avoid unnecessary Vec construction in BufReader As mentioned in rust-lang#80460, creating a `Vec` and calling `Vec::into_boxed_slice()` emits unnecessary calls to `realloc()` and `free()`. Updated the code to use `Box::new_uninit_slice()` to create a boxed slice directly. I think this also makes it more explicit that the initial contents of the buffer are uninitialized. r? `@m-ou-se`

… r=m-ou-se Avoid unnecessary Vec construction in BufReader As mentioned in rust-lang#80460, creating a `Vec` and calling `Vec::into_boxed_slice()` emits unnecessary calls to `realloc()` and `free()`. Updated the code to use `Box::new_uninit_slice()` to create a boxed slice directly. I think this also makes it more explicit that the initial contents of the buffer are uninitialized. r? ``@m-ou-se``

calebsander · 2021-03-08T15:46:40Z

It looks like the perf run was cancelled when I updated the PR to factor out the second commit into #82728. Would it be possible to re-run it?

m-ou-se · 2021-03-08T15:48:39Z

@bors try @rust-timer queue

rust-timer · 2021-03-08T15:48:40Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-03-08T15:48:49Z

⌛ Trying commit 47b1dbd with merge 15dafdb539a06e106a54773f45ddf0b7f750c6c2...

calebsander · 2021-03-08T16:29:22Z

Hmm, in the interest of reducing code duplication, it looks like copying and cloning into a boxed slice can be easily implemented in terms of the recently-added functions on MaybeUninit in #79995. But we may want to wait for this feature to be closer to stabilization and see if there are any optimizations to be made to its implementation.

bors · 2021-03-08T16:49:22Z

☀️ Try build successful - checks-actions
Build commit: 15dafdb539a06e106a54773f45ddf0b7f750c6c2 (15dafdb539a06e106a54773f45ddf0b7f750c6c2)

rust-timer · 2021-03-08T16:49:24Z

Queued 15dafdb539a06e106a54773f45ddf0b7f750c6c2 with parent 1d6b0f6, future comparison URL.

rust-timer · 2021-03-08T19:57:05Z

Finished benchmarking try commit (15dafdb539a06e106a54773f45ddf0b7f750c6c2): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

m-ou-se · 2021-03-21T22:53:11Z

Those results don't look great. :(

calebsander · 2021-03-24T17:03:02Z

Hmm, there seem to be some significant improvements and some significant regressions for each of the performance measures. I'm not sure what is causing this, since the generated assembly for <[T]>::to_vec() is very similar. Here is a Godbolt link comparing the assembly: https://godbolt.org/z/149hxoddx. The main differences I see are:

Copying a slice into a Vec uses one more register for local variables (not sure why):

example::copy_slice_old:
        push    r15
        push    r14
        push    r13
        push    r12
        push    rbx
        ; ...

example::copy_slice_new:
        push    rbp
        push    r15
        push    r14
        push    r13
        push    r12
        push    rbx
        sub     rsp, 8
        ; ...

Copying a slice has fewer instructions to manipulate the return value of __rust_alloc():

example::copy_slice_old:
        ; ...
        mov     rdi, rax
        mov     rax, r13
        shr     rax, 2
        test    rdi, rdi
        cmove   rax, rdi
        test    rdi, rdi
        je      .LBB0_7
        ; ...

example::copy_slice_new:
        ; ...
        mov     rbp, rax
        test    rbp, rbp
        je      .LBB1_7
        ; ...

Cloning a slice no longer uses the stack to store a local variable:

example::clone_slice_old:
        ; ...
        mov     qword ptr [rsp], r14
        ; ...
        mov     rax, qword ptr [rsp]
        ; ...

Cloning a slice now has fewer instructions and jumps in the loop:

example::clone_slice_old:
        ; ...
.LBB0_7:
        cmp     r12, rbx
        je      .LBB0_10
        mov     edi, 4
        mov     esi, 4
        call    r14
        test    rax, rax
        je      .LBB0_13
        mov     rcx, qword ptr [r15 + 8*rbx]
        mov     ecx, dword ptr [rcx]
        mov     dword ptr [rax], ecx
        mov     qword ptr [r13 + 8*rbx], rax
        lea     rax, [rbx + 1]
        mov     rbx, rax
        cmp     rbp, rax
        jne     .LBB0_7
        ; ...

example::clone_slice_new:
        ; ...
.LBB1_7:
        mov     edi, 4
        mov     esi, 4
        call    r13
        test    rax, rax
        je      .LBB1_12
        mov     rcx, qword ptr [r15 + 8*rbp]
        mov     ecx, dword ptr [rcx]
        mov     dword ptr [rax], ecx
        mov     qword ptr [rbx + 8*rbp], rax
        add     rbp, 1
        cmp     r12, rbp
        jne     .LBB1_7
        ; ...

I would expect the code involved in making the allocation or performing a memcpy() to take much longer than these small differences in instruction counts. Is it possible that the performance diff is picking up some unrelated changes?

crlf0710 · 2021-04-09T13:26:53Z

@calebsander Ping from triage, what's next steps on this?

calebsander · 2021-04-09T14:44:20Z

The concern @m-ou-se brought up was that the perf run showed some performance regressions (and some improvements) from implementing <[T]>::to_vec() in terms of <[T]>::to_boxed_slice(). I see a few possible options:

Try to figure out what is causing the performance regressions. I looked at the assembly generated in Godbolt for <[T]>::to_vec() and it is nearly identical. I really have no idea how this change could be causing 1% or 2% slowdowns.
Accept the mixed performance results and merge this PR as is. We could try to improve the performance in a future PR.
Go back to the original version of the PR, which only implemented <[T]>::to_boxed_slice() and did not change the implementation of <[T]>::to_vec(). Unfortunately this version had a significant amount of duplicated code, which is why @m-ou-se suggested using <[T]>::to_boxed_slice() to implement <[T]>::to_vec().

the8472 · 2021-04-09T15:09:15Z

The generated assembly may not be the problem. perf rlo mostly measures compile time, not runtime. So the results aren't necessarily due to box/vec use, they may simply be due to code being being more difficult for the compiler even when it ultimately results in similar assembly.

E.g. this one spends more time in LLVM: https://perf.rust-lang.org/detailed-query.html?commit=15dafdb539a06e106a54773f45ddf0b7f750c6c2&base_commit=1d6b0f626aad4ee9f2eaec4d5582f45620ccab80&benchmark=ripgrep-opt&run_name=full

the8472 · 2021-04-09T15:16:40Z

library/alloc/src/slice.rs

-            // allocated above with the capacity of `s`, and initialize to `s.len()` in
-            // ptr::copy_to_non_overlapping below.
+        fn to_boxed_slice<A: Allocator>(s: &[Self], alloc: A) -> Box<[Self], A> {
+            let mut boxed = Box::new_uninit_slice_in(s.len(), alloc);


Lots of effort went into the Vec allocation path to reduce the amount of IR generated, perhaps the Box equivalent simply isn't as fine-tuned and that causes the regressions?

They're identical except for the additional RawVec::into_box() call, which just moves some pointers and lengths around, so it seems unlikely that this would be the culprit:

impl<T, A: Allocator> Box<[T], A> { pub fn new_uninit_slice_in(len: usize, alloc: A) -> Box<[mem::MaybeUninit<T>], A> { unsafe { RawVec::with_capacity_in(len, alloc).into_box(len) } } } impl<T, A: Allocator> Vec<T, A> { pub fn with_capacity_in(capacity: usize, alloc: A) -> Self { Vec { buf: RawVec::with_capacity_in(capacity, alloc), len: 0 } } } impl<T, A: Allocator> RawVec<T, A> { pub unsafe fn into_box(self, len: usize) -> Box<[MaybeUninit<T>], A> { // Sanity-check one half of the safety requirement (we cannot check the other half). debug_assert!( len <= self.capacity(), "`len` must be smaller than or equal to `self.capacity()`" ); let me = ManuallyDrop::new(self); unsafe { let slice = slice::from_raw_parts_mut(me.ptr() as *mut MaybeUninit<T>, len); Box::from_raw_in(slice, ptr::read(&me.alloc)) } } }

calebsander · 2021-04-09T16:43:18Z

The generated assembly may not be the problem. perf rlo mostly measures compile time, not runtime. So the results aren't necessarily due to box/vec use, they may simply be due to code being being more difficult for the compiler even when it ultimately results in similar assembly.

Okay, that's good to know. It's definitely possible that more IR is being generated, although the Rust playground shows pretty similar LLVM IR generated (in release mode):

Old IR for <[i32]>::to_vec()

; ModuleID = 'playground.5mgwo5pt-cgu.0'
source_filename = "playground.5mgwo5pt-cgu.0"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%"std::vec::Vec<i32>" = type { [0 x i64], { i32*, i64 }, [0 x i64], i64, [0 x i64] }
%"unwind::libunwind::_Unwind_Exception" = type { [0 x i64], i64, [0 x i64], void (i32, %"unwind::libunwind::_Unwind_Exception"*)*, [0 x i64], [6 x i64], [0 x i64] }
%"unwind::libunwind::_Unwind_Context" = type { [0 x i8] }

; playground::copy_slice_old
; Function Attrs: nonlazybind uwtable
define void @_ZN10playground14copy_slice_old17h6d0009fe79b83180E(%"std::vec::Vec<i32>"* noalias nocapture sret(%"std::vec::Vec<i32>") dereferenceable(24) %0, [0 x i32]* noalias nocapture nonnull readonly align 4 %slice.0, i64 %slice.1) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
  tail call void @llvm.experimental.noalias.scope.decl(metadata !2)
  tail call void @llvm.experimental.noalias.scope.decl(metadata !5), !noalias !8
  tail call void @llvm.experimental.noalias.scope.decl(metadata !10), !noalias !13
  tail call void @llvm.experimental.noalias.scope.decl(metadata !15), !noalias !18
  tail call void @llvm.experimental.noalias.scope.decl(metadata !20), !noalias !23
  %1 = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %slice.1, i64 4) #7
  %2 = extractvalue { i64, i1 } %1, 1
  %3 = extractvalue { i64, i1 } %1, 0
  %.sroa.3.0.i.i.i.i.i.i.i.i = select i1 %2, i64 0, i64 4
  %.sroa.0.0.i.i.i.i.i.i.i.i = select i1 %2, i64 undef, i64 %3
  br i1 %2, label %bb6.i.i.i.i.i.i.i, label %bb13.i.i.i.i.i.i.i

bb6.i.i.i.i.i.i.i:                                ; preds = %start
; call alloc::raw_vec::capacity_overflow
  tail call void @_ZN5alloc7raw_vec17capacity_overflow17h97153d40f6e2cd3bE(), !noalias !25
  unreachable

bb13.i.i.i.i.i.i.i:                               ; preds = %start
  %4 = icmp eq i64 %.sroa.0.0.i.i.i.i.i.i.i.i, 0
  br i1 %4, label %bb3.i.i.i.i.i.i.i.i.i, label %bb7.i.i.i.i.i.i.i.i.i

bb3.i.i.i.i.i.i.i.i.i:                            ; preds = %bb13.i.i.i.i.i.i.i
  %_2.i.i.i.i.i.i.i.i.i.i = inttoptr i64 %.sroa.3.0.i.i.i.i.i.i.i.i to i8*
  br label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i.i"

bb7.i.i.i.i.i.i.i.i.i:                            ; preds = %bb13.i.i.i.i.i.i.i
  %5 = tail call i8* @__rust_alloc(i64 %.sroa.0.0.i.i.i.i.i.i.i.i, i64 %.sroa.3.0.i.i.i.i.i.i.i.i) #7, !noalias !25
  %6 = icmp eq i8* %5, null
  %.sroa.0.0.i.op.i.i.i.i.i.i.i = lshr i64 %.sroa.0.0.i.i.i.i.i.i.i.i, 2
  %phi.bo.i.i.i.i.i.i.i = select i1 %6, i64 0, i64 %.sroa.0.0.i.op.i.i.i.i.i.i.i
  br label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i.i"

"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i.i": ; preds = %bb7.i.i.i.i.i.i.i.i.i, %bb3.i.i.i.i.i.i.i.i.i
  %.sroa.4.0.i.i.i.i.i.i.i.i.i = phi i64 [ 0, %bb3.i.i.i.i.i.i.i.i.i ], [ %phi.bo.i.i.i.i.i.i.i, %bb7.i.i.i.i.i.i.i.i.i ]
  %.sroa.0.0.i.i.i.i.i.i.i.i.i = phi i8* [ %_2.i.i.i.i.i.i.i.i.i.i, %bb3.i.i.i.i.i.i.i.i.i ], [ %5, %bb7.i.i.i.i.i.i.i.i.i ]
  %7 = icmp eq i8* %.sroa.0.0.i.i.i.i.i.i.i.i.i, null
  br i1 %7, label %bb20.i.i.i.i.i.i.i, label %"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$6to_vec17h82a8a1f74f9ea9ddE.exit"

bb20.i.i.i.i.i.i.i:                               ; preds = %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i.i"
; call alloc::alloc::handle_alloc_error
  tail call void @_ZN5alloc5alloc18handle_alloc_error17hbbcab09c85c3ddd4E(i64 %.sroa.0.0.i.i.i.i.i.i.i.i, i64 %.sroa.3.0.i.i.i.i.i.i.i.i), !noalias !25
  unreachable

"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$6to_vec17h82a8a1f74f9ea9ddE.exit": ; preds = %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i.i"
  %8 = bitcast %"std::vec::Vec<i32>"* %0 to i8**
  store i8* %.sroa.0.0.i.i.i.i.i.i.i.i.i, i8** %8, align 8, !alias.scope !26, !noalias !27
  %9 = getelementptr inbounds %"std::vec::Vec<i32>", %"std::vec::Vec<i32>"* %0, i64 0, i32 1, i32 1
  store i64 %.sroa.4.0.i.i.i.i.i.i.i.i.i, i64* %9, align 8, !alias.scope !26, !noalias !27
  %10 = getelementptr inbounds %"std::vec::Vec<i32>", %"std::vec::Vec<i32>"* %0, i64 0, i32 3
  %11 = shl nuw i64 %slice.1, 2
  %12 = bitcast [0 x i32]* %slice.0 to i8*
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 %.sroa.0.0.i.i.i.i.i.i.i.i.i, i8* nonnull align 4 %12, i64 %11, i1 false) #7, !noalias !28
  store i64 %slice.1, i64* %10, align 8, !alias.scope !29, !noalias !27
  ret void
}

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare { i64, i1 } @llvm.umul.with.overflow.i64(i64, i64) #1

; Function Attrs: nounwind nonlazybind uwtable
declare i32 @rust_eh_personality(i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*) unnamed_addr #2

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #3

; Function Attrs: nounwind nonlazybind uwtable
declare noalias i8* @__rust_alloc(i64, i64) unnamed_addr #2

; alloc::raw_vec::capacity_overflow
; Function Attrs: noreturn nonlazybind uwtable
declare void @_ZN5alloc7raw_vec17capacity_overflow17h97153d40f6e2cd3bE() unnamed_addr #4

; alloc::alloc::handle_alloc_error
; Function Attrs: cold noreturn nounwind nonlazybind uwtable
declare void @_ZN5alloc5alloc18handle_alloc_error17hbbcab09c85c3ddd4E(i64, i64) unnamed_addr #5

; Function Attrs: inaccessiblememonly nofree nosync nounwind willreturn
declare void @llvm.experimental.noalias.scope.decl(metadata) #6

attributes #0 = { nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" }
attributes #1 = { nofree nosync nounwind readnone speculatable willreturn }
attributes #2 = { nounwind nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" }
attributes #3 = { argmemonly nofree nosync nounwind willreturn }
attributes #4 = { noreturn nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" }
attributes #5 = { cold noreturn nounwind nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" }
attributes #6 = { inaccessiblememonly nofree nosync nounwind willreturn }
attributes #7 = { nounwind }

!llvm.module.flags = !{!0, !1}

!0 = !{i32 7, !"PIC Level", i32 2}
!1 = !{i32 2, !"RtLibUseGOT", i32 1}
!2 = !{!3}
!3 = distinct !{!3, !4, !"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$6to_vec17h82a8a1f74f9ea9ddE: argument 0"}
!4 = distinct !{!4, !"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$6to_vec17h82a8a1f74f9ea9ddE"}
!5 = !{!6}
!6 = distinct !{!6, !7, !"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$9to_vec_in17hf0efe825ef187949E: argument 0"}
!7 = distinct !{!7, !"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$9to_vec_in17hf0efe825ef187949E"}
!8 = !{!3, !9}
!9 = distinct !{!9, !4, !"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$6to_vec17h82a8a1f74f9ea9ddE: %self.0"}
!10 = !{!11}
!11 = distinct !{!11, !12, !"_ZN5alloc5slice4hack6to_vec17h4e5b681653d8de3eE: argument 0"}
!12 = distinct !{!12, !"_ZN5alloc5slice4hack6to_vec17h4e5b681653d8de3eE"}
!13 = !{!6, !14, !3, !9}
!14 = distinct !{!14, !7, !"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$9to_vec_in17hf0efe825ef187949E: %self.0"}
!15 = !{!16}
!16 = distinct !{!16, !17, !"_ZN52_$LT$T$u20$as$u20$alloc..slice..hack..ConvertVec$GT$6to_vec17h153ad63a55993150E: %v"}
!17 = distinct !{!17, !"_ZN52_$LT$T$u20$as$u20$alloc..slice..hack..ConvertVec$GT$6to_vec17h153ad63a55993150E"}
!18 = !{!11, !19, !6, !14, !3, !9}
!19 = distinct !{!19, !12, !"_ZN5alloc5slice4hack6to_vec17h4e5b681653d8de3eE: %s.0"}
!20 = !{!21}
!21 = distinct !{!21, !22, !"_ZN5alloc3vec16Vec$LT$T$C$A$GT$16with_capacity_in17hce1a244487fd289dE: argument 0"}
!22 = distinct !{!22, !"_ZN5alloc3vec16Vec$LT$T$C$A$GT$16with_capacity_in17hce1a244487fd289dE"}
!23 = !{!16, !24, !11, !19, !6, !14, !3, !9}
!24 = distinct !{!24, !17, !"_ZN52_$LT$T$u20$as$u20$alloc..slice..hack..ConvertVec$GT$6to_vec17h153ad63a55993150E: %s.0"}
!25 = !{!21, !16, !24, !11, !19, !6, !14, !3, !9}
!26 = !{!21, !16, !11, !6, !3}
!27 = !{!24, !19, !14, !9}
!28 = !{!16, !11, !6, !3}
!29 = !{!30, !16, !11, !6, !3}
!30 = distinct !{!30, !31, !"_ZN5alloc3vec16Vec$LT$T$C$A$GT$7set_len17hcd51c7b209261a71E: %self"}
!31 = distinct !{!31, !"_ZN5alloc3vec16Vec$LT$T$C$A$GT$7set_len17hcd51c7b209261a71E"}

New IR

; ModuleID = 'playground.5mgwo5pt-cgu.0'
source_filename = "playground.5mgwo5pt-cgu.0"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%"std::vec::Vec<i32>" = type { [0 x i64], { i32*, i64 }, [0 x i64], i64, [0 x i64] }
%"unwind::libunwind::_Unwind_Exception" = type { [0 x i64], i64, [0 x i64], void (i32, %"unwind::libunwind::_Unwind_Exception"*)*, [0 x i64], [6 x i64], [0 x i64] }
%"unwind::libunwind::_Unwind_Context" = type { [0 x i8] }

; playground::copy_slice_new
; Function Attrs: nonlazybind uwtable
define void @_ZN10playground14copy_slice_new17hbe14d17b96488007E(%"std::vec::Vec<i32>"* noalias nocapture sret(%"std::vec::Vec<i32>") dereferenceable(24) %0, [0 x i32]* noalias nocapture nonnull readonly align 4 %slice.0, i64 %slice.1) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
  tail call void @llvm.experimental.noalias.scope.decl(metadata !2)
  %1 = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %slice.1, i64 4) #7
  %2 = extractvalue { i64, i1 } %1, 1
  %3 = extractvalue { i64, i1 } %1, 0
  %.sroa.3.0.i.i.i.i.i.i.i = select i1 %2, i64 0, i64 4
  %.sroa.0.0.i.i.i.i.i.i.i = select i1 %2, i64 undef, i64 %3
  br i1 %2, label %bb6.i.i.i.i.i.i, label %bb13.i.i.i.i.i.i

bb6.i.i.i.i.i.i:                                  ; preds = %start
; call alloc::raw_vec::capacity_overflow
  tail call void @_ZN5alloc7raw_vec17capacity_overflow17h97153d40f6e2cd3bE(), !noalias !5
  unreachable

bb13.i.i.i.i.i.i:                                 ; preds = %start
  %4 = icmp eq i64 %.sroa.0.0.i.i.i.i.i.i.i, 0
  br i1 %4, label %bb3.i.i.i.i.i.i.i.i, label %bb7.i.i.i.i.i.i.i.i

bb3.i.i.i.i.i.i.i.i:                              ; preds = %bb13.i.i.i.i.i.i
  %_2.i.i.i.i.i.i.i.i.i = inttoptr i64 %.sroa.3.0.i.i.i.i.i.i.i to i8*
  br label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i"

bb7.i.i.i.i.i.i.i.i:                              ; preds = %bb13.i.i.i.i.i.i
  %5 = tail call i8* @__rust_alloc(i64 %.sroa.0.0.i.i.i.i.i.i.i, i64 %.sroa.3.0.i.i.i.i.i.i.i) #7, !noalias !5
  br label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i"

"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i": ; preds = %bb7.i.i.i.i.i.i.i.i, %bb3.i.i.i.i.i.i.i.i
  %.sroa.0.0.i.i.i.i.i.i.i.i = phi i8* [ %_2.i.i.i.i.i.i.i.i.i, %bb3.i.i.i.i.i.i.i.i ], [ %5, %bb7.i.i.i.i.i.i.i.i ]
  %6 = icmp eq i8* %.sroa.0.0.i.i.i.i.i.i.i.i, null
  br i1 %6, label %bb20.i.i.i.i.i.i, label %_ZN10playground4hack6to_vec17h302c903ca456a4daE.exit

bb20.i.i.i.i.i.i:                                 ; preds = %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i"
; call alloc::alloc::handle_alloc_error
  tail call void @_ZN5alloc5alloc18handle_alloc_error17hbbcab09c85c3ddd4E(i64 %.sroa.0.0.i.i.i.i.i.i.i, i64 %.sroa.3.0.i.i.i.i.i.i.i), !noalias !5
  unreachable

_ZN10playground4hack6to_vec17h302c903ca456a4daE.exit: ; preds = %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$8allocate17hdb6fd4f64ab5e4e5E.exit.i.i.i.i.i.i"
  %7 = shl nuw i64 %slice.1, 2
  %8 = bitcast [0 x i32]* %slice.0 to i8*
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 %.sroa.0.0.i.i.i.i.i.i.i.i, i8* nonnull align 4 %8, i64 %7, i1 false) #7, !noalias !2
  %9 = bitcast %"std::vec::Vec<i32>"* %0 to i8**
  store i8* %.sroa.0.0.i.i.i.i.i.i.i.i, i8** %9, align 8, !alias.scope !11, !noalias !16
  %10 = getelementptr inbounds %"std::vec::Vec<i32>", %"std::vec::Vec<i32>"* %0, i64 0, i32 1, i32 1
  store i64 %slice.1, i64* %10, align 8, !alias.scope !11, !noalias !16
  %11 = getelementptr inbounds %"std::vec::Vec<i32>", %"std::vec::Vec<i32>"* %0, i64 0, i32 3
  store i64 %slice.1, i64* %11, align 8, !alias.scope !11, !noalias !16
  ret void
}

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare { i64, i1 } @llvm.umul.with.overflow.i64(i64, i64) #1

; Function Attrs: nounwind nonlazybind uwtable
declare i32 @rust_eh_personality(i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*) unnamed_addr #2

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #3

; Function Attrs: nounwind nonlazybind uwtable
declare noalias i8* @__rust_alloc(i64, i64) unnamed_addr #2

; alloc::raw_vec::capacity_overflow
; Function Attrs: noreturn nonlazybind uwtable
declare void @_ZN5alloc7raw_vec17capacity_overflow17h97153d40f6e2cd3bE() unnamed_addr #4

; alloc::alloc::handle_alloc_error
; Function Attrs: cold noreturn nounwind nonlazybind uwtable
declare void @_ZN5alloc5alloc18handle_alloc_error17hbbcab09c85c3ddd4E(i64, i64) unnamed_addr #5

; Function Attrs: inaccessiblememonly nofree nosync nounwind willreturn
declare void @llvm.experimental.noalias.scope.decl(metadata) #6

attributes #0 = { nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" }
attributes #1 = { nofree nosync nounwind readnone speculatable willreturn }
attributes #2 = { nounwind nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" }
attributes #3 = { argmemonly nofree nosync nounwind willreturn }
attributes #4 = { noreturn nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" }
attributes #5 = { cold noreturn nounwind nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" }
attributes #6 = { inaccessiblememonly nofree nosync nounwind willreturn }
attributes #7 = { nounwind }

!llvm.module.flags = !{!0, !1}

!0 = !{i32 7, !"PIC Level", i32 2}
!1 = !{i32 2, !"RtLibUseGOT", i32 1}
!2 = !{!3}
!3 = distinct !{!3, !4, !"_ZN10playground4hack6to_vec17h302c903ca456a4daE: argument 0"}
!4 = distinct !{!4, !"_ZN10playground4hack6to_vec17h302c903ca456a4daE"}
!5 = !{!6, !8, !3, !10}
!6 = distinct !{!6, !7, !"_ZN52_$LT$T$u20$as$u20$playground..hack..ConvertBoxed$GT$14to_boxed_slice17hf040d5891f80ff7fE: %s.0"}
!7 = distinct !{!7, !"_ZN52_$LT$T$u20$as$u20$playground..hack..ConvertBoxed$GT$14to_boxed_slice17hf040d5891f80ff7fE"}
!8 = distinct !{!8, !9, !"_ZN10playground4hack14to_boxed_slice17hee7857f846790a13E: %s.0"}
!9 = distinct !{!9, !"_ZN10playground4hack14to_boxed_slice17hee7857f846790a13E"}
!10 = distinct !{!10, !4, !"_ZN10playground4hack6to_vec17h302c903ca456a4daE: %s.0"}
!11 = !{!12, !14, !3}
!12 = distinct !{!12, !13, !"_ZN5alloc3vec16Vec$LT$T$C$A$GT$17from_raw_parts_in17h851da6b596063aefE: argument 0"}
!13 = distinct !{!13, !"_ZN5alloc3vec16Vec$LT$T$C$A$GT$17from_raw_parts_in17h851da6b596063aefE"}
!14 = distinct !{!14, !15, !"_ZN10playground4hack8into_vec17hf50de4bbad45c759E: argument 0"}
!15 = distinct !{!15, !"_ZN10playground4hack8into_vec17hf50de4bbad45c759E"}
!16 = !{!17, !10}
!17 = distinct !{!17, !15, !"_ZN10playground4hack8into_vec17hf50de4bbad45c759E: argument 1"}

the8472 · 2021-04-09T18:14:01Z

Well, then I am out of guesses. If you want to dig further you can download (there's a tool for that) the try builds that were used for the perf run and then get additional profiling outputs for individual benchmarks that regressed to figure out what's causing it.

JohnCSimon · 2021-04-26T04:43:14Z

ping from triage:
@calebsander what would you like to do with this PR?

calebsander · 2021-04-26T08:18:58Z

I don't think I have enough experience with the internals of the compilation process to debug why compilation time has increased for some of the perf test cases. If someone with more expertise wants to take a look at it, they are welcome to.
I think it would make the most sense to merge these changes and let others contribute optimizations later. But if the team feels that the performance regressions are unacceptable, then it probably makes sense to just close this PR.

bstrie · 2021-05-12T23:28:43Z

But if the team feels that the performance regressions are unacceptable, then it probably makes sense to just close this PR.

Since there's a tracking issue to keep this topic from being totally dropped, and in lieu of any immediate solutions to the perf problems, I'd recommend closing for now unless a libs team member disagrees.

bstrie · 2021-06-02T23:01:13Z

I'll close for now, this is tracked by #82725 .

rust-highfive assigned sfackler Dec 28, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 28, 2020

calebsander force-pushed the feature/slice-to-box branch 2 times, most recently from 76e0473 to 509280c Compare December 29, 2020 18:04

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 19, 2021

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 7, 2021

rust-highfive assigned m-ou-se and unassigned sfackler Feb 9, 2021

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 28, 2021

m-ou-se reviewed Mar 3, 2021

View reviewed changes

library/alloc/src/slice.rs Outdated Show resolved Hide resolved

library/alloc/src/boxed.rs Outdated Show resolved Hide resolved

calebsander mentioned this pull request Mar 3, 2021

Tracking Issue for slice_to_boxed #82725

Closed

4 tasks

Add function to clone a slice into a boxed slice

f90aeeb

calebsander force-pushed the feature/slice-to-box branch from 509280c to f50a592 Compare March 3, 2021 16:58

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 3, 2021

calebsander mentioned this pull request Mar 3, 2021

Avoid unnecessary Vec construction in BufReader #82728

Merged

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 8, 2021

m-ou-se added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 21, 2021

crlf0710 added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 9, 2021

the8472 reviewed Apr 9, 2021

View reviewed changes

bstrie closed this Jun 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `to_boxed_slice()` to clone slice into boxed slice #80460

Add `to_boxed_slice()` to clone slice into boxed slice #80460

calebsander commented Dec 28, 2020 •

edited

Loading

rust-highfive commented Dec 28, 2020

the8472 commented Dec 29, 2020

calebsander commented Dec 29, 2020

the8472 commented Dec 29, 2020

calebsander commented Dec 29, 2020

the8472 commented Dec 30, 2020

calebsander commented Jan 3, 2021

Dylan-DPC-zz commented Feb 9, 2021

m-ou-se left a comment

calebsander commented Mar 3, 2021

m-ou-se commented Mar 3, 2021

rust-timer commented Mar 3, 2021

bors commented Mar 3, 2021

calebsander commented Mar 8, 2021

m-ou-se commented Mar 8, 2021

rust-timer commented Mar 8, 2021

bors commented Mar 8, 2021

calebsander commented Mar 8, 2021

bors commented Mar 8, 2021

rust-timer commented Mar 8, 2021

rust-timer commented Mar 8, 2021

m-ou-se commented Mar 21, 2021

calebsander commented Mar 24, 2021 •

edited

Loading

crlf0710 commented Apr 9, 2021

calebsander commented Apr 9, 2021

the8472 commented Apr 9, 2021

the8472 Apr 9, 2021

calebsander Apr 9, 2021

calebsander commented Apr 9, 2021 •

edited

Loading

the8472 commented Apr 9, 2021

JohnCSimon commented Apr 26, 2021

calebsander commented Apr 26, 2021

bstrie commented May 12, 2021

bstrie commented Jun 2, 2021

Add to_boxed_slice() to clone slice into boxed slice #80460

Add to_boxed_slice() to clone slice into boxed slice #80460

Conversation

calebsander commented Dec 28, 2020 • edited Loading

rust-highfive commented Dec 28, 2020

the8472 commented Dec 29, 2020

calebsander commented Dec 29, 2020

the8472 commented Dec 29, 2020

calebsander commented Dec 29, 2020

the8472 commented Dec 30, 2020

calebsander commented Jan 3, 2021

Dylan-DPC-zz commented Feb 9, 2021

m-ou-se left a comment

Choose a reason for hiding this comment

calebsander commented Mar 3, 2021

m-ou-se commented Mar 3, 2021

rust-timer commented Mar 3, 2021

bors commented Mar 3, 2021

calebsander commented Mar 8, 2021

m-ou-se commented Mar 8, 2021

rust-timer commented Mar 8, 2021

bors commented Mar 8, 2021

calebsander commented Mar 8, 2021

bors commented Mar 8, 2021

rust-timer commented Mar 8, 2021

rust-timer commented Mar 8, 2021

m-ou-se commented Mar 21, 2021

calebsander commented Mar 24, 2021 • edited Loading

crlf0710 commented Apr 9, 2021

calebsander commented Apr 9, 2021

the8472 commented Apr 9, 2021

the8472 Apr 9, 2021

Choose a reason for hiding this comment

calebsander Apr 9, 2021

Choose a reason for hiding this comment

calebsander commented Apr 9, 2021 • edited Loading

the8472 commented Apr 9, 2021

JohnCSimon commented Apr 26, 2021

calebsander commented Apr 26, 2021

bstrie commented May 12, 2021

bstrie commented Jun 2, 2021

Add `to_boxed_slice()` to clone slice into boxed slice #80460

Add `to_boxed_slice()` to clone slice into boxed slice #80460

calebsander commented Dec 28, 2020 •

edited

Loading

calebsander commented Mar 24, 2021 •

edited

Loading

calebsander commented Apr 9, 2021 •

edited

Loading