Pessimization in SROA #101899

Open
ChayimFriedman2 opened this issue Aug 4, 2024 · 0 comments

Given the following LLVM IR (minimized from real Rust code):

define void @after_all_unwrap(ptr dead_on_unwind noalias nocapture noundef writable writeonly sret([16 x i8]) align 8 dereferenceable(16) %_0, ptr noalias noundef nonnull align 4 %data.0, i64 noundef %data.1, ptr noalias nocapture noundef readonly align 8 dereferenceable(16) %indices) {
start:
  %_8 = alloca [16 x i8], align 8
  %_3 = alloca [16 x i8], align 8
  %0 = icmp eq i8 2, 2
  br i1 %0, label %bb1, label %bb2

bb1:
  %1 = load i64, ptr %indices, align 8
  %2 = getelementptr inbounds i32, ptr %data.0, i64 %1
  %3 = getelementptr inbounds i8, ptr %indices, i64 8
  %4 = load i64, ptr %3, align 8
  %5 = getelementptr inbounds i32, ptr %data.0, i64 %4
  store ptr %2, ptr %_8, align 8
  %slot.sroa.4.0._0.sroa_idx.i = getelementptr inbounds i8, ptr %_8, i64 8
  store ptr %5, ptr %slot.sroa.4.0._0.sroa_idx.i, align 8
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) %_3, ptr noundef nonnull align 8 dereferenceable(16) %_8, i64 16, i1 false)
  br label %bb3

bb2:
  %6 = getelementptr inbounds i8, ptr %_3, i64 8
  store i8 0, ptr %6, align 8
  store ptr null, ptr %_3, align 8
  br label %bb3

bb3:
  %8 = load ptr, ptr %_3, align 8
  %9 = icmp eq ptr %8, null
  br i1 %9, label %unwrap, label %exit

unwrap:
  call void @llvm.trap()
  unreachable

exit: ; preds = %bb3
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) %_0, ptr noundef nonnull align 8 dereferenceable(16) %_3, i64 16, i1 false)
  ret void
}

LLVM transforms it into:

define void @after_all_unwrap(ptr dead_on_unwind noalias nocapture noundef writable writeonly sret([16 x i8]) align 8 dereferenceable(16) %_0, ptr noalias noundef nonnull align 4 %data.0, i64 noundef %data.1, ptr noalias nocapture noundef readonly align 8 dereferenceable(16) %indices) local_unnamed_addr #0 {
  %0 = load i64, ptr %indices, align 8
  %1 = getelementptr inbounds i8, ptr %indices, i64 8
  %2 = load i64, ptr %1, align 8
  %3 = getelementptr inbounds i32, ptr %data.0, i64 %0
  %4 = getelementptr inbounds i32, ptr %data.0, i64 %2
  %5 = ptrtoint ptr %4 to i64
  %_8.sroa.2.9.extract.shift = lshr i64 %5, 8
  %_8.sroa.2.9.extract.trunc = trunc nuw i64 %_8.sroa.2.9.extract.shift to i56
  %_8.sroa.2.8.extract.trunc = trunc i64 %5 to i8
  store ptr %3, ptr %_0, align 8
  %_3.sroa.4.0._0.sroa_idx = getelementptr inbounds i8, ptr %_0, i64 8
  store i8 %_8.sroa.2.8.extract.trunc, ptr %_3.sroa.4.0._0.sroa_idx, align 8
  %_3.sroa.5.0._0.sroa_idx = getelementptr inbounds i8, ptr %_0, i64 9
  store i56 %_8.sroa.2.9.extract.trunc, ptr %_3.sroa.5.0._0.sroa_idx, align 1
  ret void
}

attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) }

Instead of the clearly more efficient:

define void @after_all_unwrap(ptr dead_on_unwind noalias nocapture noundef writable writeonly sret([16 x i8]) align 8 dereferenceable(16) %_0, ptr noalias noundef nonnull align 4 %data.0, i64 noundef %data.1, ptr noalias nocapture noundef readonly align 8 dereferenceable(16) %indices) local_unnamed_addr #0 {
  %0 = getelementptr inbounds i8, ptr %indices, i64 8
  %1 = load i64, ptr %0, align 8
  %2 = getelementptr inbounds i32, ptr %data.0, i64 %1
  %3 = load i64, ptr %indices, align 8
  %4 = getelementptr inbounds i32, ptr %data.0, i64 %3
  store ptr %4, ptr %_0, align 8
  %_3.sroa.3.0._0.sroa_idx = getelementptr inbounds i8, ptr %_0, i64 8
  store ptr %2, ptr %_3.sroa.3.0._0.sroa_idx, align 8
  ret void
}

attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) }

It is SROA that inserts the truncs and shifts. When a later pass discovers that the condition is always true, it can no longer get rid of them.
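To make the shape of the split concrete, here is a minimal Rust sketch of what the trunc/shift sequence computes. It assumes a little-endian target (such as x86-64), and the function names `store_split`, `store_whole` and `split_matches_whole` are hypothetical, made up for this illustration. The first function mirrors the `i8` store at offset 8 plus the `i56` store at offset 9; the second mirrors the single pointer-sized store at offset 8 from the desired output.

```rust
/// Mirrors the split stores in the SROA output (little-endian assumed):
/// the low byte goes to offset 8, the upper 56 bits go to offset 9.
fn store_split(buf: &mut [u8; 16], value: u64) {
    let low = value as u8; // corresponds to `trunc i64 %5 to i8`
    let high = value >> 8; // corresponds to `lshr i64 %5, 8`, then trunc to i56
    buf[8] = low;
    // An i56 occupies 7 bytes; take the 7 low-order bytes of the shifted value.
    buf[9..16].copy_from_slice(&high.to_le_bytes()[..7]);
}

/// Mirrors the single 8-byte store at offset 8 in the desired output.
fn store_whole(buf: &mut [u8; 16], value: u64) {
    buf[8..16].copy_from_slice(&value.to_le_bytes());
}

/// Returns true when both store strategies leave identical bytes in memory.
fn split_matches_whole(value: u64) -> bool {
    let mut a = [0u8; 16];
    let mut b = [0u8; 16];
    store_split(&mut a, value);
    store_whole(&mut b, value);
    a == b
}
```

Calling `split_matches_whole` with any `u64` should return `true` under the little-endian assumption: the two forms write the same bytes, so the split only adds instructions.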

Note that the order of passes is not the important point: I deliberately arranged the example so that SROA runs before we notice that the condition is true. In the real case the condition does not always hold, but SROA still pessimizes the output.

In my real case SROA does not really matter, since the value is going to be written into the return pointer anyway (as here). But even if SROA cannot see that, it should not pessimize the code.

Godbolt: https://godbolt.org/z/rr15o7Keb
