[MLIR][Flang][OpenMP] Make omp.wsloop into a loop wrapper #88403
Conversation
This patch introduces an operation intended to hold the loop information associated with the `omp.distribute`, `omp.simdloop`, `omp.taskloop` and `omp.wsloop` operations. It is a stopgap solution to unblock work on transitioning these operations into loop wrappers, as discussed in [this RFC](https://discourse.llvm.org/t/rfc-representing-combined-composite-constructs-in-the-openmp-dialect/76986). Long term, this operation will likely be replaced by `omp.canonical_loop`, which is being designed to address missing support for loop transformations, among other features.
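For reference, a minimal sketch of what `omp.loop_nest` looks like in the IR, based on the updated tests in this patch (the multi-induction-variable form is an assumption by analogy with the single-variable syntax shown in the diff below; `%lb0`, `%ub0`, etc. stand for values defined elsewhere):

```mlir
// Sketch: omp.loop_nest holds a (possibly collapsed) rectangular loop nest.
// It carries no worksharing/SIMD semantics of its own; those come from the
// enclosing wrapper operation.
omp.loop_nest (%i, %j) : i32 = (%lb0, %lb1) to (%ub0, %ub1) inclusive step (%s0, %s1) {
  // ... loop body using induction variables %i and %j ...
  omp.yield
}
```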
This patch defines a common interface to be shared by all OpenMP loop wrapper operations. The main restrictions an operation must meet in order to be considered a wrapper are:
- It contains a single region.
- Its region contains a single block.
- Its block contains only another loop wrapper or `omp.loop_nest`, plus a terminator.

The new interface is attached to the `omp.parallel`, `omp.wsloop`, `omp.simdloop`, `omp.distribute` and `omp.taskloop` operations. These restrictions are not yet enforced for these operations, since doing so would break existing OpenMP loop-generating code; enforcement will be introduced progressively in subsequent patches.
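As a sketch of what these rules permit, a hypothetical composite construct could nest wrappers as follows (this patch neither enforces nor produces such nesting yet, and `omp.distribute` acting as a wrapper here is an assumption):

```mlir
// Hypothetical nesting allowed by the wrapper rules: each wrapper region
// holds a single block containing either another wrapper or an
// omp.loop_nest, followed by a terminator.
omp.distribute {
  omp.wsloop {
    omp.loop_nest (%iv) : i32 = (%lb) to (%ub) inclusive step (%step) {
      // ... loop body ...
      omp.yield
    }
    omp.terminator
  }
  omp.terminator
}
```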
This patch updates the definition of `omp.simdloop` to enforce the restrictions of a wrapper operation. The operation has been renamed to `omp.simd` to better reflect the naming used in the spec, and all uses of "simdloop" in function names have been updated accordingly. Some changes to Flang lowering and OpenMP-to-LLVM-IR translation are introduced to avoid compilation and test failures. The eventual long-term solution may differ.
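A rough before/after sketch of the rename, assuming the pre-existing `omp.simdloop` syntax mirrored that of `omp.wsloop` (clauses omitted; the exact `omp.simd` clause syntax is not shown in this excerpt):

```mlir
// Before: bounds, step and body were attached to the operation itself.
omp.simdloop for (%iv) : i32 = (%lb) to (%ub) step (%step) {
  // ... loop body ...
  omp.yield
}

// After: omp.simd is a wrapper around an omp.loop_nest.
omp.simd {
  omp.loop_nest (%iv) : i32 = (%lb) to (%ub) step (%step) {
    // ... loop body ...
    omp.yield
  }
  omp.terminator
}
```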
This patch updates the definition of `omp.wsloop` to enforce the restrictions of a wrapper operation. Given the widespread use of this operation, this patch introduces several changes:
- Update the MLIR definition of `omp.wsloop`, as well as its parser/printer, builder and verifier.
- Update the verifiers of `omp.ordered.region`, `omp.cancel` and `omp.cancellation_point` to correctly check for a parent `omp.wsloop`.
- Update MLIR-to-LLVM-IR translation of `omp.wsloop` to keep working after the change in representation. A follow-up patch should reduce the current code duplication between `omp.wsloop` and `omp.simd`, now that a common `omp.loop_nest` operation has been introduced.
- Update the `scf.parallel` lowering pass to OpenMP to produce the new expected representation.
- Update Flang lowering to implement the `omp.wsloop` representation changes, including changes to `lastprivate` and `reduction` handling, so as to avoid adding operations into a wrapper and to attach entry block arguments to the right operation.
- Fix unit tests broken by the representation change.
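In short, the representation change looks like this (adapted from the test updates in the diff below, with the body elided):

```mlir
// Before: loop bounds, step and body belonged to omp.wsloop directly.
omp.wsloop nowait
for (%iv) : i32 = (%lb) to (%ub) inclusive step (%step) {
  // ... loop body ...
  omp.yield
}

// After: omp.wsloop carries only worksharing clauses and wraps an
// omp.loop_nest holding the bounds, step and body.
omp.wsloop nowait {
  omp.loop_nest (%iv) : i32 = (%lb) to (%ub) inclusive step (%step) {
    // ... loop body ...
    omp.yield
  }
  omp.terminator
}
```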
@llvm/pr-subscribers-mlir-llvm @llvm/pr-subscribers-mlir-openmp

Author: Sergio Afonso (skatrak)

Changes: This patch updates the definition of `omp.wsloop` to enforce the restrictions of a wrapper operation (see the full description above).
Patch is 758.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88403.diff

110 Files Affected:
diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index e114ab9f4548ab..645c351ac6c08c 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -133,8 +133,14 @@ void DataSharingProcessor::insertBarrier() {
}
void DataSharingProcessor::insertLastPrivateCompare(mlir::Operation *op) {
+ mlir::OpBuilder::InsertionGuard guard(firOpBuilder);
+ mlir::omp::LoopNestOp loopOp;
+ if (auto wrapper = mlir::dyn_cast<mlir::omp::LoopWrapperInterface>(op))
+ loopOp = wrapper.isWrapper()
+ ? mlir::cast<mlir::omp::LoopNestOp>(wrapper.getWrappedLoop())
+ : nullptr;
+
bool cmpCreated = false;
- mlir::OpBuilder::InsertPoint localInsPt = firOpBuilder.saveInsertionPoint();
for (const omp::Clause &clause : clauses) {
if (clause.id != llvm::omp::OMPC_lastprivate)
continue;
@@ -213,18 +219,20 @@ void DataSharingProcessor::insertLastPrivateCompare(mlir::Operation *op) {
// Update the original variable just before exiting the worksharing
// loop. Conversion as follows:
//
- // omp.wsloop {
- // omp.wsloop { ...
- // ... store
- // store ===> %v = arith.addi %iv, %step
- // omp.yield %cmp = %step < 0 ? %v < %ub : %v > %ub
- // } fir.if %cmp {
- // fir.store %v to %loopIV
- // ^%lpv_update_blk:
- // }
- // omp.yield
- // }
- //
+ // omp.wsloop { omp.wsloop {
+ // omp.loop_nest { omp.loop_nest {
+ // ... ...
+ // store ===> store
+ // omp.yield %v = arith.addi %iv, %step
+ // } %cmp = %step < 0 ? %v < %ub : %v > %ub
+ // omp.terminator fir.if %cmp {
+ // } fir.store %v to %loopIV
+ // ^%lpv_update_blk:
+ // }
+ // omp.yield
+ // }
+ // omp.terminator
+ // }
// Only generate the compare once in presence of multiple LastPrivate
// clauses.
@@ -232,14 +240,13 @@ void DataSharingProcessor::insertLastPrivateCompare(mlir::Operation *op) {
continue;
cmpCreated = true;
- mlir::Location loc = op->getLoc();
- mlir::Operation *lastOper = op->getRegion(0).back().getTerminator();
+ mlir::Location loc = loopOp.getLoc();
+ mlir::Operation *lastOper = loopOp.getRegion().back().getTerminator();
firOpBuilder.setInsertionPoint(lastOper);
- mlir::Value iv = op->getRegion(0).front().getArguments()[0];
- mlir::Value ub =
- mlir::dyn_cast<mlir::omp::WsloopOp>(op).getUpperBound()[0];
- mlir::Value step = mlir::dyn_cast<mlir::omp::WsloopOp>(op).getStep()[0];
+ mlir::Value iv = loopOp.getIVs()[0];
+ mlir::Value ub = loopOp.getUpperBound()[0];
+ mlir::Value step = loopOp.getStep()[0];
// v = iv + step
// cmp = step < 0 ? v < ub : v > ub
@@ -258,7 +265,7 @@ void DataSharingProcessor::insertLastPrivateCompare(mlir::Operation *op) {
auto ifOp = firOpBuilder.create<fir::IfOp>(loc, cmpOp, /*else*/ false);
firOpBuilder.setInsertionPointToStart(&ifOp.getThenRegion().front());
assert(loopIV && "loopIV was not set");
- firOpBuilder.create<fir::StoreOp>(op->getLoc(), v, loopIV);
+ firOpBuilder.create<fir::StoreOp>(loopOp.getLoc(), v, loopIV);
lastPrivIP = firOpBuilder.saveInsertionPoint();
} else {
TODO(converter.getCurrentLocation(),
@@ -266,7 +273,6 @@ void DataSharingProcessor::insertLastPrivateCompare(mlir::Operation *op) {
"simd/worksharing-loop");
}
}
- firOpBuilder.restoreInsertionPoint(localInsPt);
}
void DataSharingProcessor::collectSymbols(
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index 1800fcb19dcd2e..b21351382b6bdf 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -1626,7 +1626,9 @@ genOmpFlush(Fortran::lower::AbstractConverter &converter,
static llvm::SmallVector<const Fortran::semantics::Symbol *>
genLoopVars(mlir::Operation *op, Fortran::lower::AbstractConverter &converter,
mlir::Location &loc,
- llvm::ArrayRef<const Fortran::semantics::Symbol *> args) {
+ llvm::ArrayRef<const Fortran::semantics::Symbol *> args,
+ llvm::ArrayRef<const Fortran::semantics::Symbol *> wrapperSyms = {},
+ llvm::ArrayRef<mlir::BlockArgument> wrapperArgs = {}) {
fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
auto ®ion = op->getRegion(0);
@@ -1637,6 +1639,14 @@ genLoopVars(mlir::Operation *op, Fortran::lower::AbstractConverter &converter,
llvm::SmallVector<mlir::Type> tiv(args.size(), loopVarType);
llvm::SmallVector<mlir::Location> locs(args.size(), loc);
firOpBuilder.createBlock(®ion, {}, tiv, locs);
+
+ // Bind the entry block arguments of parent wrappers to the corresponding
+ // symbols. Do it here so that any hlfir.declare operations created as a
+ // result are inserted inside of the omp.loop_nest rather than the wrapper
+ // operations.
+ for (auto [arg, prv] : llvm::zip_equal(wrapperSyms, wrapperArgs))
+ converter.bindSymbol(*arg, prv);
+
// The argument is not currently in memory, so make a temporary for the
// argument, and store it there, then bind that location to the argument.
mlir::Operation *storeOp = nullptr;
@@ -1650,58 +1660,6 @@ genLoopVars(mlir::Operation *op, Fortran::lower::AbstractConverter &converter,
return llvm::SmallVector<const Fortran::semantics::Symbol *>(args);
}
-static llvm::SmallVector<const Fortran::semantics::Symbol *>
-genLoopAndReductionVars(
- mlir::Operation *op, Fortran::lower::AbstractConverter &converter,
- mlir::Location &loc,
- llvm::ArrayRef<const Fortran::semantics::Symbol *> loopArgs,
- llvm::ArrayRef<const Fortran::semantics::Symbol *> reductionArgs,
- llvm::ArrayRef<mlir::Type> reductionTypes) {
- fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
-
- llvm::SmallVector<mlir::Type> blockArgTypes;
- llvm::SmallVector<mlir::Location> blockArgLocs;
- blockArgTypes.reserve(loopArgs.size() + reductionArgs.size());
- blockArgLocs.reserve(blockArgTypes.size());
- mlir::Block *entryBlock;
-
- if (loopArgs.size()) {
- std::size_t loopVarTypeSize = 0;
- for (const Fortran::semantics::Symbol *arg : loopArgs)
- loopVarTypeSize = std::max(loopVarTypeSize, arg->GetUltimate().size());
- mlir::Type loopVarType = getLoopVarType(converter, loopVarTypeSize);
- std::fill_n(std::back_inserter(blockArgTypes), loopArgs.size(),
- loopVarType);
- std::fill_n(std::back_inserter(blockArgLocs), loopArgs.size(), loc);
- }
- if (reductionArgs.size()) {
- llvm::copy(reductionTypes, std::back_inserter(blockArgTypes));
- std::fill_n(std::back_inserter(blockArgLocs), reductionArgs.size(), loc);
- }
- entryBlock = firOpBuilder.createBlock(&op->getRegion(0), {}, blockArgTypes,
- blockArgLocs);
- // The argument is not currently in memory, so make a temporary for the
- // argument, and store it there, then bind that location to the argument.
- if (loopArgs.size()) {
- mlir::Operation *storeOp = nullptr;
- for (auto [argIndex, argSymbol] : llvm::enumerate(loopArgs)) {
- mlir::Value indexVal =
- fir::getBase(op->getRegion(0).front().getArgument(argIndex));
- storeOp =
- createAndSetPrivatizedLoopVar(converter, loc, indexVal, argSymbol);
- }
- firOpBuilder.setInsertionPointAfter(storeOp);
- }
- // Bind the reduction arguments to their block arguments
- for (auto [arg, prv] : llvm::zip_equal(
- reductionArgs,
- llvm::drop_begin(entryBlock->getArguments(), loopArgs.size()))) {
- converter.bindSymbol(*arg, prv);
- }
-
- return llvm::SmallVector<const Fortran::semantics::Symbol *>(loopArgs);
-}
-
static void createSimd(Fortran::lower::AbstractConverter &converter,
Fortran::semantics::SemanticsContext &semaCtx,
Fortran::lower::pft::Evaluation &eval,
@@ -1797,28 +1755,26 @@ static void createWsloop(Fortran::lower::AbstractConverter &converter,
if (ReductionProcessor::doReductionByRef(reductionVars))
byrefOperand = firOpBuilder.getUnitAttr();
- auto wsLoopOp = firOpBuilder.create<mlir::omp::WsloopOp>(
- loc, lowerBound, upperBound, step, linearVars, linearStepVars,
- reductionVars,
+ auto wsloopOp = firOpBuilder.create<mlir::omp::WsloopOp>(
+ loc, linearVars, linearStepVars, reductionVars,
reductionDeclSymbols.empty()
? nullptr
: mlir::ArrayAttr::get(firOpBuilder.getContext(),
reductionDeclSymbols),
scheduleValClauseOperand, scheduleChunkClauseOperand,
- /*schedule_modifiers=*/nullptr,
- /*simd_modifier=*/nullptr, nowaitClauseOperand, byrefOperand,
- orderedClauseOperand, orderClauseOperand,
- /*inclusive=*/firOpBuilder.getUnitAttr());
+ /*schedule_modifiers=*/nullptr, /*simd_modifier=*/nullptr,
+ nowaitClauseOperand, byrefOperand, orderedClauseOperand,
+ orderClauseOperand);
// Handle attribute based clauses.
if (cp.processOrdered(orderedClauseOperand))
- wsLoopOp.setOrderedValAttr(orderedClauseOperand);
+ wsloopOp.setOrderedValAttr(orderedClauseOperand);
if (cp.processSchedule(scheduleValClauseOperand, scheduleModClauseOperand,
scheduleSimdClauseOperand)) {
- wsLoopOp.setScheduleValAttr(scheduleValClauseOperand);
- wsLoopOp.setScheduleModifierAttr(scheduleModClauseOperand);
- wsLoopOp.setSimdModifierAttr(scheduleSimdClauseOperand);
+ wsloopOp.setScheduleValAttr(scheduleValClauseOperand);
+ wsloopOp.setScheduleModifierAttr(scheduleModClauseOperand);
+ wsloopOp.setSimdModifierAttr(scheduleSimdClauseOperand);
}
// In FORTRAN `nowait` clause occur at the end of `omp do` directive.
// i.e
@@ -1828,23 +1784,36 @@ static void createWsloop(Fortran::lower::AbstractConverter &converter,
if (endClauseList) {
if (ClauseProcessor(converter, semaCtx, *endClauseList)
.processNowait(nowaitClauseOperand))
- wsLoopOp.setNowaitAttr(nowaitClauseOperand);
+ wsloopOp.setNowaitAttr(nowaitClauseOperand);
}
+ // Create omp.wsloop wrapper and populate entry block arguments with reduction
+ // variables.
+ llvm::SmallVector<mlir::Location> reductionLocs(reductionSymbols.size(), loc);
+ mlir::Block *wsloopEntryBlock = firOpBuilder.createBlock(
+ &wsloopOp.getRegion(), {}, reductionTypes, reductionLocs);
+ firOpBuilder.setInsertionPoint(
+ Fortran::lower::genOpenMPTerminator(firOpBuilder, wsloopOp, loc));
+
+ // Create nested omp.loop_nest and fill body with loop contents.
+ auto loopOp = firOpBuilder.create<mlir::omp::LoopNestOp>(
+ loc, lowerBound, upperBound, step,
+ /*inclusive=*/firOpBuilder.getUnitAttr());
+
auto *nestedEval = getCollapsedLoopEval(
eval, Fortran::lower::getCollapseValue(beginClauseList));
auto ivCallback = [&](mlir::Operation *op) {
- return genLoopAndReductionVars(op, converter, loc, iv, reductionSymbols,
- reductionTypes);
+ return genLoopVars(op, converter, loc, iv, reductionSymbols,
+ wsloopEntryBlock->getArguments());
};
createBodyOfOp<mlir::omp::WsloopOp>(
- *wsLoopOp, OpWithBodyGenInfo(converter, semaCtx, loc, *nestedEval)
- .setClauses(&beginClauseList)
- .setDataSharingProcessor(&dsp)
- .setReductions(&reductionSymbols, &reductionTypes)
- .setGenRegionEntryCb(ivCallback));
+ *loopOp, OpWithBodyGenInfo(converter, semaCtx, loc, *nestedEval)
+ .setClauses(&beginClauseList)
+ .setDataSharingProcessor(&dsp)
+ .setReductions(&reductionSymbols, &reductionTypes)
+ .setGenRegionEntryCb(ivCallback));
}
static void createSimdWsloop(
@@ -2430,8 +2399,8 @@ static void genOMP(Fortran::lower::AbstractConverter &converter,
mlir::Operation *Fortran::lower::genOpenMPTerminator(fir::FirOpBuilder &builder,
mlir::Operation *op,
mlir::Location loc) {
- if (mlir::isa<mlir::omp::WsloopOp, mlir::omp::DeclareReductionOp,
- mlir::omp::AtomicUpdateOp, mlir::omp::LoopNestOp>(op))
+ if (mlir::isa<mlir::omp::AtomicUpdateOp, mlir::omp::DeclareReductionOp,
+ mlir::omp::LoopNestOp>(op))
return builder.create<mlir::omp::YieldOp>(loc);
return builder.create<mlir::omp::TerminatorOp>(loc);
}
diff --git a/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir b/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir
index fa7979e8875afc..c7c609bbb35623 100644
--- a/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir
+++ b/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir
@@ -7,15 +7,17 @@ func.func @_QPsb1(%arg0: !fir.ref<i32> {fir.bindc_name = "n"}, %arg1: !fir.ref<!
omp.parallel {
%1 = fir.alloca i32 {adapt.valuebyref, pinned}
%2 = fir.load %arg0 : !fir.ref<i32>
- omp.wsloop nowait
- for (%arg2) : i32 = (%c1_i32) to (%2) inclusive step (%c1_i32) {
- fir.store %arg2 to %1 : !fir.ref<i32>
- %3 = fir.load %1 : !fir.ref<i32>
- %4 = fir.convert %3 : (i32) -> i64
- %5 = arith.subi %4, %c1_i64 : i64
- %6 = fir.coordinate_of %arg1, %5 : (!fir.ref<!fir.array<?xi32>>, i64) -> !fir.ref<i32>
- fir.store %3 to %6 : !fir.ref<i32>
- omp.yield
+ omp.wsloop nowait {
+ omp.loop_nest (%arg2) : i32 = (%c1_i32) to (%2) inclusive step (%c1_i32) {
+ fir.store %arg2 to %1 : !fir.ref<i32>
+ %3 = fir.load %1 : !fir.ref<i32>
+ %4 = fir.convert %3 : (i32) -> i64
+ %5 = arith.subi %4, %c1_i64 : i64
+ %6 = fir.coordinate_of %arg1, %5 : (!fir.ref<!fir.array<?xi32>>, i64) -> !fir.ref<i32>
+ fir.store %3 to %6 : !fir.ref<i32>
+ omp.yield
+ }
+ omp.terminator
}
omp.terminator
}
@@ -31,7 +33,7 @@ func.func @_QPsb1(%arg0: !fir.ref<i32> {fir.bindc_name = "n"}, %arg1: !fir.ref<!
// CHECK: %[[I_VAR:.*]] = llvm.alloca %[[ONE_3]] x i32 {pinned} : (i64) -> !llvm.ptr
// CHECK: %[[N:.*]] = llvm.load %[[N_REF]] : !llvm.ptr -> i32
// CHECK: omp.wsloop nowait
-// CHECK-SAME: for (%[[I:.*]]) : i32 = (%[[ONE_2]]) to (%[[N]]) inclusive step (%[[ONE_2]]) {
+// CHECK-NEXT: omp.loop_nest (%[[I:.*]]) : i32 = (%[[ONE_2]]) to (%[[N]]) inclusive step (%[[ONE_2]]) {
// CHECK: llvm.store %[[I]], %[[I_VAR]] : i32, !llvm.ptr
// CHECK: %[[I1:.*]] = llvm.load %[[I_VAR]] : !llvm.ptr -> i32
// CHECK: %[[I1_EXT:.*]] = llvm.sext %[[I1]] : i32 to i64
@@ -42,6 +44,8 @@ func.func @_QPsb1(%arg0: !fir.ref<i32> {fir.bindc_name = "n"}, %arg1: !fir.ref<!
// CHECK: }
// CHECK: omp.terminator
// CHECK: }
+// CHECK: omp.terminator
+// CHECK: }
// CHECK: llvm.return
// CHECK: }
@@ -79,13 +83,16 @@ func.func @_QPsb(%arr: !fir.box<!fir.array<?xi32>> {fir.bindc_name = "arr"}) {
omp.parallel {
%c1 = arith.constant 1 : i32
%c50 = arith.constant 50 : i32
- omp.wsloop for (%indx) : i32 = (%c1) to (%c50) inclusive step (%c1) {
- %1 = fir.convert %indx : (i32) -> i64
- %c1_i64 = arith.constant 1 : i64
- %2 = arith.subi %1, %c1_i64 : i64
- %3 = fir.coordinate_of %arr, %2 : (!fir.box<!fir.array<?xi32>>, i64) -> !fir.ref<i32>
- fir.store %indx to %3 : !fir.ref<i32>
- omp.yield
+ omp.wsloop {
+ omp.loop_nest (%indx) : i32 = (%c1) to (%c50) inclusive step (%c1) {
+ %1 = fir.convert %indx : (i32) -> i64
+ %c1_i64 = arith.constant 1 : i64
+ %2 = arith.subi %1, %c1_i64 : i64
+ %3 = fir.coordinate_of %arr, %2 : (!fir.box<!fir.array<?xi32>>, i64) -> !fir.ref<i32>
+ fir.store %indx to %3 : !fir.ref<i32>
+ omp.yield
+ }
+ omp.terminator
}
omp.terminator
}
@@ -98,9 +105,11 @@ func.func @_QPsb(%arr: !fir.box<!fir.array<?xi32>> {fir.bindc_name = "arr"}) {
// CHECK: omp.parallel {
// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : i32) : i32
// CHECK: %[[C50:.*]] = llvm.mlir.constant(50 : i32) : i32
-// CHECK: omp.wsloop for (%[[INDX:.*]]) : i32 = (%[[C1]]) to (%[[C50]]) inclusive step (%[[C1]]) {
-// CHECK: llvm.store %[[INDX]], %{{.*}} : i32, !llvm.ptr
-// CHECK: omp.yield
+// CHECK: omp.wsloop {
+// CHECK-NEXT: omp.loop_nest (%[[INDX:.*]]) : i32 = (%[[C1]]) to (%[[C50]]) inclusive step (%[[C1]]) {
+// CHECK: llvm.store %[[INDX]], %{{.*}} : i32, !llvm.ptr
+// CHECK: omp.yield
+// CHECK: omp.terminator
// CHECK: omp.terminator
// CHECK: llvm.return
@@ -708,18 +717,20 @@ func.func @_QPsb() {
// CHECK-SAME: %[[ARRAY_REF:.*]]: !llvm.ptr
// CHECK: %[[RED_ACCUMULATOR:.*]] = llvm.alloca %2 x i32 {bindc_name = "x"} : (i64) -> !llvm.ptr
// CHECK: omp.parallel {
-// CHECK: omp.wsloop reduction(@[[EQV_REDUCTION]] %[[RED_ACCUMULATOR]] -> %[[PRV:.+]] : !llvm.ptr) for
-// CHECK: %[[ARRAY_ELEM_REF:.*]] = llvm.getelementptr %[[ARRAY_REF]][0, %{{.*}}] : (!llvm.ptr, i64) -> !llvm.ptr
-// CHECK: %[[ARRAY_ELEM:.*]] = llvm.load %[[ARRAY_ELEM_REF]] : !llvm.ptr -> i32
-// CHECK: %[[LPRV:.+]] = llvm.load %[[PRV]] : !llvm.ptr -> i32
-// CHECK: %[[ZERO_1:.*]] = llvm.mlir.constant(0 : i64) : i32
-// CHECK: %[[ARGVAL_1:.*]] = llvm.icmp "ne" %[[LPRV]], %[[ZERO_1]] : i32
-// CHECK: %[[ZERO_2:.*]] = llvm.mlir.constant(0 : i64) : i32
-// CHECK: %[[ARGVAL_2:.*]] = llvm.icmp "ne" %[[ARRAY_ELEM]], %[[ZERO_2]] : i32
-// CHECK: %[[RES:.*]] = llvm.icmp "eq" %[[ARGVAL_2]], %[[ARGVAL_1]] : i1
-// CHECK: %[[RES_EXT:.*]] = llvm.zext %[[RES]] : i1 to i32
-// CHECK: llvm.store %[[RES_EXT]], %[[PRV]] : i32, !llvm.ptr
-// CHECK: omp.yield
+// CHECK: omp.wsloop reduction(@[[EQV_REDUCTION]] %[[RED_ACCUMULATOR]] -> %[[PRV:.+]] : !llvm.ptr) {
+// CHECK-NEXT: omp.loop_nest
+// CHECK: %[[ARRAY_ELEM_REF:.*]] = llvm.getelementptr %[[ARRAY_REF]][0, %{{.*}}] : (!llvm.ptr, i64) -> !llvm.ptr
+// CHECK: %[[ARRAY_ELEM:.*]] = llvm.load %[[ARRAY_ELEM_REF]] : !llvm.ptr -> i32
+// CHECK: %[[LPRV:.+]] = llvm.load %[[PRV]] : !llvm.ptr -> i32
+// CHECK: %[[ZERO_1:.*]] = llvm.mlir.constant(0 : i64) : i32
+// CHECK: %[[ARGVAL_1:.*]] = llvm.icmp "ne" %[[LPRV]], %[[ZERO_1]] : i32
+// CHECK: %[[ZERO_2:.*]] = llvm.mlir.constant(0 : i64) : i32
+// CHECK: %[[ARGVAL_2:.*]] = llvm.icmp "ne" %[[ARRAY_ELEM]], %[[ZERO_2]] : i32
+// CHECK: %[[RES:.*]] = llvm.icmp "eq" %[[ARGVAL_2]], %[[ARGVAL_1]] : i1
+// CHECK: %[[RES_EXT:.*]] = llvm.zext %[[RES]] : i1 to i32
+// CHECK: llvm.store %[[RES_EXT]], %[[PRV]] : i32, !llvm.ptr
+// CHECK: omp.yield
+// CHECK: omp.terminator
// CHECK: omp.terminator
// CHECK: llvm.return
@@ -747,21 +758,24 @@ func.func @_QPsimple_reduction(%arg0: !fir.ref<!fir.array<100x!fir.logical<4>>>
%c1_i32 = arith.constant 1 : i32
%c100_i32 = arith.constant 100 : i32
%c1_i32_0 = arith.constant 1 : i32
- omp.wsloop reduction(@eqv_reduction %1 -> %prv : !fir.ref<!fir.logical<4>>) for (%arg1) : i32 = (%c1_i32) to (%c100_i32) inclusive step (%c1_i32_0) {
- fir.store %arg1 to %3 : !fir.ref<i32>
- %4 = fir.load %3 : !fir.ref<i32>
- %5 = fir.convert %4 : (i32) -> i64
- %c1_i64 = arith.constant 1 : i64
- %6 = arith.subi %5, %c1_i64 : i64
- %7 = fir.coordinate_of %arg0, %6 : (!fir.ref<!fir.array<100x!fir.logical<4>>>, i64) -> !fir.ref<!fir.logical<4>>
- %8 = fir.load %7 : !fir.ref<!fir.logical<4>>
- %lprv = fir.load %prv : !fir.ref<!fir.logical<4>>
- %lprv1 = fir.convert %lprv : (!fir.logical<...
[truncated]
I'm planning to split this into a PR stack after landing #87365, since it's too large to review as-is. However, only the last commit of the stack will compile and pass tests, so they would all have to land simultaneously. I'm open to suggestions on how best to achieve this.
Thanks a lot for the effort to split this into smaller patches. Much appreciated.