
Refactor contiguity inference #5677


Merged

merged 7 commits into NVIDIA:main on Oct 17, 2024
Conversation

mzient
Contributor

@mzient mzient commented Oct 14, 2024

Category:

Refactoring (Redesign of existing code that doesn't affect functionality)

Description:

  • Rename CanInferOutputs to HasContiguousOutputs, because that's how
    it's used
  • Change the default value to true
  • Add (far fewer) implementations returning false
  • Add checks when updating a TensorList from samples (used in SampleWorkspace)
  • Add an opportunistic coalescing mode to MakeContiguous
  • Set all MakeContiguous nodes to opportunistic in the new executor

Additional information:

Affected modules and functionalities:

  • OperatorBase
  • Most operators
  • TensorList
  • MakeContiguous
  • New executor - graph analyzer

Key points relevant for the review:

Tests:

The change is mostly refactoring plus performance work; functionality is not affected.

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

- Rename CanInferOutputs to HasContiguousOutputs, because that's how
  it's used
- Change the default value to true
- Add (much fewer) implementations returning false
- Add checks to updating TL from samples (used in SampleWorkspace)
- Add opportunistic coalescing mode to MakeContiguous
- Set all MakeContiguous nodes to opportunistic in new executor

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@mzient mzient force-pushed the refactor_contiguity_inference branch from 3237d82 to 4a78929 on October 14, 2024 20:22
@dali-automaton
Collaborator

CI MESSAGE: [19342799]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [19342809]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [19342799]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [19342809]: BUILD FAILED

- rename unsafe_raw_data to contiguous_raw_data
- fix as_array: use IsContiguousInMemory instead of IsContiguous when
  obtaining contiguous buffer (as done in AsReshapedTensor)

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@klecki klecki self-assigned this Oct 15, 2024
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

@dali-automaton
Collaborator

CI MESSAGE: [19361820]: BUILD STARTED

@@ -816,20 +816,32 @@ class DLL_PUBLIC TensorList {
* @brief Return an un-typed pointer to the underlying storage.
* The TensorList must be either empty or have a valid type and be contiguous.
*/
friend void *unsafe_raw_mutable_data(TensorList<Backend> &batch) {
DALI_ENFORCE(batch.IsContiguous(), "Data pointer can be obtain only for contiguous batch.");
Contributor Author

This was actually a bug: we checked IsDenseTensor and then used unsafe_raw_mutable_data in the as_array Python function.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Collaborator

CI MESSAGE: [19362121]: BUILD STARTED

// create new aliasing pointer to current data allocation, so we share the use count
// and the deleter correctly.
if (batch.IsContiguous()) {
return {batch.contiguous_buffer_.get_data_ptr(), batch.raw_mutable_tensor(sample_idx)};
Contributor Author

There's no point in doing that - the samples are already created this way.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Collaborator

CI MESSAGE: [19362564]: BUILD STARTED

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Collaborator

CI MESSAGE: [19362867]: BUILD STARTED

Contributor

@klecki klecki left a comment


Looks ok for the intended purpose, but I think we should also consider splitting the allocation request from shape/type inference.

Also: does the empty output_desc mean that we don't return anything or we just didn't fill it? (I know, this is a bit nitpicky).

I think we should also add some validation in Run/Executor.

@@ -112,11 +115,10 @@ class DLL_PUBLIC OperatorBase {
virtual void RunImpl(Workspace &ws) = 0;

/**
* @brief If Operator can infer the output shapes it means that its output would use a single
* underlying allocation, especially for CPU TensorList will use contiguous mode.
* @brief If true (default), the operator's output will be stored
Contributor

Missing part of a comment + suggestion.

Suggested change
* @brief If true (default), the operator's output will be stored
* @brief If true (default), the operator's output will be stored as a contiguous buffer.
Note: this should happen regardless of whether the operator or executor allocates the output.

Also, currently false doesn't guarantee non-contiguity of the operator output; we could mention that as well.

Comment on lines +41 to 42
bool HasContiguousOutputs() const override {
return false;
Contributor

This operator is pass-through: it either propagates contiguous outputs or we simply don't know. Would it be useful to differentiate this as:

  • always contiguous
  • based on the input
  • never contiguous?

I see that the old executor will still follow contiguity through the pass-through.

* @param output_desc describe the shape and type of the outputs (for the whole batch)
* @param ws
* @return true iff the operator specified the output shape and type
* @return Whether the caller should provide buffers for the outputs.
*/
virtual bool SetupImpl(std::vector<OutputDesc> &output_desc, const Workspace &ws) = 0;
Contributor

If we are doing this, we should probably also split the allocation request from shape inference. Most operators that don't want to request output allocation can still infer output shapes from the input shapes.
From time to time we talk about shape and type inference, and this is yet another thing independent of allocation.

Contributor Author

That's true, but I think this can be done incrementally.

}
base_ptr += shape_[i].num_elements() * size;
}
DALI_ENFORCE(is_really_contiguous, "The tensor list isn't really contiguous as claimed.");
Contributor

Nitpick: we should put some kind of internal-error tag on such errors; something like the one above would be ok. IMO it sounds too casual now.

@@ -495,10 +495,6 @@ void Executor<WorkspacePolicy, QueuePolicy>::RunHelper(OpNode &op_node, Workspac
DALI_ENFORCE(
static_cast<size_t>(ws.NumOutput()) == output_desc.size(),
"Operator::Setup returned shape and type information for mismatched number of outputs");
DALI_ENFORCE(op.CanInferOutputs(),
Contributor

Should we add a sanity check after Run to verify that an operator claiming HasContiguousOutputs didn't break that promise and cause an internal error?

Contributor Author

@mzient mzient Oct 16, 2024

Well... that's something where a build with asserts enabled would be nice. IsContiguousInMemory isn't exactly free, and we check it anyway whenever we need the memory to be contiguous.
I'll add an assert.

@dali-automaton
Collaborator

CI MESSAGE: [19362867]: BUILD PASSED

- improved documentation and error messages
- add a debug check that operators that claim to return contiguous
  outputs actually do so

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Collaborator

CI MESSAGE: [19398469]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [19398469]: BUILD PASSED

@mzient mzient merged commit 1b51e15 into NVIDIA:main Oct 17, 2024
6 checks passed