
Refactor contiguity inference #5677


Merged

merged 7 commits into NVIDIA:main on Oct 17, 2024
Conversation

mzient
Contributor

@mzient mzient commented Oct 14, 2024

Category:

Refactoring (Redesign of existing code that doesn't affect functionality)

Description:

  • Rename CanInferOutputs to HasContiguousOutputs, because that's how
    it's used
  • Change the default value to true
  • Add (far fewer) implementations returning false
  • Add checks when updating a TensorList from samples (used in SampleWorkspace)
  • Add an opportunistic coalescing mode to MakeContiguous
  • Set all MakeContiguous nodes to opportunistic in the new executor

Additional information:

Affected modules and functionalities:

  • OperatorBase
  • Most operators
  • TensorList
  • MakeContiguous
  • New executor - graph analyzer

Key points relevant for the review:

Tests:

The change is mostly refactoring plus performance work; functionality is not affected.

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

- Rename CanInferOutputs to HasContiguousOutputs, because that's how
  it's used
- Change the default value to true
- Add (much fewer) implementations returning false
- Add checks to updating TL from samples (used in SampleWorkspace)
- Add opportunistic coalescing mode to MakeContiguous
- Set all MakeContiguous nodes to opportunistic in new executor

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@mzient mzient force-pushed the refactor_contiguity_inference branch from 3237d82 to 4a78929 on October 14, 2024 20:22
@dali-automaton
Collaborator

CI MESSAGE: [19342799]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [19342809]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [19342799]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [19342809]: BUILD FAILED

- rename unsafe_raw_data to contiguous_raw_data
- fix as_array: use IsContiguousInMemory instead of IsContiguous when
  obtaining contiguous buffer (as done in AsReshapedTensor)

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@klecki klecki self-assigned this Oct 15, 2024
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

@dali-automaton
Collaborator

CI MESSAGE: [19361820]: BUILD STARTED

@@ -816,20 +816,32 @@ class DLL_PUBLIC TensorList {
* @brief Return an un-typed pointer to the underlying storage.
* The TensorList must be either empty or have a valid type and be contiguous.
*/
friend void *unsafe_raw_mutable_data(TensorList<Backend> &batch) {
DALI_ENFORCE(batch.IsContiguous(), "Data pointer can be obtain only for contiguous batch.");
Contributor Author

This was actually a bug: we checked IsDenseTensor and then used unsafe_raw_mutable_data in the as_array Python function.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Collaborator

CI MESSAGE: [19362121]: BUILD STARTED

// create new aliasing pointer to current data allocation, so we share the use count
// and the deleter correctly.
if (batch.IsContiguous()) {
return {batch.contiguous_buffer_.get_data_ptr(), batch.raw_mutable_tensor(sample_idx)};
Contributor Author

There's no point in doing that - the samples are already created this way.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Collaborator

CI MESSAGE: [19362564]: BUILD STARTED

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Collaborator

CI MESSAGE: [19362867]: BUILD STARTED

Contributor

@klecki klecki left a comment


Looks ok for the intended purpose, but I think we should also consider splitting the allocation request from shape/type inference.

Also: does the empty output_desc mean that we don't return anything or we just didn't fill it? (I know, this is a bit nitpicky).

I think we should also add some validation in Run/Executor.

@@ -112,11 +115,10 @@ class DLL_PUBLIC OperatorBase {
virtual void RunImpl(Workspace &ws) = 0;

/**
* @brief If Operator can infer the output shapes it means that its output would use a single
* underlying allocation, especially for CPU TensorList will use contiguous mode.
* @brief If true (default), the operator's output will be stored
Contributor

Missing part of a comment + suggestion.

Suggested change
* @brief If true (default), the operator's output will be stored
* @brief If true (default), the operator's output will be stored as a contiguous buffer.
Note: this should happen regardless of whether the operator or executor allocates the output.

Also, currently false doesn't guarantee non-contiguity of the operator output; we could mention that as well.

Comment on lines +41 to 42
bool HasContiguousOutputs() const override {
return false;
Contributor

This operator is pass-through: it either propagates contiguous outputs or we simply don't know. Would it be useful to differentiate this as:

  • always contiguous
  • based on the input
  • never contiguous?

I see that the old executor will still follow contiguity through the pass-through.

* @param output_desc describe the shape and type of the outputs (for the whole batch)
* @param ws
* @return true iff the operator specified the output shape and type
* @return Whether the caller should provide buffers for the outputs.
*/
virtual bool SetupImpl(std::vector<OutputDesc> &output_desc, const Workspace &ws) = 0;
Contributor

If we are doing this, we should probably also split the allocation request from shape inference. Most operators that don't want to request output allocation can still infer output shapes from the input shapes.
From time to time we talk about shape and type inference, and this is yet another thing independent of allocation.

Contributor Author

That's true, but I think this can be done incrementally.

}
base_ptr += shape_[i].num_elements() * size;
}
DALI_ENFORCE(is_really_contiguous, "The tensor list isn't really contiguous as claimed.");
Contributor

Nitpick: we should put some kind of internal-error tag on such errors; something like the one above would be ok. IMO it sounds too casual now.

@@ -495,10 +495,6 @@ void Executor<WorkspacePolicy, QueuePolicy>::RunHelper(OpNode &op_node, Workspac
DALI_ENFORCE(
static_cast<size_t>(ws.NumOutput()) == output_desc.size(),
"Operator::Setup returned shape and type information for mismatched number of outputs");
DALI_ENFORCE(op.CanInferOutputs(),
Contributor

Should we add a sanity check after Run to verify that an operator claiming HasContiguousOutputs didn't break that promise and cause an internal error?

Contributor Author

@mzient mzient Oct 16, 2024

Well... that's something where a build with asserts enabled would be nice. IsContiguousInMemory isn't exactly free, and we check it anyway whenever we need the memory to be contiguous.
I'll add an assert.

@dali-automaton
Collaborator

CI MESSAGE: [19362867]: BUILD PASSED

- improved documentation and error messages
- add a debug check that operators that claim to return contiguous
  outputs actually do so

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Collaborator

CI MESSAGE: [19398469]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [19398469]: BUILD PASSED

@mzient mzient merged commit 1b51e15 into NVIDIA:main Oct 17, 2024
6 checks passed