Refactor contiguity inference #5677
Conversation
- Rename CanInferOutputs to HasContiguousOutputs, because that's how it's used
- Change the default value to true
- Add (far fewer) implementations returning false
- Add checks when updating TL from samples (used in SampleWorkspace)
- Add opportunistic coalescing mode to MakeContiguous
- Set all MakeContiguous nodes to opportunistic in the new executor

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
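For orientation, here is a minimal sketch of the renamed query with its new default. This is an illustration only; the actual declaration in dali/pipeline/operator/operator.h carries more surrounding context than shown here.

```cpp
// Minimal sketch, not the actual DALI header.
class OperatorBase {
 public:
  virtual ~OperatorBase() = default;

  /** @brief If true (default), the operator's output is stored as a contiguous buffer. */
  virtual bool HasContiguousOutputs() const {
    return true;  // new default; only a few operators override this to return false
  }
};
```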
3237d82 to 4a78929 (Compare)
CI MESSAGE: [19342799]: BUILD STARTED
CI MESSAGE: [19342809]: BUILD STARTED
CI MESSAGE: [19342799]: BUILD FAILED
CI MESSAGE: [19342809]: BUILD FAILED
- Rename unsafe_raw_data to contiguous_raw_data
- Fix as_array: use IsContiguousInMemory instead of IsContiguous when obtaining a contiguous buffer (as done in AsReshapedTensor)

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
CI MESSAGE: [19361820]: BUILD STARTED
@@ -816,20 +816,32 @@ class DLL_PUBLIC TensorList {
 * @brief Return an un-typed pointer to the underlying storage.
 *        The TensorList must be either empty or have a valid type and be contiguous.
 */
friend void *unsafe_raw_mutable_data(TensorList<Backend> &batch) {
  DALI_ENFORCE(batch.IsContiguous(), "Data pointer can be obtain only for contiguous batch.");
This was actually a bug, since we checked IsDenseTensor and then used unsafe_raw_mutable_data in the as_array Python function.
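A hedged sketch of the fix the commit message describes: check the actual memory layout (IsContiguousInMemory) rather than density before handing out a single raw pointer. GetArrayData is a hypothetical stand-in for the native part of as_array, and contiguous_raw_mutable_data follows the renaming in this PR but is an assumed name.

```cpp
// Sketch only - simplified from the real as_array path.
template <typename Backend>
void *GetArrayData(TensorList<Backend> &batch) {
  // A dense batch (uniform sample shapes) need not be backed by one
  // allocation, so IsDenseTensor() alone is not a sufficient check.
  DALI_ENFORCE(batch.IsContiguousInMemory(),
               "as_array requires the batch to occupy a single contiguous allocation.");
  return contiguous_raw_mutable_data(batch);  // assumed renamed accessor
}
```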
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
CI MESSAGE: [19362121]: BUILD STARTED
// create new aliasing pointer to current data allocation, so we share the use count
// and the deleter correctly.
if (batch.IsContiguous()) {
  return {batch.contiguous_buffer_.get_data_ptr(), batch.raw_mutable_tensor(sample_idx)};
There's no point in doing that - the samples are already created this way.
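For readers unfamiliar with the pattern in the snippet above, here is a standalone illustration of the std::shared_ptr aliasing constructor: the per-sample pointer shares the use count and deleter of the whole allocation while pointing inside it.

```cpp
#include <memory>

int main() {
  std::shared_ptr<int[]> block(new int[4]{1, 2, 3, 4});
  // Aliasing constructor: shares block's control block, points at element 2.
  std::shared_ptr<int> sample(block, &block[2]);
  // Both handles keep the same allocation alive; use_count() is now 2.
  return (*sample == 3 && block.use_count() == 2) ? 0 : 1;
}
```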
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
CI MESSAGE: [19362564]: BUILD STARTED
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
CI MESSAGE: [19362867]: BUILD STARTED
Looks ok for the intended purpose, but I think we should consider splitting the allocation request and shape/type inference as well.
Also: does an empty output_desc mean that we don't return anything, or that we just didn't fill it? (I know, this is a bit nitpicky.)
I think we should also add some validation in Run/Executor.
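A purely hypothetical sketch of the split suggested above, separating shape/type inference from the allocation request. None of these names exist in DALI; they only illustrate the idea.

```cpp
#include <vector>

struct OutputDesc;   // shape + type of one output, as in the PR
class Workspace;

// Hypothetical redesign, not DALI code.
class OperatorBase {
 public:
  virtual ~OperatorBase() = default;

  // Fills output_desc whenever shapes/types can be inferred, regardless of
  // who performs the allocation; returns false if inference is impossible.
  virtual bool InferOutputs(std::vector<OutputDesc> &output_desc,
                            const Workspace &ws) = 0;

  // Separately states whether the executor should allocate the outputs.
  virtual bool RequestsOutputAllocation() const { return true; }
};
```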
dali/pipeline/operator/operator.h
Outdated
@@ -112,11 +115,10 @@ class DLL_PUBLIC OperatorBase {
virtual void RunImpl(Workspace &ws) = 0;

/**
 * @brief If Operator can infer the output shapes it means that its output would use a single
 *        underlying allocation, especially for CPU TensorList will use contiguous mode.
 * @brief If true (default), the operator's output will be stored
Missing part of a comment + suggestion:

* @brief If true (default), the operator's output will be stored
* @brief If true (default), the operator's output will be stored as a contiguous buffer.

Note: this should happen regardless of whether the operator or the executor allocates the output.
Also, currently false doesn't indicate non-contiguity of the operator output; we could mention that as well.
bool HasContiguousOutputs() const override {
  return false;
This operator is pass-through - this means the op either ensures contiguous outputs or we don't know. Would we find it useful to differentiate this as:
- always contiguous
- based on the input
- never contiguous?
I see that the old executor will still follow through the pass-through.
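A hypothetical tri-state for the distinction proposed above; it is not part of this PR and only illustrates how pass-through operators could propagate contiguity from their inputs instead of collapsing everything into a boolean.

```cpp
// Hypothetical, not DALI code.
enum class OutputContiguity {
  Always,        // operator guarantees a single allocation
  FollowsInput,  // pass-through: contiguous iff the input is
  Never          // outputs are known to be non-contiguous
};
```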
 * @param output_desc describe the shape and type of the outputs (for the whole batch)
 * @param ws
 * @return true iff the operator specified the output shape and type
 * @return Whether the caller should provide buffers for the outputs.
 */
virtual bool SetupImpl(std::vector<OutputDesc> &output_desc, const Workspace &ws) = 0;
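An illustrative override of the contract documented above. MirrorShapeOp is a made-up operator, and the Workspace/OutputDesc accessors are simplified assumptions rather than verified DALI API.

```cpp
// Sketch of the new return-value contract, not real DALI code.
class MirrorShapeOp : public OperatorBase {
 public:
  bool SetupImpl(std::vector<OutputDesc> &output_desc, const Workspace &ws) override {
    auto &input = ws.Input<CPUBackend>(0);  // assumed accessor
    output_desc.resize(1);
    output_desc[0].shape = input.shape();   // batch shape, one entry per sample
    output_desc[0].type = input.type();
    return true;  // ask the caller to allocate buffers matching output_desc
  }
};
```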
If we are doing this, we should probably also split the allocation request from the shape inference. Most operators that don't want to request output allocation can still infer the output shapes based on the input shapes.
From time to time we talk about shape and type inference, and this is yet another thing, independent of allocation.
That's true, but I think this can be done incrementally.
dali/pipeline/data/tensor_list.cc
Outdated
    }
    base_ptr += shape_[i].num_elements() * size;
  }
  DALI_ENFORCE(is_really_contiguous, "The tensor list isn't really contiguous as claimed.");
Nitpick: we should be putting some kind of internal-error tag on such errors; something like the above would be ok. IMO it now sounds too casual.
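For instance, the message could carry an explicit tag; the wording below is only a suggestion.

```cpp
DALI_ENFORCE(is_really_contiguous,
             "Internal error: the tensor list is marked as contiguous, but its "
             "samples do not form a single allocation.");
```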
@@ -495,10 +495,6 @@ void Executor<WorkspacePolicy, QueuePolicy>::RunHelper(OpNode &op_node, Workspac
DALI_ENFORCE(
    static_cast<size_t>(ws.NumOutput()) == output_desc.size(),
    "Operator::Setup returned shape and type information for mismatched number of outputs");
DALI_ENFORCE(op.CanInferOutputs(),
Should we add a sanity check after the Run to verify that an operator which claims HasContiguousOutputs didn't break the promise, and raise an internal error if it did?
Well... that's something where a build with asserts would be nice. IsContiguousInMemory isn't exactly free, and we check it anyway when we need the memory to be contiguous. I'll add an assert.
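A sketch of the debug-only verification discussed in this thread. The helper and its single-backend signature are hypothetical; HasContiguousOutputs, NumOutput, and IsContiguousInMemory come from the PR itself.

```cpp
#include <cassert>

// Hypothetical helper, simplified to one backend; the actual assert added
// in the PR may look different.
template <typename Backend>
void CheckContiguityPromise(const OperatorBase &op, Workspace &ws) {
#ifndef NDEBUG
  if (op.HasContiguousOutputs()) {
    for (int i = 0; i < ws.NumOutput(); i++)
      assert(ws.Output<Backend>(i).IsContiguousInMemory());
  }
#endif
}
```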
CI MESSAGE: [19362867]: BUILD PASSED
- Improve documentation and error messages
- Add a debug check that operators that claim to return contiguous outputs actually do so

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
CI MESSAGE: [19398469]: BUILD STARTED |
CI MESSAGE: [19398469]: BUILD PASSED |
Category:
Refactoring (Redesign of existing code that doesn't affect functionality)
Description:
- Rename CanInferOutputs to HasContiguousOutputs, because that's how it's used
- Change the default value to true
- Add (far fewer) implementations returning false
- Add checks when updating TL from samples (used in SampleWorkspace)
- Add opportunistic coalescing mode to MakeContiguous
- Set all MakeContiguous nodes to opportunistic in the new executor
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
The change is mostly refactoring plus performance; functionality is not affected.
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A