Skip to content

New OpGraph #5485

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jun 3, 2024
Merged

New OpGraph #5485

merged 18 commits into from
Jun 3, 2024

Conversation

mzient
Copy link
Contributor

@mzient mzient commented May 27, 2024

Category:

New feature (non-breaking change which adds functionality)
Refactoring (Redesign of existing code that doesn't affect functionality)

Description:

This change moves the old OpGraph to the executor directory and creates a new, simple graph with much less semantic assumptions involved.

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-3971

@mzient mzient marked this pull request as ready for review May 27, 2024 16:26
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [15353368]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [15353368]: BUILD FAILED

* The string is `const` because it's view is used as a key.
*/
const std::string instance_name;
/** The specification of an operato */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/** The specification of an operato */
/** The specification of an operator */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

mzient and others added 10 commits May 28, 2024 10:11
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
mzient added 2 commits May 28, 2024 16:08
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Comment on lines +46 to +49
/** A visit marker for various graph processing algorithms. */
mutable bool visited = false;
/** A visit marker for cycle detection. */
mutable bool visit_pending = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how I feel about those, especially if there is no encapsulation (all in public interface).
I am not an expert in designing graph implementations, but shouldn't the processing metadata be separated from the graph representation/information? Especially if we may end up with some processing step that requires other metadata, we probably don't want to add it all here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know.... on the other hand, there's no way to implement it that's efficient and the elegance of the alternatives is also questionable. I know we're not writing a compiler and the size of the graph is going to be small to moderate (thousands of nodes, not millions), but even then looking up helper structures for extremely common type of metadata, such as visit markers, seems silly.
Another option:

union {
    void *meta_ptr;
    uint64_t meta_int;
    uint8_t meta_bytes[8];
};

If an algorithm needs more then 8 bytes, it can create a list of metadata items and use the meta_ptr to associate it with the nodes.

mzient added 2 commits May 29, 2024 14:19
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
explicit Visit(Node *n) : node_(n) {
if (node_->visit_pending)
throw std::logic_error("Cycle detected.");
node_->visit_pending = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we clear visit_pending before entering the visiting phase? I only see that we clear visited markers, as there is no encapsulation, I think we should clear all at the start.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

visit_pending is cleared by the Visit helper. Perhaps I could move it to the header to make it available to other graph algorithms in the future.

public:
explicit Visit(Node *n) : node_(n) {
if (node_->visit_pending)
throw std::logic_error("Cycle detected.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side question, are we considering loops? With AutoGraph it's doable to possibly have cycles with for or while loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's not supported and was expressly stated as a non-goal for the new executor. It's perhaps just my opinion, but I see diminishing returns in supporting graphs with backend-side flow control.

OpGraph g = std::move(b).GetGraph();

auto &nodes = g.OpNodes();
ASSERT_EQ(nodes.size(), 3_uz) << "The nodes were nod added properly - abandoning test";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ASSERT_EQ(nodes.size(), 3_uz) << "The nodes were nod added properly - abandoning test";
ASSERT_EQ(nodes.size(), 3_uz) << "The nodes were not added properly - abandoning test";

mzient added 2 commits May 31, 2024 17:09
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Comment on lines 188 to 189
graph_.op_nodes_.splice(graph_.op_nodes_.end(), out_ops);
graph_.data_nodes_.splice(graph_.data_nodes_.end(), out_data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note, you didn't traverse the nodes that are not reachable from the outputs so they may be listed here in non-topological order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Perhaps I'll just traverse everything when not pruning and then skip this splicing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [15522865]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [15522865]: BUILD PASSED

@mzient mzient merged commit 08c8f05 into NVIDIA:main Jun 3, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants