[Compiler plugin] join operations support #1139

Open
wants to merge 2 commits into
base: master
from

Conversation

koperagen
Collaborator

@koperagen koperagen commented Apr 22, 2025

In the first commit I adapted ColumnMatch and ColumnList so that they can be embedded into the compiler plugin's column-resolving mechanism.
The new utility functions are designed to check that all column names the compiler plugin generates for a join are exactly the same as at runtime, and that no columns are missing. Compile-time columns can, however, sometimes be nullable where the runtime narrows the type to non-nullable.
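
To illustrate the kind of check described here, below is a minimal, dependency-free sketch. It is not the code from this PR. `Path`, `ColType`, `MissingColumn`, `TypeMismatch` and `compareSchemas` are hypothetical names introduced for illustration, and `AcceptableMismatch` reuses the name from the PR but with simplified field types (the PR uses `ColumnPath` and `KType`). The sketch assumes a schema can be flattened to a map from column path to type.

```kotlin
// Simplified stand-ins (assumptions): a column path is a list of names,
// and a column type is a type name plus a nullability flag.
typealias Path = List<String>

data class ColType(val name: String, val nullable: Boolean)

sealed interface Mismatch

// Same name as in the PR, but with simplified field types.
data class AcceptableMismatch(val path: Path, val compile: ColType, val runtime: ColType) : Mismatch

// Hypothetical cases, added only for this sketch.
data class MissingColumn(val path: Path, val missingAtCompileTime: Boolean) : Mismatch

data class TypeMismatch(val path: Path, val compile: ColType, val runtime: ColType) : Mismatch

// Compares a compile-time schema with a runtime schema, both flattened to path -> type.
fun compareSchemas(compile: Map<Path, ColType>, runtime: Map<Path, ColType>): List<Mismatch> {
    val result = mutableListOf<Mismatch>()
    for (path in compile.keys + runtime.keys) {
        val c = compile[path]
        val r = runtime[path]
        when {
            c == null -> result += MissingColumn(path, missingAtCompileTime = true)
            r == null -> result += MissingColumn(path, missingAtCompileTime = false)
            c == r -> Unit
            // Compile time is nullable, runtime narrowed the same type to non-nullable.
            c.name == r.name && c.nullable && !r.nullable -> result += AcceptableMismatch(path, c, r)
            else -> result += TypeMismatch(path, c, r)
        }
    }
    return result
}
```

Under these assumptions, a test would tolerate results that contain only `AcceptableMismatch` entries and fail on any `MissingColumn` or `TypeMismatch`.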

@koperagen koperagen added the Compiler plugin label Apr 22, 2025
@koperagen koperagen added this to the 1.0.0-Beta1 (0.16) milestone Apr 22, 2025
@koperagen koperagen self-assigned this Apr 22, 2025
@koperagen koperagen changed the title from "[Compiler plugin join support" to "[Compiler plugin] join operations support" Apr 22, 2025
@koperagen koperagen force-pushed the compiler-plugin-join-support branch from ee9ea15 to 2a17323 on April 22, 2025 12:02
@koperagen koperagen force-pushed the compiler-plugin-join-support branch from 2a17323 to bccd01e on April 22, 2025 16:43
Collaborator

@Jolanrensen Jolanrensen left a comment

nice :)

// This checks that schemas have same set of columns, but compile time columns can be nullable where runtime is narrowed to non-nullable

sealed interface Mismatch
data class AcceptableMismatch(val path: ColumnPath, val compile: KType, val runtime: KType) : Mismatch
Collaborator

The nullability aspect could probably be reflected in the name, e.g. NullabilityMismatch or something similar.

"Charlie", 30, "Moscow", 90,
)

val typed2 = dataFrameOf("name", "origin", "grade", "age")(
Collaborator

We probably need one test here for nested tables and their joins.


internal data class ColumnMatchApproximation(val left: ColumnsResolver, val right: ColumnsResolver)
internal data class ColumnMatchApproximation(
Collaborator

Could you give me a hint about what Approximation means? I see it a lot in the naming, but could not understand the idea.

}

internal class ColumnListImpl<C>(override val columns: List<ColumnsResolver<C>>) :
ColumnSet<C>,
Collaborator

wow, formatting looks weird to me

Collaborator

Unfortunately this is the only way the linter allows it

Collaborator

@zaleslaw zaleslaw left a comment

The implementation is clear to me, but I'm not sure that all required test paths are covered. Could you please answer?


3 participants