Skip to content

nullable=True is ignored if primary_key is True #13

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
gab23r opened this issue Apr 24, 2025 · 5 comments · May be fixed by #14
Open

nullable=True is ignored if primary_key is True #13

gab23r opened this issue Apr 24, 2025 · 5 comments · May be fixed by #14

Comments

@gab23r
Copy link

gab23r commented Apr 24, 2025

The following snippet fails, where it should pass.

class A(dy.Schema):
    a = dy.String(primary_key=True, nullable=True)

A.validate(pl.DataFrame({"a" : [None]}, schema={"a": pl.String}))
# RuleValidationError: 1 rules failed validation:
#  * Column 'a' failed validation for 1 rules:
#    - 'nullability' failed for 1 rows
@gab23r
Copy link
Author

gab23r commented Apr 24, 2025

I see that this is actually documented in the Column abstract class. primary_key: ... If ``True``, ``nullable`` is automatically set to ``False``.

I don't really understand why a primary column couldn't be nullable after all ? A better solution could be that nullable default to None in the Column.__init__ signature and then nullable takes the value True if primary_key is False and False if primary_key is True if not provided by the caller.

@gab23r
Copy link
Author

gab23r commented Apr 24, 2025

I can propose a PR shortly

@gab23r gab23r linked a pull request Apr 24, 2025 that will close this issue
@borchero
Copy link
Member

borchero commented Apr 24, 2025

Initially, we took this decision to align with (MS?)SQL where primary keys cannot be nullable. But you are right, there is no reason to block this in the realm of data frames. I like the proposed solution to flip the default of nullable based on the value of primary_key 🚀

See the PR for additional comments 😄

@AndreasAlbertQC
Copy link
Collaborator

Beyond the scope of the PR: I think we should probably consider nullable to be False by default also for non-primary key columns. I think we inherited the fact the fact that it defaults to True from thinking about SQL

@gab23r
Copy link
Author

gab23r commented Apr 24, 2025

I agree with that, being strict by default on a validation library makes sense to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants