Improve downloads of sharded variants #9869
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Oh, we should also account for model repos without component folders. Updating.
if not len(filename.split("/")) == 2:
    continue
component, component_filename = filename.split("/")
components_with_variant.add(component)
Suggested change:
- if not len(filename.split("/")) == 2:
-     continue
- component, component_filename = filename.split("/")
- components_with_variant.add(component)
+ components_with_variant.add(filename.split("/")[0])
I think we should match the logic to find "component" between variant_filenames and non_variant_filenames
Ohh, I realize you actually just updated this to account for files that are not inside any subfolder.
Maybe make this an inline function, find_component or something, and do the same in the for f in non_variant_filenames loop too.
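One possible shape for such a helper, as a sketch only: the name find_component comes from the suggestion above, but treating root-level files as belonging to an empty-string component is an assumption made here, and the filenames are hypothetical.

```python
def find_component(filename: str) -> str:
    """Return the subfolder a file sits in, or "" for a file at the repo root."""
    parts = filename.split("/")
    # "unet/model.safetensors" -> "unet"; "model.safetensors" -> ""
    return parts[0] if len(parts) == 2 else ""


# Hypothetical filenames, for illustration only.
variant_filenames = ["unet/model.fp16.safetensors", "model.fp16.safetensors"]
non_variant_filenames = ["unet/model.safetensors", "vae/model.safetensors"]

# The same helper is applied to both lists, so the notion of "component" matches.
components_with_variant = {find_component(f) for f in variant_filenames}
components_missing_variant = {
    find_component(f) for f in non_variant_filenames
} - components_with_variant
```

Using one helper for both loops keeps the subfolder and non-subfolder cases consistent between variant_filenames and non_variant_filenames.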
for f in non_variant_filenames:
    variant_filename = convert_to_variant(f)
    if variant_filename not in usable_filenames:
        component = f.split("/")[0]
Will this work in case we're loading from a non-subfolder location?
@yiyixuxu Made it a bit more robust by using the existing regexes and sharded variant index files. Now we
Awesome!! Thank you.
@DN6 I think it'd be nice to also add a test for this improvement. Great work!
* update * update * update * update --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
What does this PR do?
Our current check for variants in variant_compatible_sibling downloads extra non-variant files. See: #9253 (comment)

This is because we check whether each non-variant filename has an equivalent variant filename, e.g. whether transformers/model-0003-0005.safetensors has transformers/model-fp16-0003-0005.safetensors in its variants. If it doesn't, we add the file to the list of allowed patterns to download.

This type of check doesn't work for sharded checkpoints, since variants of smaller dtypes such as fp16 will always have fewer shards. Therefore, the non-variant shard files have no equivalent variant shard files.

This PR:
Changes the check to see if a variant exists for a pipeline component (regardless of whether it's sharded or not). If a variant exists for the component, we don't attempt to add any more files related to that component.
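To illustrate the difference, here is a simplified sketch rather than the code in this PR. The function names, the to_variant helper, and the filenames are all hypothetical, and both inputs are assumed to contain only weight files that live in component subfolders.

```python
def to_variant(filename: str, variant: str = "fp16") -> str:
    """Hypothetical stand-in for the real filename conversion."""
    folder, base = filename.rsplit("/", 1)
    variant_base = base.replace("model", "model-" + variant, 1)
    return folder + "/" + variant_base


def per_file_check(variant_filenames, non_variant_filenames):
    """Old behaviour: keep a non-variant file unless its exact variant twin exists."""
    usable = set(variant_filenames)
    for f in non_variant_filenames:
        if to_variant(f) not in usable:
            usable.add(f)
    return usable


def per_component_check(variant_filenames, non_variant_filenames):
    """New behaviour: skip every non-variant file of a component that has a variant."""
    components_with_variant = {f.split("/")[0] for f in variant_filenames}
    usable = set(variant_filenames)
    for f in non_variant_filenames:
        if f.split("/")[0] not in components_with_variant:
            usable.add(f)
    return usable


# The fp16 variant has two shards, the full-precision weights have three.
variant_files = [
    "transformers/model-fp16-0001-0002.safetensors",
    "transformers/model-fp16-0002-0002.safetensors",
]
non_variant_files = [
    "transformers/model-0001-0003.safetensors",
    "transformers/model-0002-0003.safetensors",
    "transformers/model-0003-0003.safetensors",
    "text_encoder/model.safetensors",
]

old_selection = per_file_check(variant_files, non_variant_files)
new_selection = per_component_check(variant_files, non_variant_files)
```

In this example the per-file check keeps all three full-precision transformers shards because none of them has a same-named fp16 shard, while the per-component check keeps only the two fp16 shards plus the text_encoder file, which has no variant.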
Fixes # (issue)
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.