gguf-py : decouple adding metadata from writing in GGUFWriter #7827

compilade · 2024-06-08T02:11:05Z

This is an attempt to simplify both #7499 and #6942 by slightly changing how GGUFWriter works internally.

@mofosyne, @christianazinn, I think this might be interesting for you.

Summary of changes

The only breaking change is that GGUFWriter.add_key and GGUFWriter.add_val have been replaced by GGUFWriter.add_key_value.
- add_key should never have been used alone anyway; this was error-prone.
- Only gguf-py/scripts/gguf-new-metadata.py used this, and was easy enough to adapt.
use_temp_file is now opt-in instead of opt-out. (it now defaults to False)
- It's not necessary in most cases, because Lazy tensors, and because it was otherwise unnecessarily enabled when not thinking about it, like in gguf-new-metadata.py which uses the default use_temp_file value but which already uses mmaped Numpy tensors from GGUFReader, so a temporary file is redundant when it's there by surprise.
GGUFWriter doesn't really need the output file name until when actually writing to it
- It's now possible to choose the file name at any time before writing the file header, which includes the brief time when the metadata of all included tensors is available
  - This should simplify the parameter count in convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor #7499
GGUFWriter doesn't really need to eagerly prepare the data layout of the metadata
- It's possible to choose what to include and in what order, by modifying the tensors and kv_data dicts of a GGUFWriter instance before writing to a file.
  - These are now dict because they map tensor names to tensor info, and keys to values, respectively, and duplicates of either tensor names and keys are detected and should not happen anyway. Since Python 3.7, the iteration order of dict is guaranteed to be the same as the insertion order (and gguf-py already depends on Python >=3.8 in its pyproject.toml).
  - This should help Option to split during conversion #6942 with integrating splitting in GGUFWriter.

Testing

I already listed checksums of many models in #7234, so I'll be testing some of these same models to ensure nothing changed.

Tinyllama https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0:

$ sha256sum tinyllama-lazy-decouple.*
5af5abc9122fd3cd113b5bdd828e0f1c1737a432e3de0b80cd403a32659f07a6  tinyllama-lazy-decouple.bf16.gguf
06d3a29d46ac6d2a128a4c357544f24e311a582eb9af18134db76fe3044111e8  tinyllama-lazy-decouple.q8_0.gguf
$ sha256sum tinyllama-*-tempfile.q8_0.gguf
06d3a29d46ac6d2a128a4c357544f24e311a582eb9af18134db76fe3044111e8  tinyllama-eager-decouple-tempfile.q8_0.gguf
06d3a29d46ac6d2a128a4c357544f24e311a582eb9af18134db76fe3044111e8  tinyllama-lazy-decouple-tempfile.q8_0.gguf

Matches the older PR.

TinyMistral MoE https://huggingface.co/jtatman/TinyMistral-248m-v2.5-4x-Moe

$ sha256sum tinymistral-moe-lazy-decouple.*
d6e6e1977ffa9cc365d425c107d7a79770b9425cab9db04d695e092fedd00d72  tinymistral-moe-lazy-decouple.bf16.gguf
316c22ee97cd3717736ff7cd4ee6cabfdb811b389bc23c4d7cf071c0b514144b  tinymistral-moe-lazy-decouple.q8_0.gguf

Matches the older PR.

Since gguf-new-metadata.py was also changed, why not also test it:

Tinyllama: round-trip with --general-description "Hello world", then removing it with --remove-metadata general.description

$ sha256sum tinyllama-lazy-decouple-*
44fda7dd804d827a0e03f1bffc3d734f41426a29fe5c4359d133423a8430b809  tinyllama-lazy-decouple-description.q8_0.gguf
06d3a29d46ac6d2a128a4c357544f24e311a582eb9af18134db76fe3044111e8  tinyllama-lazy-decouple-no-description.q8_0.gguf

Matches the above

christianazinn · 2024-06-08T02:39:32Z

Thanks for notifying me here @compilade. Not sure how working on two "competing" PRs at once works - should I merge this branch into #6942 and add support, or can we merge that first and write GGUFSplitWriter support into this PR? #6942 is ready to merge whenever you give the green light, and I think I've resolved everything from your last review, so I'd lean towards the latter. The changes would be minimal anyway, just changing the override signatures slightly.

I think I already resolved most of the integration/split management nicely-ish in GGUFSplitWriter (and don't really want to integrate splitting directly into GGUFWriter for a couple reasons, primarily that GGUFWriter should only write a single file), and unfortunately this PR doesn't allow for the last ideal change - only adding KV data/tensors in their respective write_to_file functions rather than in an init_shards setup function - because both the self.tensors list and self.kv_data dict must be fully populated by the call to write_header_to_file for the correct ti/kv data counts to be written (and there's no good way around this anyway).

That being said, as a general thing, I like these changes. add_key being essentially an alias for a particular call to add_value had confused me, and re: tempfile, good change - most users will never encounter nor need this. One thing you didn't note that I might add is that delaying the naming of the GGUFWriter outfile also delays the file creation - which has some small benefits, e.g. not creating or overwriting a previous output file with a blank file if the script is stopped before it actually needs to write anything.

Also, not very familiar with the inner workings of the GGUF specification, but does it matter which order the metadata is included? In theory, you could have two identical model GGUFs that only differ in metadata ordering, where they would appear different by SHA256 checksum but are, well, identical. Is that a possibility worth concern?

compilade · 2024-06-08T03:40:27Z

Thanks for notifying me here @compilade. Not sure how working on two "competing" PRs at once works - should I merge this branch into #6942 and add support, or can we merge that first and write GGUFSplitWriter support into this PR?

@christianazinn I use git worktree in a bunch of folders :)

I'm fine with either PR being merged into master first. (merges across PRs can otherwise be weird unless well prepared)

#6942 is ready to merge whenever you give the green light, and I think I've resolved everything from your last review, so I'd lean towards the latter.

Yeah, I'll test #6942 tomorrow (it's night for me right now) and will likely approve after that. But I think it would be cleaner to refactor first to make the feature simpler, although it's not strictly necessary (the simplifications can come after too).

I think I already resolved most of the integration/split management nicely-ish in GGUFSplitWriter (and don't really want to integrate splitting directly into GGUFWriter for a couple reasons, primarily that GGUFWriter should only write a single file)

There's also another way to see GGUFWriter, that it writes a single model. But I agree it's likely cleaner to keep it to a single GGUFWriter per output file. We'll see.

Also, not very familiar with the inner workings of the GGUF specification, but does it matter which order the metadata is included?

The order of the metadata doesn't really matter, and it's currently governed by the insertion order from convert-hf-to-gguf.py. It was like this before this PR, and this behavior is kept in this PR too.

In theory, you could have two identical model GGUFs that only differ in metadata ordering, where they would appear different by SHA256 checksum but are, well, identical. Is that a possibility worth concern?

Hmm, you're right that this could sometimes be problematic, like when adding (for example) tokenizer.ggml.pre to a model which did not have it, with gguf-new-metadata.py. Might be worth it eventually to sort the keys, but it's out of scope for this PR.

mofosyne · 2024-06-08T04:05:22Z

I like this PR moving towards decoupling the writing and the metadata. Was wishing for a way to not have to keep making new functions for each metadata entry

Changing what happens when the output file is opened will be easier, since this reduces the cases to consider. * gguf-py : prevent GGUFWriter from writing all tensors multiple times It was already checked with an assertion before, but using WriterState should make the error message slightly less cryptic.

compilade · 2024-06-08T17:04:40Z

Did some further tests with 32d11db,

(using https://huggingface.co/jtatman/TinyMistral-248m-v2.5-4x-Moe)

I've tested --vocab-only to make sure it still works

$ sha256sum tinymistral-moe-*defer*
316c22ee97cd3717736ff7cd4ee6cabfdb811b389bc23c4d7cf071c0b514144b  tinymistral-moe-lazy-defer.q8_0.gguf
ff2a7bec22e7691ab4c1b9533731f20a5a44218791a9d7d7225e977f22ad085d  tinymistral-moe-lazy-defer-vocab.q8_0.gguf
$ sha256sum tinymistral-moe-master-vocab.q8_0.gguf 
ff2a7bec22e7691ab4c1b9533731f20a5a44218791a9d7d7225e977f22ad085d  tinymistral-moe-master-vocab.q8_0.gguf

(the checksum with all tensors matches the one in this PR's main post)

I've also tested to see whether convert-legacy-llama.py still works, and it does!

$ sha256sum tinymistral-moe-*legacy*
a877bbc20528db4d7c9986629cc0edbe5abe7751da454b34e163cd9313210b64  tinymistral-moe-legacy.f16.gguf
aeb30a6b3e560cb0cb24ac2793906fda57e0d6688bafdf628d21af52cbecae59  tinymistral-moe-legacy-vocab.f16.gguf
a877bbc20528db4d7c9986629cc0edbe5abe7751da454b34e163cd9313210b64  tinymistral-moe-master-legacy.f16.gguf
aeb30a6b3e560cb0cb24ac2793906fda57e0d6688bafdf628d21af52cbecae59  tinymistral-moe-master-legacy-vocab.f16.gguf

mofosyne

sanity checking before merging. Looks alright

* support splits in convert.py * Support split by size and dry run to write estimated shards/filesizes * Move split functionality to new GGUFManager class * fix improper function signature * tentative push of convert-hf-to-gguf support * resolve merge + SplitArguments for easier parsing * Fix eager tensor memory leak and remove convert.py changes Removed a memory leak caused by unexpected reference retention to eager tensors. Also removed GGUFManager functionality in convert.py in favor of specializing for convert-hf-to-gguf.py. * refactor SplitStrategy to be a deque Instead of having SplitStrategy have a `data` field that is a deque, just have SplitStrategy be a subclass of deque itself. * fix Q8 quantization * remove unnecessary imports in gguf_manager * fix final? merge issue * fix gguf_writer placement and remove comments * oops, actually fix gguf_writer placement * reduce duplicated code from gguf_writer * further simplify GGUFManager * simplify even further and standardize with GGUFWriter * reduce diffs with master * form shards while adding tensors, SHA256 sums agree with master * re-add type hint Co-authored-by: compilade <git@compilade.net> * GGUFWriter compatibility fix Co-authored-by: compilade <git@compilade.net> * Shard dataclass and un-negative dont_add_architecture * type consistency in format_n_bytes_to_str * move kv keys to constants.py * make pathlib explicit * base-1024 bytes to base-1000 * rename GGUFManager to GGUFWriterSplit * Update gguf-py/gguf/constants.py Co-authored-by: compilade <git@compilade.net> * fix convert-hf-to-gguf.py permissions * fix line endings * Update gguf-py/gguf/gguf_writer_split.py Co-authored-by: compilade <git@compilade.net> * convert-hf : restore executable file permission * examples/convert-legacy-llama.py: restore executable file permission * reinstate original gguf package import and fix type annotation * attempt to appease the linter * attempt 2 to appease the linter * attempt 3 to appease the linter * comma consistency * Update convert-hf-to-gguf.py Co-authored-by: compilade <git@compilade.net> * edit cmd line args * use simplification from #7827 * kv/ti data are still wrong * try to refactor kv data (still fails) * fix ti data messiness * tidy up * fix linting * actually make the linter happy * cleanup round 1 * remove SplitStrategy, SplitArguments * appease linter * fix typing and clean up * fix linting * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * progress bar, fix split logic * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * catch oversights * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * swap bar orders * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * compatibility fix * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * Update convert-hf-to-gguf.py Co-authored-by: compilade <git@compilade.net> --------- Co-authored-by: Brian <mofosyne@gmail.com> Co-authored-by: compilade <git@compilade.net>

* support splits in convert.py * Support split by size and dry run to write estimated shards/filesizes * Move split functionality to new GGUFManager class * fix improper function signature * tentative push of convert-hf-to-gguf support * resolve merge + SplitArguments for easier parsing * Fix eager tensor memory leak and remove convert.py changes Removed a memory leak caused by unexpected reference retention to eager tensors. Also removed GGUFManager functionality in convert.py in favor of specializing for convert-hf-to-gguf.py. * refactor SplitStrategy to be a deque Instead of having SplitStrategy have a `data` field that is a deque, just have SplitStrategy be a subclass of deque itself. * fix Q8 quantization * remove unnecessary imports in gguf_manager * fix final? merge issue * fix gguf_writer placement and remove comments * oops, actually fix gguf_writer placement * reduce duplicated code from gguf_writer * further simplify GGUFManager * simplify even further and standardize with GGUFWriter * reduce diffs with master * form shards while adding tensors, SHA256 sums agree with master * re-add type hint Co-authored-by: compilade <git@compilade.net> * GGUFWriter compatibility fix Co-authored-by: compilade <git@compilade.net> * Shard dataclass and un-negative dont_add_architecture * type consistency in format_n_bytes_to_str * move kv keys to constants.py * make pathlib explicit * base-1024 bytes to base-1000 * rename GGUFManager to GGUFWriterSplit * Update gguf-py/gguf/constants.py Co-authored-by: compilade <git@compilade.net> * fix convert-hf-to-gguf.py permissions * fix line endings * Update gguf-py/gguf/gguf_writer_split.py Co-authored-by: compilade <git@compilade.net> * convert-hf : restore executable file permission * examples/convert-legacy-llama.py: restore executable file permission * reinstate original gguf package import and fix type annotation * attempt to appease the linter * attempt 2 to appease the linter * attempt 3 to appease the linter * comma consistency * Update convert-hf-to-gguf.py Co-authored-by: compilade <git@compilade.net> * edit cmd line args * use simplification from ggml-org#7827 * kv/ti data are still wrong * try to refactor kv data (still fails) * fix ti data messiness * tidy up * fix linting * actually make the linter happy * cleanup round 1 * remove SplitStrategy, SplitArguments * appease linter * fix typing and clean up * fix linting * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * progress bar, fix split logic * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * catch oversights * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * swap bar orders * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * compatibility fix * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * Update convert-hf-to-gguf.py Co-authored-by: compilade <git@compilade.net> --------- Co-authored-by: Brian <mofosyne@gmail.com> Co-authored-by: compilade <git@compilade.net>

gguf-py : decouple adding metadata from writing in GGUFWriter

fe59f20

github-actions bot added the python python script changes label Jun 8, 2024

compilade added refactoring Refactoring Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level labels Jun 8, 2024

ggerganov approved these changes Jun 8, 2024

View reviewed changes

christianazinn mentioned this pull request Jun 8, 2024

Option to split during conversion #6942

Merged

compilade added merge ready indicates that this may be ready to merge soon and is just holding out in case of objections and removed merge ready indicates that this may be ready to merge soon and is just holding out in case of objections labels Jun 8, 2024

compilade added the merge ready indicates that this may be ready to merge soon and is just holding out in case of objections label Jun 8, 2024

mofosyne approved these changes Jun 9, 2024

View reviewed changes

mofosyne merged commit ed9f252 into master Jun 9, 2024
22 checks passed

christianazinn added a commit to christianazinn/llama.cpp that referenced this pull request Jun 9, 2024

use simplification from ggml-org#7827

03cc9bc

mofosyne deleted the compilade/gguf-py-decouple-writer-meta branch June 9, 2024 04:09

maab19 mentioned this pull request Jun 14, 2024

Bug: convert-hf-to-gguf.py on Gemma model ValueError: Duplicated key name 'tokenizer.chat_template' #7923

Closed

compilade mentioned this pull request Jul 19, 2024

convert_hf : fix Gemma v1 conversion #8597

Merged

2 tasks

compilade mentioned this pull request Apr 20, 2025

convert : experimental support for --mmproj flag #13023

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gguf-py : decouple adding metadata from writing in GGUFWriter #7827

gguf-py : decouple adding metadata from writing in GGUFWriter #7827

compilade commented Jun 8, 2024

christianazinn commented Jun 8, 2024 •

edited

Loading

compilade commented Jun 8, 2024

mofosyne commented Jun 8, 2024

compilade commented Jun 8, 2024

mofosyne left a comment

gguf-py : decouple adding metadata from writing in GGUFWriter #7827

gguf-py : decouple adding metadata from writing in GGUFWriter #7827

Conversation

compilade commented Jun 8, 2024

Summary of changes

Testing

christianazinn commented Jun 8, 2024 • edited Loading

compilade commented Jun 8, 2024

mofosyne commented Jun 8, 2024

compilade commented Jun 8, 2024

mofosyne left a comment

Choose a reason for hiding this comment

christianazinn commented Jun 8, 2024 •

edited

Loading