server: rename legacy --ctx-size to --kv-size option #5546

Conversation

phymbert
Collaborator

Context
--ctx-size is a legacy name from before the introduction of parallelism slots, and it creates confusion (see discussion #4130).

Proposed changes
Introduce a --kv-size option and deprecate the --ctx-size one.

@ggerganov Thanks for the amazing work you are doing here; I hope this small contribution will help.

@pugzly

pugzly commented Feb 18, 2024

Gatekeeping intensifies..
At this rate llama.cpp will become near-unreadable in another year or so for all the new people who weren't closely watching its development from the first line of code.
At the very least, the documentation of that flag should clearly state it's the "former/renamed --ctx-size".

@phymbert
Collaborator Author

Gatekeeping intensifies.. At this rate llama.cpp will become near-unreadable in another year or so for all the new people who weren't closely watching its development from the first line of code. At least documentation of that flag should clearly state it's "former/renamed --ctx-size"

@pugzly Agreed, the server README.md has been updated with a deprecation note.

@phymbert
Collaborator Author

Taking @pugzly's feedback into account, @ggerganov I am wondering if this change should also be applied to the whole codebase.

For example, n_ctx in llama_kv_cache_init might also be renamed to kv_size:
https://github.com/ggerganov/llama.cpp/blob/8f1be0d42f23016cb6819dbae01126699c4bd9bc/llama.cpp#L1941-L1948

@snajpa

snajpa commented Feb 18, 2024

I'm new to llama.cpp, but the behavior of this option was immediately obvious from the first-run output: the "context size" gets divided by the number of slots, so it's not exactly a context size but rather the total space allocated for context. I'm just an N=1 data point, but I think the confusion could be corrected simply by updating the docs of the server example to cover parallel slots and the need to raise ctx_size to slots*ctx_size -- or the code could do the multiplication itself, treating ctx_size as if it's for a single slot.

@ggerganov
Member

I am wondering if this change should also be applied to the whole codebase.

Yes, eventually n_ctx / ctx_size should be changed to kv_size in the entire codebase

@phymbert phymbert marked this pull request as draft February 18, 2024 18:19
@phymbert
Collaborator Author

I am wondering if this change should also be applied to the whole codebase.

Yes, eventually n_ctx / ctx_size should be changed to kv_size in the entire codebase

OK, reverted to a draft PR; I will give it a try.

@phymbert phymbert force-pushed the fix/server-better-kv-size-naming-params branch from cd99def to c8e172a Compare February 18, 2024 19:59
@phymbert
Collaborator Author

@ggerganov I have tried, but I have the feeling that it's a big-bang change and I am not confident being the one to bring it to master. Even though I spent some time on it, please feel free to simply close the PR; otherwise I will make any changes you request.

@ggerganov
Member

No worries, I've moved the changes to #5568 in order to run ggml-ci on them. Will think about whether it is worth merging or not. Thanks

@ggerganov ggerganov closed this Feb 18, 2024
@pugzly

pugzly commented Feb 18, 2024

I'm new to llama.cpp, but the behavior of this option was immediately obvious from the first-run output: the "context size" gets divided by the number of slots, so it's not exactly a context size but rather the total space allocated for context. I'm just an N=1 data point, but I think the confusion could be corrected simply by updating the docs of the server example to cover parallel slots and the need to raise ctx_size to slots*ctx_size -- or the code could do the multiplication itself, treating ctx_size as if it's for a single slot.

That is great, but there is a year's worth of guides, tutorials, and applications built around and on top of llama.cpp, many of which may be rendered obsolete by this, for the most part, "aesthetic" change.
I'm not saying it should not be done, but if it is done, it should ideally be well documented and come with plenty of warning time.

@ggerganov
Member

Don't worry - when and if the change is applied, there will be deprecation notices. Plus it's actually a tiny API change (see llama.h), so it's very easy to adapt.

@phymbert phymbert deleted the fix/server-better-kv-size-naming-params branch March 16, 2024 17:45