server: rename legacy --ctx-size to --kv-size option #5546
Conversation
Gatekeeping intensifies.. |
@pugzly Agreed, server |
Taking into account @pugzly's feedback, @ggerganov I am wondering if this change should also be applied to the whole code base. For example, |
I'm new to llama.cpp, but the behavior of this option was immediately obvious from the first-run output: the "context size" gets divided by the number of slots, so it's not exactly a context size but rather the space allocated for context. I'm just an N=1 datapoint, but I think the confusion could be corrected simply by updating the docs of the server example to cover parallel slots and the need to raise ctx_size to slots*ctx_size. Alternatively, the code could do the multiplication itself and treat ctx_size as applying to a single slot. |
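To make the arithmetic in the comment above concrete, here is a minimal C++ sketch. The helper names `ctx_per_slot` and `kv_size_for` are hypothetical and are not taken from llama.cpp; the sketch only assumes, as described above, that the configured size is split evenly across the parallel slots.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not llama.cpp code): the size given on the command
// line is treated as total KV space and split evenly across the slots.
inline int32_t ctx_per_slot(int32_t kv_size, int32_t n_slots) {
    assert(n_slots > 0);
    return kv_size / n_slots;  // integer division; any remainder is unused
}

// Hypothetical inverse: the total to request so that every slot
// ends up with `wanted_per_slot` tokens of context.
inline int32_t kv_size_for(int32_t wanted_per_slot, int32_t n_slots) {
    assert(n_slots > 0);
    return wanted_per_slot * n_slots;
}
```

For example, under this assumption a total of 4096 with 4 slots leaves 1024 tokens per slot, and giving each of 4 slots 2048 tokens requires a total of 8192.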
Yes, eventually |
OK reverted to draft PR, I will give it a try. |
cd99def to c8e172a
@ggerganov I have tried, but I have the feeling that it's a big-bang change and I am not confident being the one to bring it to master. Even though I spent some time on it, please feel free to simply close the PR; otherwise I will make any changes you request. |
No worries, I've moved the changes to #5568 in order to run |
That is great, but there is a year's worth of guides, tutorials, and applications built around and on top of llama.cpp, many of which may be rendered obsolete by this, for the most part, "aesthetic" change. |
Don't worry - when and if the change is applied, there will be deprecation notices. Plus it's actually a tiny API change (see |
Context
--ctx-size is a legacy name from before the introduction of parallelism slots and creates confusion (see discussion #4130).
Proposed changes
Introduce the --kv-size option and deprecate the --ctx-size one.
@ggerganov Thanks for the amazing job you are doing here, hope this small contribution will help.
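As a rough illustration of what "introduce --kv-size and deprecate --ctx-size" could look like, here is a minimal, self-contained C++ sketch. It is not the parser from this PR or from llama.cpp: the `parsed_size` struct, the `parse_kv_size` function, the default value, and the warning text are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct parsed_size {
    int32_t kv_size         = 512;    // placeholder default, chosen for the example
    bool    used_deprecated = false;  // true if the legacy flag was seen
};

// Accept both flags; the legacy one still works but prints a notice.
inline parsed_size parse_kv_size(const std::vector<std::string> & args) {
    parsed_size out;
    for (std::size_t i = 0; i + 1 < args.size(); ++i) {
        const bool is_new = args[i] == "--kv-size";
        const bool is_old = args[i] == "--ctx-size";
        if (!is_new && !is_old) {
            continue;
        }
        if (is_old) {
            out.used_deprecated = true;
            std::cerr << "warning: --ctx-size is deprecated, use --kv-size\n";
        }
        out.kv_size = std::stoi(args[i + 1]);  // throws on non-numeric input
        ++i;                                   // skip the value just consumed
    }
    return out;
}
```

Keeping the old flag as an alias with a warning means existing guides and scripts keep working while users are steered toward the new name.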