server: rename legacy --ctx-size to --kv-size option #5546

Conversation

phymbert
Collaborator

Context
--ctx-size is a legacy name from before the introduction of parallelism slots, and it creates confusion (see discussion #4130).

Proposed changes
Introduce a --kv-size option and deprecate the --ctx-size one.

@ggerganov Thanks for the amazing work you are doing here; I hope this small contribution will help.

@pugzly

pugzly commented Feb 18, 2024

Gatekeeping intensifies..
At this rate llama.cpp will become near-unreadable in another year or so for all the new people who weren't closely watching its development from the first line of code.
At the very least, the documentation of that flag should clearly state it's the "former/renamed --ctx-size".

@phymbert
Collaborator Author

Gatekeeping intensifies.. At this rate llama.cpp will become near-unreadable in another year or so for all the new people who weren't closely watching its development from the first line of code. At least documentation of that flag should clearly state it's "former/renamed --ctx-size"

@pugzly Agreed, the server README.md has been updated with a deprecation note.

@phymbert
Collaborator Author

Taking @pugzly's feedback into account, @ggerganov I am wondering if this change should also be applied to the whole codebase.

For example, n_ctx in llama_kv_cache_init might also be renamed to kv_size:
https://github.com/ggerganov/llama.cpp/blob/8f1be0d42f23016cb6819dbae01126699c4bd9bc/llama.cpp#L1941-L1948

@snajpa

snajpa commented Feb 18, 2024

I'm new to llama.cpp, but the behavior of this option was immediately obvious from the first-run output: the "context size" gets divided by the number of slots, so it's not exactly a context size but rather the total space allocated for context. I'm just an N=1 data point, but I think the confusion could be corrected simply by updating the docs of the server example to cover parallel slots and the need to raise ctx_size to slots*ctx_size -- or the code could do the multiplication itself, treating ctx_size as if it's for a single slot.

@ggerganov
Member

I am wondering if this change should also be applied to the whole codebase.

Yes, eventually n_ctx / ctx_size should be changed to kv_size in the entire codebase

@phymbert phymbert marked this pull request as draft February 18, 2024 18:19
@phymbert
Collaborator Author

I am wondering if this change should also be applied to the whole codebase.

Yes, eventually n_ctx / ctx_size should be changed to kv_size in the entire codebase

OK, reverted to a draft PR; I will give it a try.

@phymbert phymbert force-pushed the fix/server-better-kv-size-naming-params branch from cd99def to c8e172a Compare February 18, 2024 19:59
@phymbert
Collaborator Author

@ggerganov I have tried, but I have the feeling that it's a big-bang change and I am not confident being the one to bring it to master. Even though I spent some time on it, please feel free to simply close the PR; otherwise I will make any changes you request.

@ggerganov
Member

No worries, I've moved the changes to #5568 in order to run ggml-ci on them. Will think about whether it is worth merging or not. Thanks

@ggerganov ggerganov closed this Feb 18, 2024
@pugzly

pugzly commented Feb 18, 2024

I'm new to llama.cpp, but the behavior of this option was immediately obvious from the first-run output: the "context size" gets divided by the number of slots, so it's not exactly a context size but rather the total space allocated for context. I'm just an N=1 data point, but I think the confusion could be corrected simply by updating the docs of the server example to cover parallel slots and the need to raise ctx_size to slots*ctx_size -- or the code could do the multiplication itself, treating ctx_size as if it's for a single slot.

That is great, but there is a year's worth of guides, tutorials, and applications built around and on top of llama.cpp, many of which may be rendered obsolete by this, for the most part, "aesthetic" change.
I'm not saying it should not be done, but if it is done, it should ideally be well documented and come with plenty of warning time.

@ggerganov
Member

Don't worry - when and if the change is applied, there will be deprecation notices. Plus it's actually a tiny API change (see llama.h), so it's very easy to adapt.

@phymbert phymbert deleted the fix/server-better-kv-size-naming-params branch March 16, 2024 17:45