Misc. bug: n_probs is not working with llama.cpp server #10733
Comments
Adding that the same issue occurs on /chat/completions. I realized because I shifted from a local build (b3912) to a dockerized version running the latest one (b4291 at the time of this writing). Same observation as @henryclw: server-cuda-b4274 works, and b4277 does not produce the same output. Basically, if you pass:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{"model": "gpt-4", "messages": [{"role": "system", "content": "You are an expert nutritionist."}, {"role": "user", "content": "What are mangoes? Respond very briefly."}], "logprobs": true, "top_logprobs": 2}'
```

you'd get (truncated here to the first two tokens, followed by the last token):

```json
{"choices":[{"finish_reason":"stop","index":0,"message":{"content":"Mangoes are a tropical fruit from the genus Mangifera, specifically the species Mangifera indica.","role":"assistant"}}],"created":1733789118,"model":"gpt-4","object":"chat.completion","usage":{"completion_tokens":23,"prompt_tokens":40,"total_tokens":63},"id":"chatcmpl-oiyl71CKlD1C3X3MqofQAI7HlIrSOQAx","completion_probabilities":[{"content":"M","probs":[{"tok_str":"M","prob":1.0},{"tok_str":"A","prob":0.0}]},{"content":"ango","probs":[{"tok_str":"ango","prob":1.0},{"tok_str":"ang","prob":0.0}]},{"content":"<|im_end|>","probs":[{"tok_str":"<|im_end|>","prob":1.0},{"tok_str":" They","prob":0.0}]}]}
```

Now, on b4277 (also on the b4291 image), the same POST returns the following output, with no completion_probabilities at all:

```json
{"choices":[{"finish_reason":"stop","index":0,"message":{"content":"Mangoes are a type of tropical fruit.","role":"assistant"}}],"created":1733792861,"model":"gpt-4","object":"chat.completion","usage":{"completion_tokens":11,"prompt_tokens":40,"total_tokens":51},"id":"chatcmpl-acfFHInMS3Ha3Hvb72e1KTgkRdAYasBj","timings":{"prompt_n":40,"prompt_ms":9609.985,"prompt_per_token_ms":240.249625,"prompt_per_second":4.162337402191574,"predicted_n":11,"predicted_ms":9075.829,"predicted_per_token_ms":825.0753636363636,"predicted_per_second":1.2120104951294257}}
```

This field is not technically the OpenAI-compatible probabilities (which are logprobs, not probabilities), so its removal might be intentional as part of the shift? I think this because in 6c5bc06 within …
@thkodin The … On the bright side, this new structure allows adding …
I'm also here because I'm trying to calculate perplexity. Waiting for the merge!
Some resources on the usefulness of logprobs: https://cookbook.openai.com/examples/using_logprobs and https://www.comet.com/site/blog/perplexity-for-llm-evaluation/
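(Editor's note on the perplexity use case mentioned above, not part of the thread: with per-token log-probabilities $\log p_i$ returned for the $N$ generated tokens, perplexity is

$$\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_i\right)$$

so without logprobs or n_probs in the response it cannot be computed client-side.)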
Hi, thank you for the kind reply. I really love the idea of splitting the API into a non-OpenAI format and an OpenAI format. On one hand, the OpenAI format can easily interoperate with other projects; on the other, the non-OpenAI format is the one we can do research with. And yes, as others have mentioned, logprobs are really useful in some cases.
Name and Version
build: 4291 (ce8784b) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Docker image name: ggerganov/llama.cpp:server-cuda
Docker image hash: sha256:8fa3ccfdcd21874c8a8b257b6bf6abf10070d612e00394b477ec124bd56f2d12
Operating systems
No response
Which llama.cpp modules do you know to be affected?
llama-server
Problem description & steps to reproduce
Started the server with no speculative decoding. The output doesn't contain completion_probabilities, which it should.
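(Editor's note: a reproduction sketch, not the reporter's exact command; the model path and GGUF file are placeholders and the flags follow the usual server-cuda image usage:)

```bash
# Start the CUDA server image (no speculative decoding options), swapping the tag
# between server-cuda-b4274 (works) and server-cuda-b4277 (broken).
docker run --gpus all -p 8000:8000 -v /path/to/models:/models \
  ggerganov/llama.cpp:server-cuda-b4274 \
  -m /models/model.gguf --host 0.0.0.0 --port 8000 -ngl 99
# Then send either request shown earlier (/v1/chat/completions with top_logprobs,
# or /completion with n_probs) and check whether the response contains
# completion_probabilities.
```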
First Bad Commit
HINT:
For docker image server-cuda-b4274, n_probs is working as expected.
For docker image server-cuda-b4277, n_probs is not working.
Relevant log output
No response