
Misc. bug: n_probs is not working with llama.cpp server #10733


Closed
henryclw opened this issue Dec 9, 2024 · 6 comments · Fixed by #10783

Comments

henryclw commented Dec 9, 2024

Name and Version

build: 4291 (ce8784b) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Docker image name: ggerganov/llama.cpp:server-cuda
Docker image hash: sha256:8fa3ccfdcd21874c8a8b257b6bf6abf10070d612e00394b477ec124bd56f2d12

Operating systems

No response

Which llama.cpp modules do you know to be affected?

llama-server

Problem description & steps to reproduce

Started the server with no speculative decoding.

curl --request POST \
     --url http://localhost:8080/completion \
     --header "Content-Type: application/json" \
     --data '{"prompt": "Why is the sky is blue?", "n_probs": 10}'

The output doesn't contain completion_probabilities, which it should.
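
For reference, a minimal sketch of this check in Python (assuming a server listening on localhost:8080 and the pre-b4277 response format shown later in this thread; only the standard library is used):

# Query /completion with n_probs and check whether the response carries
# per-token probabilities. The URL and prompt are placeholders.
import json
import urllib.request

payload = {"prompt": "Why is the sky blue?", "n_probs": 10}
req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# True on b4274, False on b4277 and later builds affected by this bug.
print("completion_probabilities" in body)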

First Bad Commit

HINT:

For docker image server-cuda-b4274, n_probs is working as expected
For docker image server-cuda-b4277, n_probs is not working

Relevant log output

No response

henryclw commented Dec 9, 2024

I just found out this bug was introduced in 6c5bc06.
@ngxson Hi, do you mind giving this a look if you have a minute?

thkodin commented Dec 10, 2024

Adding that the same issue occurs on /chat/completions. I noticed it after shifting from a local build (b3912) to a dockerized version running the latest one (b4291 at the time of writing). Same observation as @henryclw: server-cuda-b4274 works, and b4277 does not.

Basically, if you were to pass logprobs=true and top_logprobs=2 in the OAI-like chat request, e.g.:

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer sk-1234" \
    -d '{"model": "gpt-4", "messages": [{"role": "system", "content": "You are an expert nutritionist."}, {"role": "user", "content": "What are mangoes? Respond very briefly."}], "logprobs": true, "top_logprobs": 2}'

On a working build (such as b4274), you'd get (truncated here to the first two tokens, followed by the last token):

{"choices":[{"finish_reason":"stop","index":0,"message":{"content":"Mangoes are a tropical fruit from the genus Mangifera, specifically the species Mangifera indica.","role":"assistant"}}],"created":1733789118,"model":"gpt-4","object":"chat.completion","usage":{"completion_tokens":23,"prompt_tokens":40,"total_tokens":63},"id":"chatcmpl-oiyl71CKlD1C3X3MqofQAI7HlIrSOQAx","completion_probabilities":[{"content":"M","probs":[{"tok_str":"M","prob":1.0},{"tok_str":"A","prob":0.0}]},{"content":"ango","probs":[{"tok_str":"ango","prob":1.0},{"tok_str":"ang","prob":0.0}]},{"content":"<|im_end|>","probs":[{"tok_str":"<|im_end|>","prob":1.0},{"tok_str":" They","prob":0.0}]}]}

Now, on b4277 (also on b4291 image), with the same POST, it returns the following output:

{"choices":[{"finish_reason":"stop","index":0,"message":{"content":"Mangoes are a type of tropical fruit.","role":"assistant"}}],"created":1733792861,"model":"gpt-4","object":"chat.completion","usage":{"completion_tokens":11,"prompt_tokens":40,"total_tokens":51},"id":"chatcmpl-acfFHInMS3Ha3Hvb72e1KTgkRdAYasBj","timings":{"prompt_n":40,"prompt_ms":9609.985,"prompt_per_token_ms":240.249625,"prompt_per_second":4.162337402191574,"predicted_n":11,"predicted_ms":9075.829,"predicted_per_token_ms":825.0753636363636,"predicted_per_second":1.2120104951294257}}

This is not technically the OpenAI-compatible output anyway (OpenAI returns logprobs, not raw probabilities), so the removal might be intentional as part of that shift. I suspect this because in 6c5bc06, within server.cpp, probs_output (which appears to be the struct representing the completion probabilities) is not present in any of the to_json functions for OpenAI-compatible responses, though I am nowhere near adept at C++, so I may well be wrong.
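
For clarity on that distinction: OpenAI-style logprobs are simply the natural logarithms of the probabilities that the older probs field exposed directly, as in this illustrative snippet (not llama.cpp code):

import math

prob = 0.87                # a hypothetical token probability
logprob = math.log(prob)   # what an OpenAI-style logprobs field carries
assert math.isclose(math.exp(logprob), prob)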

ngxson (Collaborator) commented Dec 11, 2024

@thkodin The probs_output has been removed from /chat/completions because it's not OpenAI-compatible. The idea of the commit you mentioned is to completely separate to_json into two versions: non-OAI-compat and OAI-compat.

On the bright side, this new structure allows adding llama_token_probs::to_json_oai_compat very easily. In fact, I plan to do that this week. Token probs are needed for benchmarking quality (i.e. calculating perplexity).
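
As a rough illustration of that use case, here is a minimal sketch of computing perplexity from per-token probabilities in the pre-b4277 completion_probabilities format shown earlier in this thread (this is not llama.cpp code, and it assumes the first candidate under "probs" is the token that was actually generated):

import math

def perplexity(completion_probabilities):
    # completion_probabilities: list of
    #   {"content": <token>, "probs": [{"tok_str": ..., "prob": ...}, ...]}
    # Assumes the first candidate in "probs" is the generated token.
    log_probs = []
    for tok in completion_probabilities:
        p = max(tok["probs"][0]["prob"], 1e-12)  # clamp to avoid log(0)
        log_probs.append(math.log(p))
    # perplexity = exp(-mean log-probability)
    return math.exp(-sum(log_probs) / len(log_probs))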

StoyanStAtanasov commented

I'm here also because I'm trying to calculate perplexity. Waiting for the merge!

henryclw (Author) commented

Hi, thank you for the kind reply. I really like the idea of splitting the API into a non-OpenAI format and an OpenAI format. On one hand, the OpenAI format can easily interoperate with other projects; on the other, the non-OpenAI format is the one we can do research with.

And yes, as others have mentioned, logprobs are really useful in some cases.
