server: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command R7B & DeepSeek R1) unless --reasoning-format none (#11607)
Merged

Commits (showing changes from 91 of 94 commits)
d3b60b8
minja: enhance backfill of templates w/o tools description (use examp…
87de852
pass vocab to common_chat_params_init
130ca22
DeepSeek R1: parse thoughts / return in separate field in API (non st…
04d511b
Avoid double bos w/ jinja
2834587
server/oai: ensure content is null when there are tool calls
c80cb30
update logs
0871628
rename tests
73d08d4
tool-call: allow `--jinja --chat-template chatml`
04be723
tool-call: fix command-r7b parsing when response is multiline
ae9d581
tool-calls: add DeepSeek R1 Qwen 7B to server test_hello_world
19bea4e
tell DS R1 not to overthink (weather test)
5e6f2a2
add deepseek models to server tool call section in readme
1e9acd2
tool-call: allow `--jinja --chat-template chatml`
77ae97e
Update test_tool_call.py
a76073c
minimize diffs
cf83623
fix typo
5d18d76
fix double bos issue (drop bos/eos tokens from jinja template)
aa98e59
fix bad merge
2b3c482
fix build / rm diff
4cb0e1d
Merge branch 'jinja-chatml' into r1-toolcall
b2dd490
add missing try catch around jinja parsing to default to chatml
08271b5
Merge branch 'jinja-chatml' into r1-toolcall
df3474e
tool-calls: r1: add missing <|tool▁calls▁end|> to grammar!
c397bd1
tweak delta logic
569610e
tool-calls: accommodate variety of wrong tool call opening tags both …
d73448d
Simplify default chatml logic
0be7f65
Merge branch 'jinja-chatml' into r1-toolcall
7dc271f
tool-calls: add deepseek r1 template + accommodate broken official te…
c6214ee
rm unneeded vocab
1c302e1
simpler hacky fixes for original broken template (+ fix minja example…
108da90
sync: minja https://github.com/google/minja/pull/46
bc6d910
Merge branch 'master' into r1-toolcall
11c1f0c
actually we want eos_token in the template to infer tool call example…
30ea359
update to minja's new api
bbd45bf
sync: minja
bff549d
simplify hack to fix original template's backfill from minja
ce28224
tool-call: r1: add one more trigger approx "<|tool calls begin|>"
e84ee88
r1: fix inadvertent newline in grammar before <|tool▁call▁end|>
18a11f4
tool-call: r1: fix grammar
9a6847c
move trigger_words init inside non-llguidance branch
a682d12
fix / test parsing of r1 parser
f0154a6
Fix / test models/templates/llama-cpp-deepseek-r1.jinja
326e700
update test_calc_result
78b47bb
fix test_calc_result
86994db
fix spaces
09caa63
`sync`: minja
b152729
Update test-chat.cpp
56a14dd
fix mistral chat test: need empty tokens
f12e350
Update chat.cpp
d43e4f6
Merge branch 'sync-minja-4' into r1-toolcall
812544a
server: check that content is null when we get tool_calls
d44eb95
tool-call: ensure we don't return content when there are tool calls /…
b6e14a4
fix mistral expectation
1f5ec59
ensure deepseek r1 thoughts parsed even w/o tool calls
438ce0b
fix test-chat
21f2071
Update chat.cpp
b5b117f
Merge branch 'sync-minja-4' into r1-toolcall
0db9881
Fix r1 grammar since we made <|tool▁calls▁begin|> optional (triggerin…
d1b6691
r1: revert making <|tool▁calls▁begin|> optional as somehow sampling t…
39c1d81
return thoughts in reasoning_content field
b2d1728
update readme section about common model tool call formats
933f7a1
Merge branch 'master' into r1-toolcall
5d60ceb
Update test_tool_call.py
1f1f06a
Merge branch 'master' into r1-toolcall
ochafik 9d7c3cc
--think to force any model to return reasoning_content (or just parse…
d20c2ce
Merge branch 'r1-toolcall' of github.com:ochafik/llama.cpp into r1-to…
f3e9f8b
fix test_thoughts
3841a16
fix compiler warning about parens
e6d9b52
align Command R7B w/ --think / reasoning_content behaviour
39b50c3
Update README.md
0917e0a
fix --think arg env
098629d
disable some failing chatml tests
33efcb3
Update README.md
994301d
use existing string_strip
d1a0640
revert tool example backfill change - command 7rb just needs the righ…
cc2c712
Merge remote-tracking branch 'origin/master' into r1-toolcall
c0f972b
Use --reasoning-format, remove forced thinking for now
af63886
return reasoning_content before content
a59fde2
update model template / format mapping
b829cab
fix test-chat
95cddfd
rm thoughts from generic parser
e598e7a
sync: minja (https://github.com/google/minja/pull/52)
91542ca
tool-calls: allow r1 output to miss <think> opening tag (since latest…
8d82be9
sync: minja (https://github.com/ggerganov/llama.cpp/pull/11774)
30dcfaa
rm wrong warning in command-r parser (when normal text)
e1bff8f
update deepseek r1 templates (+ put update commands in ./scripts/get_…
a29dc92
fix server test_tool_calls.py
ea2f41e
add models/templates/README.md
8409bf1
fix test_calc_result & test_thoughts
01db429
fix test-chat (update delta to latest r1 template change)
37a4bb2
Merge remote-tracking branch 'origin/master' into r1-toolcall
d52579a
prefer json::at to operator[] in chat.cpp
4700245
Merge remote-tracking branch 'origin/master' into r1-toolcall
043cb99
Apply suggestions from code review
ochafik
IMO the `--think` flag is not very intuitive; it should be something like `--reasoning-format`, but personally I still prefer to do it on a per-request basis.

Also, to future-proof this, we should make the flag accept a param. For example, `--reasoning-format deepseek` will return it as `reasoning_content`. Again, this is because we are pretty sure that OpenAI will break the whole thing in the near future.
From a usability standpoint, `--think` feels a bit more intuitive / memorable to me about what it does. Other alternatives might be `--separate-thinking`, `--extract-reasoning`, or `--format-reasoning`, or...?
I mean, what's non-intuitive about it is that the model will still think even without `--think` being added. This flag is supposed to force the model to start the reasoning process, not to enable/disable it completely.
Tbh I'm not really a fan of a query parameter yet.

Re/ flags, I think there may be room for two: one that controls the reasoning behaviour (extract, leave inline, force), and one for the return format. For the first one, how about:

- `--reasoning=extract` → parses DeepSeek R1 & Command R7B thoughts (default)
- `--reasoning=inline` → leaves thoughts inline in the content (format is model-specific)
- `--reasoning=force` → coerces non-thinking models to think (edit: maybe `force-experimental` for now)

As for the format, there are already two APIs out there that return `reasoning_content`. I'd favour just sticking to `reasoning_content` for now, until OpenAI announces their own format (and when they do, default to OpenAI's format and offer `--reasoning-format=deepseek` for backwards compatibility). Or we could decide to create our own format now and default to `--reasoning-format=llama.cpp`, which would return thoughts in the `message.thoughts` field, for instance.

WDYT?
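A rough sketch of the extract-versus-inline behaviours being discussed, assuming DeepSeek R1's `<think>...</think>` delimiters. The helper names are hypothetical and this is not llama.cpp's actual parser:

```python
import re

RAW = "<think>The capital of France is Paris.</think>Paris."

def extract_thoughts(raw: str) -> dict:
    """--reasoning=extract: thoughts go to a separate reasoning_content field."""
    m = re.search(r"<think>(.*?)</think>", raw, re.DOTALL)
    content = re.sub(r"<think>.*?</think>", "", raw, count=1, flags=re.DOTALL)
    return {
        "reasoning_content": m.group(1).strip() if m else None,
        "content": content.strip(),
    }

def inline_thoughts(raw: str) -> dict:
    """--reasoning=inline: thoughts stay in content; format is model-specific."""
    return {"content": raw}
```

Under `extract`, the example above yields a message with `reasoning_content` set to the thought and `content` set to just the answer; under `inline`, the raw text is forwarded untouched.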
Seems ok to me, but eventually I think someone is gonna add this as a per-request param.

Re. having 2 flags for behavior / format, this seems more reasonable to me. From a functional programming perspective, it can be expressed as `response = format(behavior(original_generated_content))`. But I think your idea still mixes these 2 layers.

For `--reasoning extract|inline|force`:

- `force`: I get what it is supposed to do, but it seems like it needs some prompt engineering, so I think you should consider that; maintaining prompts can be a burden
- `inline`: does that mean reasoning can appear in the middle of generation? For example, `content..think..content..think`. Please lmk if I understand correctly.

For `--reasoning-format`, I don't get why we want to invent a new `--reasoning-format llama.cpp` that puts things inside `message.thoughts`. IMO we should keep things simple until OpenAI drops their format. Probably we can have `--reasoning-format deepseek|none` and set `deepseek` as the default for now, then change the default to `oai` once we have that OpenAI format.
IMO if we want `--reasoning` to control the behavior, then it should affect the generation phase (for example, control grammar / logit biases). So it should have 3 values:

- `enabled`: the model behaves as usual
- `disabled`: we never allow the model to use the `<think>` token ==> controlled via logit bias
- `force`: force the model to think, maybe via grammar or prompt engineering?

Then `--reasoning-format` should only "transform" the result into the desired format (a.k.a. a pure function); we can have 3 values:

- `deepseek`: put the content inside `reasoning_content`
- `none`: do not format, simply forward all the generated tokens to the user
- `oai` can be added in the future
@ngxson Makes sense, removed the forced thinking from this PR / will explore again as a follow-up (also, I see more of a case for this option as a query param, while reasoning-format now has stronger flag vibes).

Good point. In earlier experimentations I tried controlling the entire tool call process (even grammar generation) from Jinja templates; might play with this again.

Updated the code (defaulting to deepseek), thanks!
Note that I've updated the code to the latest DeepSeek template changes (they added a trailing `<think>` 😅); minja was updated accordingly: #11774 (comment).
Hey @ngxson, any more concerns / suggestions about this PR?
Sorry, I'm a bit busy recently; will do a review later today or tomorrow.