
Commit ef898f1

Authored and committed by: ochafik, ngxson, ggerganov
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (ggml-org#9639)
---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
1 parent d9fb05c commit ef898f1


48 files changed: +3861 -156 lines (large commit; only a subset of the changed files is shown below)
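To make the feature concrete before the per-file diffs: the sketch below shows how a client might exercise the new tool call support through llama-server's OpenAI-compatible /v1/chat/completions endpoint. It is illustrative only and not taken from this commit; the host and port, the --jinja startup flag, and the get_weather tool definition are assumptions made for the example.

# Sketch only, not code from this commit. Assumes a llama-server built from
# this revision is already running locally, e.g.:
#   llama-server --jinja -m some-tool-capable-model.gguf --port 8080
# The request/response shapes follow the OpenAI chat-completions convention
# exposed by the server's /v1/chat/completions endpoint; the get_weather tool
# is a made-up example.
import json
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "What is the weather in Paris today?"},
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get the current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            },
        ],
    },
    timeout=600,
)
resp.raise_for_status()

message = resp.json()["choices"][0]["message"]
# When the model decides to call the tool, the arguments arrive as a JSON
# string (OpenAI convention); otherwise "content" holds a normal text reply.
for call in message.get("tool_calls") or []:
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
if message.get("content"):
    print(message["content"])

The "lazy grammars" named in the title refer to constraining decoding with a tool-call grammar only once the model actually begins a tool call, so ordinary text replies remain unconstrained.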

.editorconfig (+8)

@@ -40,3 +40,11 @@ indent_style = tab
 [examples/cvector-generator/*.txt]
 trim_trailing_whitespace = unset
 insert_final_newline = unset
+
+[models/templates/*.jinja]
+indent_style = unset
+indent_size = unset
+end_of_line = unset
+charset = unset
+trim_trailing_whitespace = unset
+insert_final_newline = unset

.github/workflows/server.yml (+1 -1)

@@ -205,7 +205,7 @@ jobs:
         run: |
           cd examples/server/tests
           $env:PYTHONIOENCODING = ":replace"
-          pytest -v -x
+          pytest -v -x -m "not slow"

       - name: Slow tests
         id: server_integration_tests_slow
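For context on the -m "not slow" selector introduced above: pytest's -m option filters tests by marker, so tests tagged as slow are left to the separate "Slow tests" step that follows in the workflow. The snippet below is a generic, hypothetical illustration of that mechanism; the test names and bodies are not from this repository.

# Hypothetical illustration of pytest markers; not taken from this commit.
import pytest

@pytest.mark.slow              # deselected by `pytest -m "not slow"`
def test_large_model_end_to_end():
    ...  # long-running integration scenario

def test_request_parsing():    # unmarked, so it runs in the regular CI step
    assert 1 + 1 == 2

Custom markers such as slow are normally registered in the project's pytest configuration so that pytest does not warn about unknown marks.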

Makefile (+9)

@@ -52,6 +52,7 @@ TEST_TARGETS = \
 	tests/test-arg-parser \
 	tests/test-autorelease \
 	tests/test-backend-ops \
+	tests/test-chat \
 	tests/test-chat-template \
 	tests/test-double-float \
 	tests/test-grammar-integration \
@@ -983,6 +984,7 @@ OBJ_COMMON = \
 	$(DIR_COMMON)/ngram-cache.o \
 	$(DIR_COMMON)/sampling.o \
 	$(DIR_COMMON)/speculative.o \
+	$(DIR_COMMON)/chat.o \
 	$(DIR_COMMON)/build-info.o \
 	$(DIR_COMMON)/json-schema-to-grammar.o
 
@@ -1361,6 +1363,8 @@ llama-server: \
 	examples/server/httplib.h \
 	examples/server/index.html.hpp \
 	examples/server/loading.html.hpp \
+	common/chat.cpp \
+	common/chat.hpp \
 	common/chat-template.hpp \
 	common/json.hpp \
 	common/minja.hpp \
@@ -1471,6 +1475,11 @@ tests/test-json-schema-to-grammar: tests/test-json-schema-to-grammar.cpp \
 	$(CXX) $(CXXFLAGS) -Iexamples/server -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
+tests/test-chat: tests/test-chat.cpp \
+	$(OBJ_ALL)
+	$(CXX) $(CXXFLAGS) -Iexamples/server -c $< -o $(call GET_OBJ_FILE, $<)
+	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
+
 tests/test-opt: tests/test-opt.cpp \
 	$(OBJ_GGML)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)

README.md (+1)

@@ -18,6 +18,7 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
 
 - **How to use [MTLResidencySet](https://developer.apple.com/documentation/metal/mtlresidencyset?language=objc) to keep the GPU memory active?** https://github.com/ggerganov/llama.cpp/pull/11427
 - **VS Code extension for FIM completions:** https://github.com/ggml-org/llama.vscode
+- Universal tool call support in `llama-server`: https://github.com/ggerganov/llama.cpp/pull/9639
 - Vim/Neovim plugin for FIM completions: https://github.com/ggml-org/llama.vim
 - Introducing GGUF-my-LoRA https://github.com/ggerganov/llama.cpp/discussions/10123
 - Hugging Face Inference Endpoints now support GGUF out of the box! https://github.com/ggerganov/llama.cpp/discussions/9669

common/CMakeLists.txt (+2)

@@ -56,6 +56,8 @@ add_library(${TARGET} STATIC
     arg.cpp
     arg.h
     base64.hpp
+    chat.cpp
+    chat.hpp
     chat-template.hpp
     common.cpp
     common.h
