Add Jinja template support #11016
Merged
47 commits
abd274a  Copy minja from https://github.com/google/minja/commit/58f0ca6dd74bcb… (ochafik)
e5113e8  Add --jinja and --chat-template-file flags (ochafik)
80138d9  Add missing <optional> include (ochafik)
06b5159  Avoid print in get_hf_chat_template.py (ochafik)
ce48584  No designated initializers yet (ochafik)
389d79b  Try and work around msvc++ non-macro max resolution quirk (ochafik)
238b968  Update test_chat_completion.py (ochafik)
cb72cf1  Merge remote-tracking branch 'origin/master' into jinja (ochafik)
78861a3  Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template (ochafik)
1aac99a  Refactor test-chat-template (ochafik)
7c84ebc  Test templates w/ minja (ochafik)
18f257b  Fix deprecation (ochafik)
8dd4f33  Add --jinja to llama-run (ochafik)
c04c50e  Merge remote-tracking branch 'origin/master' into jinja (ochafik)
a6afb27  Update common_chat_format_example to use minja template wrapper (ochafik)
b4083e4  Test chat_template in e2e test (ochafik)
b7e2171  Update utils.py (ochafik)
a57bb94  Update test_chat_completion.py (ochafik)
4daae0b  Update run.cpp (ochafik)
1b3bb7e  Update arg.cpp (ochafik)
3ed670b  Merge remote-tracking branch 'origin/master' into jinja (ochafik)
b75d062  Refactor common_chat_* functions to accept minja template + use_jinja… (ochafik)
40db789  Merge remote-tracking branch 'origin/master' into jinja (ochafik)
81c0d43  Attempt to fix linkage of LLAMA_CHATML_TEMPLATE (ochafik)
d5fa351  Revert LLAMA_CHATML_TEMPLATE refactor (ochafik)
ee1e10e  Normalize newlines in test-chat-templates for windows tests (ochafik)
e63520f  Forward decl minja::chat_template to avoid eager json dep (ochafik)
33322e8  Flush stdout in chat template before potential crash (ochafik)
5074e6f  Fix copy elision warning (ochafik)
fc60802  Rm unused optional include (ochafik)
0e74c9d  Add missing optional include to server.cpp (ochafik)
e3c475c  Disable jinja test that has a cryptic windows failure (ochafik)
cc50356  minja: fix vigogne (https://github.com/google/minja/pull/22) (ochafik)
153e852  Apply suggestions from code review (ochafik)
db9dd0c  Finish suggested renamings (ochafik)
c9e8fdd  Move chat_templates inside server_context + remove mutex (ochafik)
8c84aef  Update --chat-template-file w/ recent change to --chat-template (ochafik)
154bfaa  Refactor chat template validation (ochafik)
099f983  Merge remote-tracking branch 'origin/master' into jinja (ochafik)
54a669e  Guard against missing eos/bos tokens (null token otherwise throws in … (ochafik)
8348c60  Warn against missing eos / bos tokens when jinja template references … (ochafik)
ee475d2  rename: common_chat_template[s] (ochafik)
8a7c89e  reinstate assert on chat_templates.template_default (ochafik)
8347da9  Update minja to https://github.com/google/minja/commit/b8437df626ac6c… (ochafik)
ff2cce5  Update minja to https://github.com/google/minja/pull/25 (ochafik)
9d8ebd6  Update minja from https://github.com/google/minja/pull/27 (ochafik)
cbb9b81  rm unused optional header (ochafik)
@@ -0,0 +1,249 @@
/*
    Copyright 2024 Google LLC

    Use of this source code is governed by an MIT-style
    license that can be found in the LICENSE file or at
    https://opensource.org/licenses/MIT.
*/
// SPDX-License-Identifier: MIT
#pragma once

#include "minja.hpp"
#include <json.hpp>
#include <string>
#include <vector>

using json = nlohmann::ordered_json;

namespace minja {

class chat_template {
  public:

  private:
    bool supports_tools_ = true;
    // Meta-Llama-3.1-8B-Instruct's template expects arguments to be an object.
    // Most other templates (and OpenAI's API) expect the arguments object to be stringified.
    bool requires_object_arguments_ = false;
    bool supports_system_role_ = true;
    bool supports_parallel_tool_calls_ = false;
    std::string source_;
    std::string bos_token_;
    std::string eos_token_;
    std::shared_ptr<minja::TemplateNode> template_root_;

    std::string try_render(
        const nlohmann::ordered_json & messages,
        const nlohmann::ordered_json & tools,
        bool add_generation_prompt,
        const nlohmann::ordered_json & extra_context = nlohmann::ordered_json()) const
    {
        try {
            auto prompt = apply(messages, tools, add_generation_prompt, extra_context);
            // fprintf(stderr, "Prompt: %s\n", prompt.c_str());
            return prompt;
        } catch (const std::exception & e) {
            // fprintf(stderr, "Error: %s\n", e.what());
            return "";
        }
    }

  public:
    chat_template(const std::string & source, const std::string & bos_token, const std::string & eos_token)
        : source_(source), bos_token_(bos_token), eos_token_(eos_token)
    {
        template_root_ = minja::Parser::parse(source_, {
            /* .trim_blocks = */ true,
            /* .lstrip_blocks = */ true,
            /* .keep_trailing_newline = */ false,
        });
        supports_tools_ = source.find("tools") != std::string::npos;

        auto renders_string_arguments =
            try_render({
                {
                    {"role", "user"},
                    {"content", "Hey"}
                },
                {
                    {"role", "assistant"},
                    {"tool_calls", json::array({
                        {
                            {"id", "call_1___"},
                            {"type", "function"},
                            {"function", {
                                {"arguments", "{\"code\": \"print('Hello, World!')\"}"},
                                {"name", "ipython"},
                            }},
                        },
                    })},
                }
            }, {}, false).find("{\"code\": \"print") != std::string::npos;
        if (!renders_string_arguments) {
            auto renders_object_arguments =
                try_render({
                    {
                        {"role", "user"},
                        {"content", "Hey"}
                    },
                    {
                        {"role", "assistant"},
                        {"tool_calls", json::array({
                            {
                                {"id", "call_1___"},
                                {"type", "function"},
                                {"function", {
                                    {"arguments", {
                                        {"code", "print('Hello, World!')"},
                                    }},
                                    {"name", "ipython"},
                                }},
                            },
                        })},
                    }
                }, {}, false).find("{\"code\": \"print") != std::string::npos;
            requires_object_arguments_ = renders_object_arguments;
        }
        supports_parallel_tool_calls_ = source.find("tool_call_id") != std::string::npos;

        supports_system_role_ = try_render({
            {{"role", "system"}, {"content", "<System Needle>"}},
            {{"role", "user"}, {"content", "Hey"}}
        }, {}, false).find("<System Needle>") != std::string::npos;
    }

    const std::string & source() const { return source_; }
    const std::string & bos_token() const { return bos_token_; }
    const std::string & eos_token() const { return eos_token_; }
    bool supports_tools() const { return supports_tools_; }
    bool supports_parallel_tool_calls() const { return supports_parallel_tool_calls_; }

    std::string apply(
        const nlohmann::ordered_json & messages,
        const nlohmann::ordered_json & tools,
        bool add_generation_prompt,
        const nlohmann::ordered_json & extra_context = nlohmann::ordered_json()) const
    {
        json actual_messages;

        // First, "fix" messages so they have a chance to be rendered correctly by the template

        if (requires_object_arguments_ || !supports_system_role_ || !supports_tools_) {
            actual_messages = json::array();

            std::string pending_system;
            auto flush_sys = [&]() {
                if (!pending_system.empty()) {
                    actual_messages.push_back({
                        {"role", "user"},
                        {"content", pending_system},
                    });
                    pending_system.clear();
                }
            };
            for (const auto & message_ : messages) {
                auto message = message_;
                if (!message.contains("role") || !message.contains("content")) {
                    throw std::runtime_error("message must have 'role' and 'content' fields: " + message.dump());
                }
                std::string role = message.at("role");

                if (message.contains("tool_calls")) {
                    if (requires_object_arguments_ || !supports_tools_) {
                        for (auto & tool_call : message.at("tool_calls")) {
                            if (tool_call["type"] == "function") {
                                auto & function = tool_call.at("function");
                                std::string arguments = function.at("arguments");
                                function["arguments"] = json::parse(arguments);
                            }
                        }
                    }
                    if (!supports_tools_) {
                        auto content = message.at("content");
                        auto tool_calls = json::array();
                        for (const auto & tool_call : message.at("tool_calls")) {
                            if (tool_call.at("type") != "function") {
                                continue;
                            }
                            const auto & function = tool_call.at("function");
                            auto tc = json {
                                {"name", function.at("name")},
                                {"arguments", function.at("arguments")},
                            };
                            if (tool_call.contains("id")) {
                                tc["id"] = tool_call["id"];
                            }
                            tool_calls.push_back(tc);
                        }
                        auto obj = json {
                            {"tool_calls", tool_calls},
                        };
                        if (!content.is_null() && content != "") {
                            obj["content"] = content;
                        }
                        message["content"] = obj.dump(2);
                        message.erase("tool_calls");
                    }
                }
                if (!supports_tools_ && role == "tool") {
                    message["role"] = "user";
                    auto obj = json {
                        {"tool_response", {
                            {"tool", message.at("name")},
                            {"content", message.at("content")},
                        }},
                    };
                    if (message.contains("tool_call_id")) {
                        obj["tool_response"]["tool_call_id"] = message.at("tool_call_id");
                    }
                    message["content"] = obj.dump(2);
                    message.erase("name");
                }

                if (!message["content"].is_null() && !supports_system_role_) {
                    std::string content = message.at("content");
                    if (role == "system") {
                        if (!pending_system.empty()) pending_system += "\n";
                        pending_system += content;
                        continue;
                    } else {
                        if (role == "user") {
                            if (!pending_system.empty()) {
                                message["content"] = pending_system + (content.empty() ? "" : "\n" + content);
                                pending_system.clear();
                            }
                        } else {
                            flush_sys();
                        }
                    }
                }
                actual_messages.push_back(message);
            }
            flush_sys();
        } else {
            actual_messages = messages;
        }

        auto context = minja::Context::make(json({
            {"messages", actual_messages},
            {"add_generation_prompt", add_generation_prompt},
            {"bos_token", bos_token_},
            {"eos_token", eos_token_},
        }));

        if (!tools.is_null()) {
            auto tools_val = minja::Value(tools);
            context->set("tools", tools_val);
        }
        if (!extra_context.is_null()) {
            for (auto & kv : extra_context.items()) {
                minja::Value val(kv.value());
                context->set(kv.key(), val);
            }
        }

        return template_root_->render(context);
    }
};

}  // namespace minja
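For readers skimming the diff, the probe-then-adapt approach taken by the constructor and `apply` can be sketched without minja or nlohmann::json. This is only an illustrative sketch under stated assumptions: `Message`, `Renderer`, `make_strict_renderer`, `probe_system_role`, `fold_system_messages`, and `prepare` are hypothetical names that do not appear in the PR, and the real code operates on `nlohmann::ordered_json` and a parsed minja template rather than on these stand-ins.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a chat message (the PR uses nlohmann::ordered_json).
struct Message {
    std::string role;
    std::string content;
};

// Hypothetical stand-in for a parsed template: turns messages into a prompt
// and may throw when the template rejects a role.
using Renderer = std::function<std::string(const std::vector<Message> &)>;

// Example renderer that refuses system messages, as some templates do.
Renderer make_strict_renderer() {
    return [](const std::vector<Message> & messages) {
        std::string out;
        for (const auto & m : messages) {
            if (m.role == "system") throw std::runtime_error("system role not supported");
            out += m.role + ": " + m.content + "\n";
        }
        return out;
    };
}

// Probe: render a system message carrying a marker and look for the marker in
// the output. A render failure is treated as "not supported".
bool probe_system_role(const Renderer & render) {
    assert(render && "renderer must be callable");
    const std::string needle = "<System Needle>";
    const std::vector<Message> probe = {{"system", needle}, {"user", "Hey"}};
    try {
        return render(probe).find(needle) != std::string::npos;
    } catch (const std::exception &) {
        return false;
    }
}

// Adapt: merge system messages into the next user message, or emit them as a
// standalone user message when some other role (or the end) comes first.
std::vector<Message> fold_system_messages(const std::vector<Message> & messages) {
    std::vector<Message> out;
    std::string pending;
    auto flush = [&]() {
        if (!pending.empty()) {
            out.push_back({"user", pending});
            pending.clear();
        }
    };
    for (Message m : messages) {
        if (m.role == "system") {
            if (!pending.empty()) pending += "\n";
            pending += m.content;
            continue;
        }
        if (m.role == "user") {
            if (!pending.empty()) {
                m.content = pending + (m.content.empty() ? std::string() : "\n" + m.content);
                pending.clear();
            }
        } else {
            flush();
        }
        out.push_back(m);
    }
    flush();
    return out;
}

// Only rewrite the conversation when the probe says the template needs it.
std::vector<Message> prepare(const Renderer & render, const std::vector<Message> & messages) {
    return probe_system_role(render) ? messages : fold_system_messages(messages);
}
```

With the strict renderer, a system message "Be brief." followed by a user message "Hi" would be folded into one user message whose content is "Be brief.\nHi". A renderer that accepts system messages leaves the conversation untouched.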
One idea to be able to `#include "chat-template.hpp"` in main is to forward declare `json` here without `#include <json.hpp>`, and only define the prototype of `class chat_template` here. Then we will need a new file `chat-template.cpp` that holds the actual implementation, including `#include <json.hpp>`.
(Not sure if this even works, but we can do it in another PR, just noting my idea here)
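A rough single-file sketch of the split suggested above, for illustration only. `sketch::json_doc` and `apply_text` are hypothetical stand-ins, not anything in this PR. One caveat: `nlohmann::ordered_json` is an alias for a template specialization, so it can't be forward declared with a plain `class json;`. As far as I know nlohmann ships a `json_fwd.hpp` header for that purpose, but I haven't checked how it interacts with the single-header `json.hpp` used here.

```cpp
#include <cassert>
#include <string>

// ---- Header part (hypothetical chat-template.hpp): no JSON include needed ----
namespace sketch {

struct json_doc;  // forward declaration; hypothetical stand-in for the JSON type

class chat_template {
  public:
    explicit chat_template(const std::string & source);
    const std::string & source() const { return source_; }
    // Only a reference to the incomplete type appears in the prototype.
    std::string apply(const json_doc & messages) const;

  private:
    std::string source_;
};

// Entry point taking plain strings, so callers never need the complete JSON type.
std::string apply_text(const chat_template & tmpl, const std::string & raw_messages);

}  // namespace sketch

// ---- Implementation part (hypothetical chat-template.cpp) ----
// The real file would #include <json.hpp> here; a trivial struct stands in for it.
namespace sketch {

struct json_doc {
    std::string text;
};

chat_template::chat_template(const std::string & source) : source_(source) {}

std::string chat_template::apply(const json_doc & messages) const {
    // Placeholder "rendering": the real code would evaluate the template.
    return source_ + messages.text;
}

std::string apply_text(const chat_template & tmpl, const std::string & raw_messages) {
    json_doc doc{raw_messages};  // the real code would parse JSON here
    return tmpl.apply(doc);
}

}  // namespace sketch
```

Code that includes only the header part can construct a `chat_template` and call `apply_text` without ever seeing the JSON type. The cost is that minja would no longer be header-only.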
I was hoping to keep minja header-only for now, but happy to explore options as follow up :-)