Server: fallback to chatml, add AlphaMonarch chat template #5628


Merged: 5 commits, Feb 22, 2024

Conversation

@ngxson (Collaborator) commented Feb 21, 2024

Closes #5627

Fix server crash if the template is not supported

If the template is not supported, an error is now logged and the server falls back to the chatml template instead of crashing.

{"timestamp":1708509262,"level":"INFO","function":"main","line":2723,"message":"model loaded"}
{"timestamp":1708509262,"level":"ERROR","function":"validate_model_chat_template","line":408,"message":"The chat template comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses"}
{"timestamp":1708509262,"level":"VERBOSE","function":"start_loop","line":293,"message":"have new task"}
{"timestamp":1708509262,"level":"VERBOSE","function":"start_loop","line":308,"message":"callback_all_task_finished"}
all slots are idle and system prompt is empty, clear the KV cache
{"timestamp":1708509262,"level":"VERBOSE","function":"start_loop","line":329,"message":"wait for new task"}
{"timestamp":1708509286,"level":"VERBOSE","function":"format_chat","line":208,"message":"formatted_chat","text":"<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nhi, how are you<|im_end|>\n<|im_start|>assistant\n"}
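The chatml layout shown in that last log line can be sketched as follows. This is a minimal Python sketch of the format, not the server's actual C++ code; `format_chatml` is a hypothetical helper name.

```python
def format_chatml(messages):
    """Render chat messages in ChatML, the fallback template used
    when the model's built-in template is not recognized."""
    out = ""
    for m in messages:
        # each turn: <|im_start|>role\ncontent<|im_end|>\n
        out += "<|im_start|>" + m["role"] + "\n" + m["content"] + "<|im_end|>\n"
    out += "<|im_start|>assistant\n"  # generation prompt for the model's reply
    return out

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi, how are you"},
])
print(prompt)
```

Running this reproduces the `formatted_chat` string from the log above.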

Add new template

New template added: https://huggingface.co/mlabonne/AlphaMonarch-7B/blob/main/tokenizer_config.json#L36

The new template has also been added to the research issue: #5527

Example for the formatted prompt:

{"timestamp":1708510304,"level":"VERBOSE","function":"format_chat","line":208,"message":"formatted_chat","text":"system\nYou are a helpful assistant.</s>\n<s>user\nhi, how are you</s>\n<s>assistant\n"}
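Judging from that log line, the AlphaMonarch layout prepends `<s>` to every turn except the first (the leading BOS is presumably left to the tokenizer) and closes each turn with `</s>`. A minimal Python sketch of that layout, with a hypothetical function name, not the server's code:

```python
def format_alphamonarch(messages):
    """Sketch of the AlphaMonarch chat layout as seen in the server log."""
    out = ""
    for i, m in enumerate(messages):
        bos = "" if i == 0 else "<s>"  # leading BOS is left to the tokenizer
        out += bos + m["role"] + "\n" + m["content"] + "</s>\n"
    out += "<s>assistant\n"  # generation prompt
    return out
```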

@ngxson ngxson requested a review from ggerganov February 21, 2024 10:14
@ngxson (Collaborator, Author) commented Feb 21, 2024

@infozzdatalabs Can you please check this out if possible? Thank you!

@infozzdatalabs commented Feb 21, 2024

I think it's great that it now warns when the template doesn't match any available one; until now it was trial and error without any feedback. It would be amazing to directly use a template that is not hardcoded, taking it from the model's metadata, but this already solves many problems. Thanks a lot!

@ggerganov (Member) left a comment

Thank you for looking into this and fixing!

But there are 2 things still missing IMO:

  • For each new template we must add a test in tests/test-chat-template.cpp
  • We should not validate the model's template if a custom template is provided
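The second point could be handled with a check along these lines. This is a sketch under the assumption that an empty string means no custom template was passed; the function name is hypothetical, not taken from the codebase.

```python
def should_validate_model_template(custom_template):
    """Only inspect the model's built-in template when the user has
    not supplied one via --chat-template; a custom template, when
    present, takes precedence and is used as-is."""
    return custom_template is None or custom_template == ""
```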

@infozzdatalabs commented Feb 21, 2024

Thanks to both of you @ngxson @ggerganov for fixing this so fast.
I have not found any information about this: if I want to use a custom template in Docker with the --chat-template flag, do I have to provide the path to a Jinja file or pass it directly as a string? I tried passing it as a string, but the server did not start and threw a wrong-argument exception.

Edit: I searched the files in #5593 and found that it is a string.

@ngxson (Collaborator, Author) commented Feb 21, 2024

@ggerganov Well spotted! I've fixed that in the 2 commits above.

For the tests: before, I only checked whether the output contains a pre-defined string (I was too lazy to deal with \n). Now the test checks that the output is exactly equal to what we expect.

@infozzdatalabs Yeah, you need to pass a string as the argument. I suspect it doesn't work in your case because the template contains characters that are reserved by the shell (brackets {} for example). I'll document this better in a future PR.
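When launching from a POSIX shell, single-quoting (or programmatic quoting) keeps braces and other special characters from being interpreted. A sketch using Python's `shlex.quote`; the template string, `./server`, and `model.gguf` here are only placeholders for illustration:

```python
import shlex

# an illustrative Jinja-style template containing shell-special characters
tmpl = "{% for m in messages %}{{ m['content'] }}{% endfor %}"

# quote it so the whole template survives as a single shell argument
cmd = "./server -m model.gguf --chat-template " + shlex.quote(tmpl)
print(cmd)
```

`shlex.quote` wraps any string containing shell metacharacters in single quotes (escaping embedded single quotes), so the template reaches the server intact.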

@ngxson ngxson requested a review from ggerganov February 21, 2024 19:50
@ggerganov (Member) left a comment

Awesome 🦙

@ggerganov ggerganov merged commit a46f507 into ggml-org:master Feb 22, 2024
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
…#5628)

* server: fallback to chatml

* add new chat template

* server: add AlphaMonarch to test chat template

* server: only check model template if there is no custom tmpl

* remove TODO
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
…#5628)

* server: fallback to chatml

* add new chat template

* server: add AlphaMonarch to test chat template

* server: only check model template if there is no custom tmpl

* remove TODO

Successfully merging this pull request may close these issues.

server: Error\nvector::_M_default_append when using certain models since "llama_chat_apply_template"