[Bug report] Performance deterioration of LLaMA-2 model due to hardcoded rms_norm_eps #2373


Closed
xx205 opened this issue Jul 24, 2023 · 2 comments · Fixed by #2374



xx205 commented Jul 24, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [Yes] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [Yes] I carefully followed the README.md.
  • [Yes] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [Yes] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

When running a converted ggml model, the eps used in RMSNorm should be consistent with the original model definition.

Current Behavior

The norm_eps used in RMSNorm is hardcoded to 1e-6 in all backends: x86, CUDA, and Metal.
Related commit: Change RMSNorm eps to 1e-6 #173 (22213a1)
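
For reference, RMSNorm divides each activation vector by its root mean square, with eps added inside the square root for numerical stability. A minimal NumPy sketch (an illustration of the math, not the actual ggml C implementation) shows where the constant enters:

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """RMSNorm over the last dimension: x / sqrt(mean(x^2) + eps) * weight.

    Illustrative sketch only; llama.cpp implements this in C per backend.
    """
    mean_square = np.mean(np.square(x), axis=-1, keepdims=True)
    return x / np.sqrt(mean_square + eps) * weight

x = np.random.randn(4096).astype(np.float32)
w = np.ones(4096, dtype=np.float32)
# The two epsilon values produce slightly different outputs, and such
# differences can compound across the dozens of layers of the model.
y_eps_1e6 = rms_norm(x, w, eps=1e-6)
y_eps_1e5 = rms_norm(x, w, eps=1e-5)
print(np.max(np.abs(y_eps_1e6 - y_eps_1e5)))
```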

Environment and Context

Recently I wanted to evaluate LLaMA-1 and LLaMA-2 models on the MMLU (Measuring Massive Multitask Language Understanding, https://github.com/hendrycks/test) test set, and I chose llama.cpp as the inference engine.
The performance of the LLaMA-1 models was nearly the same as reported in the paper, but the LLaMA-2 7B and 13B models only reached LLaMA-1 7B level scores.
I then checked the model definitions of LLaMA-2 7B and 13B and found that the "rms_norm_eps" in config.json is 1e-5 instead of 1e-6.
After recompiling the source code with eps=1e-5, the test results of the LLaMA-2 models finally look good.
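
A quick way to confirm which epsilon a given checkpoint expects is to read it from the Hugging Face config.json; a small sketch (the local path is a hypothetical example, not a fixed location):

```python
import json

# Hypothetical local path to a LLaMA-2 checkpoint downloaded from Hugging Face.
with open("Llama-2-7b-hf/config.json") as f:
    config = json.load(f)

# LLaMA-1 checkpoints specify 1e-6 here; LLaMA-2 checkpoints specify 1e-5.
print(config["rms_norm_eps"])
```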

Related issue:
GGML model showing noticeable quality issues when compared to HF model #2354

Affected discussions:
LLaMA-2 Perplexities #2352
Presentation on llama.cpp on 25.07.2023 at karlsruhe.ai #2281

slaren (Member) commented Jul 24, 2023

Thanks for the report, I am working on a fix for this.

klosax (Contributor) commented Jul 24, 2023

Perplexity change on LLaMA-2 7B after changing epsilon to 1e-5:
6.006 --> 5.918
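
For context, perplexity is the exponential of the mean per-token negative log-likelihood, so this drop means the model assigns measurably higher probability to each token on average. A small sketch of the relationship (using only the reported numbers, not a rerun of the evaluation):

```python
import math

# Perplexity is exp(mean NLL per token); recover the mean NLL each
# reported perplexity implies to see the size of the improvement.
for ppl in (6.006, 5.918):
    print(f"ppl={ppl} -> mean NLL={math.log(ppl):.4f} nats/token")
```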
