sampling: add Top-nσ sampler #11223

Conversation
Thank you for this implementation! Top-nσ is definitely special and needs a lot of testing. I like the results so far, especially since high temperature is not a problem, as shown in the paper, and I'm going to test it more to see what its limitations are.
Sorry for the ping, but what is needed for this approved PR to get merged? I like the new feature very much, as it allows the benefits of a higher temperature without the usual drawbacks.
@hdu-hh I agree! This would be a cool feature to have, but I don't think it's a priority for the maintainers at the moment.
* initial sampling changes
* completed top nsigma sampler implementation
* apply parameter to only llama-cli
* updated readme
* added tests and fixed nsigma impl
* cleaned up pr
* format (x3)
* removed commented tests
* cleanup pr and remove explicit floats
* added top-k sampler to improve performance
* changed sigma to float
* fixed string format to float
* applied review suggestions to src/llama-sampling.cpp and common/sampling.cpp
* added llama_sampler_init

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
- update libllama to match `llama.cpp/include/llama.h@71e90e8813f90097701e62f7fce137d96ddf41e2`
- add top-n-sigma sampler (ref: [llama.cpp#11223](ggml-org/llama.cpp#11223))
- add missing newlines to Gemma3 prompt format
- add new `Llama.n_head_kv()`, `Llama.bpw()`, `Llama.warmup()`
- add sampler parameter info to the sampler chain string
- adopt new sampler parameter defaults (temp=1.0, min_p=0.1, all others neutral)
- improved logic for applying the penalties sampler
- warming up the model now does a single-token decode as well as a full-batch decode
- fix `Llama.chat_template()`
- fix return type annotations for some token methods of `Llama` (`int` -> `Optional[int]`)
- remove old sampler presets
- continue working on new server + webui
Top-nσ: Not All Logits Are You Need
https://arxiv.org/pdf/2411.07641
The authors of this paper propose a new sampling method known as Top-nσ. The main feature of this sampler is that "unlike existing methods (e.g., top-p, min-p) that inadvertently include more noise tokens at higher temperatures, top-nσ maintains a stable sampling space regardless of temperature scaling". They discovered that logits naturally separate into a Gaussian-distributed noisy region and an informative region.
This PR implements the sampling method proposed in the paper. Here is the algorithm from the paper: compute the maximum logit M and the standard deviation σ of the logits, then keep only tokens whose logit is at least M − n·σ.
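For concreteness, here is a minimal C++ sketch of that selection step (illustrative only, not the PR's actual code; the function and variable names are mine):

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Minimal sketch of top-n-sigma selection (illustrative, not the PR's code).
// Tokens whose logit falls below max_logit - n * sigma are masked to -inf,
// so the later softmax assigns them zero probability.
static void top_n_sigma(std::vector<float> & logits, float n) {
    if (logits.empty() || n <= 0.0f) {
        return;
    }

    // maximum logit M and mean of the logits
    const float max_l = *std::max_element(logits.begin(), logits.end());
    float mean = 0.0f;
    for (float l : logits) {
        mean += l;
    }
    mean /= (float) logits.size();

    // standard deviation sigma of the logits
    float var = 0.0f;
    for (float l : logits) {
        var += (l - mean) * (l - mean);
    }
    const float sigma = std::sqrt(var / (float) logits.size());

    // keep only the informative region: logit >= M - n * sigma
    const float thr = max_l - n * sigma;
    for (float & l : logits) {
        if (l < thr) {
            l = -std::numeric_limits<float>::infinity();
        }
    }
}
```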
Since the manipulation is done directly on the logits pre-softmax, I added it as a stand-alone sampler instead of chaining it with the common samplers. The changes only add support for `llama-cli`.

Sampler chain: `logits -> logit-bias -> temp -> top-n-sigma -> dist`
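As a usage sketch, building that chain through the llama.cpp C API would look roughly like the snippet below. I'm assuming the init function is named `llama_sampler_init_top_n_sigma` (check `include/llama.h` for the exact signature), and the logit-bias stage is omitted for brevity since it needs vocab-specific arguments:

```cpp
#include <cstdint>
#include "llama.h"

// Rough sketch of the chain above: temp -> top-n-sigma -> dist.
// llama_sampler_init_top_n_sigma is assumed from this PR; verify the
// name and signature against include/llama.h before relying on it.
static struct llama_sampler * make_chain(float temp, float n_sigma, uint32_t seed) {
    struct llama_sampler * chain =
        llama_sampler_chain_init(llama_sampler_chain_default_params());

    llama_sampler_chain_add(chain, llama_sampler_init_temp(temp));           // scale logits
    llama_sampler_chain_add(chain, llama_sampler_init_top_n_sigma(n_sigma)); // mask noise region
    llama_sampler_chain_add(chain, llama_sampler_init_dist(seed));           // sample a token

    return chain; // caller frees with llama_sampler_free(chain)
}
```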
I'm aware that this algorithm is still in its early phases, so we could tag this as a demo for now, but I'll leave that choice up to the maintainers.
resolves #11057
Relevant links:
https://huggingface.co/papers/2411.07641
https://arxiv.org/pdf/2411.07641
https://github.com/Tomorrowdawn/top_nsigma
#11057