Replies: 2 comments
-
OpenAI, I think, at some point open-sourced their evaluation code, which should be compatible with the llama.cpp HTTP server. I have personally started working on Elo HeLLM, but it's not yet in a usable state.
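Since llama.cpp's `llama-server` exposes an OpenAI-compatible `/v1/chat/completions` endpoint, a minimal eval loop can be rolled by hand. The sketch below is illustrative only: the toy question, the substring-match scorer, and the default port are assumptions, not a real benchmark suite.

```python
# Minimal sketch of a code/math eval loop against a llama.cpp HTTP server.
# Assumes the server was started with e.g. `llama-server -m model.gguf --port 8080`
# and exposes the OpenAI-compatible /v1/chat/completions endpoint.
import json
import urllib.request

SERVER = "http://127.0.0.1:8080/v1/chat/completions"  # assumed default port

def ask(question: str) -> str:
    """Send one question to the server and return the model's reply text."""
    payload = json.dumps({
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.0,  # deterministic output makes exact-match scoring meaningful
    }).encode()
    req = urllib.request.Request(SERVER, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

def score(answers: dict, expected: dict) -> float:
    """Fraction of questions whose answer contains the expected string.
    A substring match is a crude stand-in for a real benchmark's grader."""
    hits = sum(1 for q, gold in expected.items() if gold in answers.get(q, ""))
    return hits / len(expected)

if __name__ == "__main__":
    expected = {"What is 7 * 8? Answer with just the number.": "56"}  # toy item
    # With a live server you would collect answers via ask(); here we stub one
    # reply so the scoring path can be demonstrated offline.
    answers = {q: "56" for q in expected}
    print(f"accuracy: {score(answers, expected):.2f}")
```

Real harnesses (lm-evaluation-harness and similar) do essentially this at scale, with proper prompt templates and per-task graders.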
-
I have not open-sourced my shell framework, but I benchmark a lot of the smaller models (<32G) on the categories of general, code, and math. The results page is here: https://github.com/steampunque/benchlm. At minimum this should give a good idea of many of the common evals you want to see when comparing models. It is missing Spider (SQL-based competence), but otherwise I think it is comprehensive enough for a good comparison of category-based model performance.
-
Are there any open-source frameworks/tools that could be used to run code/math benchmarks for GGUF models?