Replies: 2 comments
-
OpenAI, I think, at some point open-sourced their evaluation code, which should be compatible with the llama.cpp HTTP server. I have personally started working on Elo HeLLM, but it's not yet in a usable state.
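Since llama.cpp's `llama-server` exposes an OpenAI-compatible `/v1/chat/completions` endpoint, a minimal eval loop can be rolled by hand. The sketch below is illustrative only: the toy question, the substring-match scorer, and the default port are assumptions, not a real benchmark suite.

```python
# Minimal sketch of a code/math eval loop against a llama.cpp HTTP server.
# Assumes the server was started with e.g. `llama-server -m model.gguf --port 8080`
# and exposes the OpenAI-compatible /v1/chat/completions endpoint.
import json
import urllib.request

SERVER = "http://127.0.0.1:8080/v1/chat/completions"  # assumed default port

def ask(question: str) -> str:
    """Send one question to the server and return the model's reply text."""
    payload = json.dumps({
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.0,  # deterministic output makes exact-match scoring meaningful
    }).encode()
    req = urllib.request.Request(SERVER, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

def score(answers: dict, expected: dict) -> float:
    """Fraction of questions whose answer contains the expected string.
    A substring match is a crude stand-in for a real benchmark's grader."""
    hits = sum(1 for q, gold in expected.items() if gold in answers.get(q, ""))
    return hits / len(expected)

if __name__ == "__main__":
    expected = {"What is 7 * 8? Answer with just the number.": "56"}  # toy item
    # With a live server you would collect answers via ask(); here we stub one
    # reply so the scoring path can be demonstrated offline.
    answers = {q: "56" for q in expected}
    print(f"accuracy: {score(answers, expected):.2f}")
```

Real harnesses (lm-evaluation-harness and similar) do essentially this at scale, with proper prompt templates and per-task graders.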
-
I have not open-sourced my shell framework, but I benchmark a lot of the smaller models (<32G) on the categories of general, code, and math. The results page is here: https://github.com/steampunque/benchlm. At minimum this should give a good idea of many of the common evals you want to see when comparing models. It is missing Spider (SQL-based competence), but otherwise I think it is comprehensive enough for a good comparison of category-based model performance.
-
Are there any open-source frameworks/tools that could be used to run code/math benchmarks for GGUF models?