
Compiling LLAMA for cuda is single threaded #311

Closed
B3none opened this issue Sep 18, 2024 · 1 comment

Comments

B3none commented Sep 18, 2024

I ran npx --no node-llama-cpp download --cuda and it takes a seriously long time to compile, seemingly because it's only running on one thread.

Is there anything I can do to speed it up?

giladgd commented Sep 18, 2024

It takes a long time because of the many template instantiations llama.cpp compiles for inference performance optimizations.
Nothing can be done to shorten it other than removing support for some GGUF file formats (which is undesirable outside of the development of llama.cpp itself), and the compilation time mainly depends on your hardware.
It will only get slower over time as llama.cpp adds support for new features and model architectures.

I recommend switching to the version 3 beta, which ships with prebuilt binaries you can use without compiling anything.
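
For reference, installing the beta should look something like this (assuming it is published under npm's beta dist-tag; check the project's documentation for the exact command). When a prebuilt binary is available for your platform and CUDA setup, no local compilation should be needed:

npm install node-llama-cpp@beta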

giladgd closed this as completed Sep 18, 2024