examples/server/README.md
@@ -18,6 +18,7 @@ The project is under active development, and we are [looking for feedback and co
 - `--threads N`, `-t N`: Set the number of threads to use during generation.
 - `-tb N, --threads-batch N`: Set the number of threads to use during batch and prompt processing. If not specified, the number of threads will be set to the number of threads used for generation.
+- `--threads-http N`: number of threads in the http server pool to process requests (default: `std::thread::hardware_concurrency()`)
 - `-m FNAME`, `--model FNAME`: Specify the path to the LLaMA model file (e.g., `models/7B/ggml-model.gguf`).
 - `-a ALIAS`, `--alias ALIAS`: Set an alias for the model. The alias will be returned in API responses.
 - `-c N`, `--ctx-size N`: Set the size of the prompt context. The default is 512, but LLaMA models were built with a context of 2048, which will provide better results for longer input/inference. The size may differ in other models; for example, baichuan models were built with a context of 4096.
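
Taken together, a launch combining these options might look like the sketch below. The binary name `./server`, the alias `my-model`, and the specific thread counts are illustrative assumptions, not values taken from this diff; the model path reuses the example given for `-m` above.

```bash
# Illustrative server launch (flag values are examples, not recommendations):
#   -t 8              8 threads for token generation
#   -tb 16            16 threads for batch/prompt processing
#   --threads-http 4  4 threads in the HTTP server pool (the flag added in this change)
./server \
    -m models/7B/ggml-model.gguf \
    -a my-model \
    -c 2048 \
    -t 8 -tb 16 \
    --threads-http 4
```

If `--threads-http` is omitted, the pool size falls back to `std::thread::hardware_concurrency()`, i.e. one thread per logical core reported by the OS.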