Commit 4336231

SlyEcho and ardfork committed
add hipBLAS to README
--------- Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com>
1 parent f8e3fc6 commit 4336231

File tree

1 file changed: +29 −0 lines

README.md

Lines changed: 29 additions & 0 deletions
@@ -408,6 +408,35 @@ Building the program with BLAS support may lead to some performance improvements
| LLAMA_CUDA_DMMV_F16 | Boolean | false | If enabled, use half-precision floating point arithmetic for the CUDA dequantization + mul mat vec kernels. Can improve performance on relatively recent GPUs. |
| LLAMA_CUDA_KQUANTS_ITER | 1 or 2 | 2 | Number of values processed per iteration and per CUDA thread for Q2_K and Q6_K quantization formats. Setting this value to 1 can improve performance for slow GPUs. |

- #### hipBLAS

  This provides BLAS acceleration on HIP-supported GPUs like AMD GPUs.
  Make sure to have ROCm installed.
  You can download it from your Linux distro's package manager or from here: [ROCm Quick Start (Linux)](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html).
  Windows support is coming soon...
  - Using `make`:
    ```bash
    make LLAMA_HIPBLAS=1
    ```
  - Using `CMake`:
    ```bash
    mkdir build
    cd build
    CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake .. -DLLAMA_HIPBLAS=ON
    cmake --build .
    ```
  The environment variable [`HIP_VISIBLE_DEVICES`](https://rocm.docs.amd.com/en/latest/understand/gpu_isolation.html#hip-visible-devices) can be used to specify which GPU(s) will be used.
  If your GPU is not officially supported, you can set the environment variable `HSA_OVERRIDE_GFX_VERSION` to the version of a similar supported GPU, for example 10.3.0 on RDNA2 or 11.0.0 on RDNA3.
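  As a minimal sketch, the two variables can be combined like this before launching the program (the device index and the override value are illustrative; the `./main` invocation in the comment is a placeholder for your actual build):

  ```bash
  # Select only the first ROCm device (comma-separated list of device indices).
  export HIP_VISIBLE_DEVICES=0
  # Report an unsupported RDNA2 card as gfx1030 (10.3.0) so ROCm will use it.
  export HSA_OVERRIDE_GFX_VERSION=10.3.0
  # Then run the hipBLAS build as usual, e.g.: ./main -m <model> -p "..."
  ```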
  The following compilation options are also available to tweak performance (yes, they refer to CUDA, not HIP, because it uses the same code as the cuBLAS version above):

  | Option                  | Legal values           | Default | Description |
  |-------------------------|------------------------|---------|-------------|
  | LLAMA_CUDA_DMMV_X       | Positive integer >= 32 | 32      | Number of values in x direction processed by the CUDA dequantization + matrix vector multiplication kernel per iteration. Increasing this value can improve performance on fast GPUs. Power of 2 heavily recommended. Does not affect k-quants. |
  | LLAMA_CUDA_MMV_Y        | Positive integer       | 1       | Block size in y direction for the CUDA mul mat vec kernels. Increasing this value can improve performance on fast GPUs. Power of 2 recommended. Does not affect k-quants. |
  | LLAMA_CUDA_KQUANTS_ITER | 1 or 2                 | 2       | Number of values processed per iteration and per CUDA thread for Q2_K and Q6_K quantization formats. Setting this value to 1 can improve performance for slow GPUs. |
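  As a hedged sketch, these options can be passed on the `make` command line together with the hipBLAS flag, the same way as for the cuBLAS build above (the values shown are examples, not recommendations; exact Makefile handling may vary by version):

  ```bash
  # Illustrative: rebuild with hipBLAS and tuned kernel parameters.
  make clean
  make LLAMA_HIPBLAS=1 LLAMA_CUDA_DMMV_X=64 LLAMA_CUDA_MMV_Y=2
  ```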
- #### CLBlast

OpenCL acceleration is provided by the matrix multiplication kernels from the [CLBlast](https://github.com/CNugteren/CLBlast) project and custom kernels for ggml that can generate tokens on the GPU.
