Misc. bug: llama fails to run on older x86 hardware. #12866


Closed
kraxel opened this issue Apr 10, 2025 · 1 comment · Fixed by #12871


kraxel commented Apr 10, 2025

Name and Version

using latest docker image
build: 5097 (fe5b78c) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-cli

Command line

ramalama pull tiny
podman run --rm --pull=newer --device nvidia.com/gpu=all --volume /home/kraxel/.local/share/ramalama:/models ghcr.io/ggml-org/llama.cpp:full-cuda --run -m /models/models/ollama/tinyllama:latest -p hello

Problem description & steps to reproduce

podman run --rm --pull=newer --device nvidia.com/gpu=all --volume /home/kraxel/.local/share/ramalama:/models ghcr.io/ggml-org/llama.cpp:full-cuda --run -m /models/models/ollama/tinyllama:latest -p hello
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices: 
  Device 0: NVIDIA GeForce GTX 1060 6GB, compute capability 6.1, VMM: yes
load_backend: loaded CUDA backend from /app/libggml-cuda.so
[ ... ]
load_tensors: loading model tensors, this can take a while... (mmap = true)

It finds the GPU, loads the model, then just stops.

The Linux kernel logs a segfault:

[ 2408.935610] llama-cli[3154]: segfault at 78 ip 00007f92766fe4d4 sp 00007fff1bbe0f78 error 4 in libggml-base.so[284d4,7f92766e7000+63000] likely on CPU 3 (core 3, socket 0)
[ 2408.935673] Code: 84 00 00 00 00 00 f3 0f 1e fa 66 0f ef c0 48 c7 46 20 00 00 00 00 0f 11 06 0f 11 46 10 ff 67 20 66 0f 1f 44 00 00 f3 0f 1e fa <48> 8b 47 78 c3 0f 1f 80 00 00 00 00 f3 0f 1e fa ff 67 28 66 0f 1f

The CPU is older and has no AVX vector instructions. Apparently llama.cpp uses AVX without first checking that the CPU actually supports these instructions.

I first ran into this with ramalama (see containers/ramalama#1145), where I see the same behavior but a slightly different kernel error message:

[ 1767.875857] traps: llama-run[2356] trap invalid opcode ip:7ff5c06812ac sp:7ffc2d06c4e0 error:0 in libggml-cpu.so[3a2ac,7ff5c064f000+60000]
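
For reference, one way to double-check that a host CPU really has no AVX is to look at the feature flags the kernel exposes:

# Prints the AVX-family flags the CPU advertises; no output means no AVX.
grep -woE 'avx[a-z0-9_]*' /proc/cpuinfo | sort -u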

First Bad Commit

No response

Relevant log output

ericcurtin (Collaborator) commented:

Note that RamaLama also builds llama.cpp with:

-DGGML_NATIVE=OFF
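
For a local build targeting a pre-AVX machine, a minimal sketch of such a build might look like the following (assuming the usual llama.cpp CMake workflow; GGML_AVX/GGML_AVX2 are the explicit instruction-set switches in current ggml, so check your version for the exact option names):

# GGML_NATIVE=OFF stops the build from auto-tuning for the build host's CPU;
# the explicit switches then keep AVX code paths out of the binary.
cmake -B build -DGGML_NATIVE=OFF -DGGML_AVX=OFF -DGGML_AVX2=OFF
cmake --build build --config Release -j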
