Misc. bug: llama fails to run on older x86 hardware. #12866


Closed
kraxel opened this issue Apr 10, 2025 · 1 comment · Fixed by #12871


kraxel commented Apr 10, 2025

Name and Version

using latest docker image
build: 5097 (fe5b78c) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-cli

Command line

ramalama pull tiny
podman run --rm --pull=newer --device nvidia.com/gpu=all --volume /home/kraxel/.local/share/ramalama:/models ghcr.io/ggml-org/llama.cpp:full-cuda --run -m /models/models/ollama/tinyllama:latest -p hello

Problem description & steps to reproduce

podman run --rm --pull=newer --device nvidia.com/gpu=all --volume /home/kraxel/.local/share/ramalama:/models ghcr.io/ggml-org/llama.cpp:full-cuda --run -m /models/models/ollama/tinyllama:latest -p hello
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices: 
  Device 0: NVIDIA GeForce GTX 1060 6GB, compute capability 6.1, VMM: yes
load_backend: loaded CUDA backend from /app/libggml-cuda.so
[ ... ]
load_tensors: loading model tensors, this can take a while... (mmap = true)

It finds the GPU, loads the model, then just stops.

The Linux kernel logs a segfault:

[ 2408.935610] llama-cli[3154]: segfault at 78 ip 00007f92766fe4d4 sp 00007fff1bbe0f78 error 4 in libggml-base.so[284d4,7f92766e7000+63000] likely on CPU 3 (core 3, socket 0)
[ 2408.935673] Code: 84 00 00 00 00 00 f3 0f 1e fa 66 0f ef c0 48 c7 46 20 00 00 00 00 0f 11 06 0f 11 46 10 ff 67 20 66 0f 1f 44 00 00 f3 0f 1e fa <48> 8b 47 78 c3 0f 1f 80 00 00 00 00 f3 0f 1e fa ff 67 28 66 0f 1f

The CPU is older and has no AVX vector instructions. Apparently llama.cpp uses AVX without first checking that the CPU actually supports these instructions.

I first ran into this with ramalama (see containers/ramalama#1145), where I see the same behavior but a slightly different kernel error message:

[ 1767.875857] traps: llama-run[2356] trap invalid opcode ip:7ff5c06812ac sp:7ffc2d06c4e0 error:0 in libggml-cpu.so[3a2ac,7ff5c064f000+60000]
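
For reference, one way to double-check that a host CPU really has no AVX is to look at the feature flags the kernel exposes:

# Prints the AVX-family flags the CPU advertises; no output means no AVX.
grep -woE 'avx[a-z0-9_]*' /proc/cpuinfo | sort -u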

First Bad Commit

No response

Relevant log output

ericcurtin (Collaborator) commented:

Note that RamaLama also builds llama.cpp with:

-DGGML_NATIVE=OFF
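
For a local build targeting a pre-AVX machine, a minimal sketch of such a build might look like the following (assuming the usual llama.cpp CMake workflow; GGML_AVX/GGML_AVX2 are the explicit instruction-set switches in current ggml, so check your version for the exact option names):

# GGML_NATIVE=OFF stops the build from auto-tuning for the build host's CPU;
# the explicit switches then keep AVX code paths out of the binary.
cmake -B build -DGGML_NATIVE=OFF -DGGML_AVX=OFF -DGGML_AVX2=OFF
cmake --build build --config Release -j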
