
[Bug]: Cannot find flash_attn_interface after adding stub files #17246


Closed

tywuAMD opened this issue Apr 27, 2025 · 9 comments
Labels: bug (Something isn't working)
tywuAMD (Contributor) commented Apr 27, 2025

Your current environment

(The output of `python collect_env.py` was not provided.)

🐛 Describe the bug

The following traceback, showing that flash_attn_interface cannot be found, was observed after #17228 was merged:

Traceback (most recent call last):
  File "/mnt/vllm/benchmarks/./ds.py", line 3, in <module>
    llm = LLM(model="/mnt/model/DeepSeek-R1/DeepSeek-R1-UD-Q2_K_XL.gguf",
  File "/mnt/vllm/vllm/utils.py", line 1161, in inner
    return fn(*args, **kwargs)
  File "/mnt/vllm/vllm/entrypoints/llm.py", line 247, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/mnt/vllm/vllm/engine/llm_engine.py", line 516, in from_engine_args
    return engine_cls.from_vllm_config(
  File "/mnt/vllm/vllm/engine/llm_engine.py", line 492, in from_vllm_config
    return cls(
  File "/mnt/vllm/vllm/engine/llm_engine.py", line 281, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
  File "/mnt/vllm/vllm/executor/executor_base.py", line 286, in __init__
    super().__init__(*args, **kwargs)
  File "/mnt/vllm/vllm/executor/executor_base.py", line 52, in __init__
    self._init_executor()
  File "/mnt/vllm/vllm/executor/mp_distributed_executor.py", line 123, in _init_executor
    self._run_workers("init_worker", all_kwargs)
  File "/mnt/vllm/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers
    driver_worker_output = run_method(self.driver_worker, sent_method,
  File "/mnt/vllm/vllm/utils.py", line 2456, in run_method
    return func(*args, **kwargs)
  File "/mnt/vllm/vllm/worker/worker_base.py", line 594, in init_worker
    self.worker = worker_class(**kwargs)
  File "/mnt/vllm/vllm/worker/worker.py", line 82, in __init__
    self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
  File "/mnt/vllm/vllm/worker/model_runner.py", line 1071, in __init__
    self.attn_backend = get_attn_backend(
  File "/mnt/vllm/vllm/attention/selector.py", line 95, in get_attn_backend
    return _cached_get_attn_backend(
  File "/mnt/vllm/vllm/attention/selector.py", line 148, in _cached_get_attn_backend
    attention_cls = current_platform.get_attn_backend_cls(
  File "/mnt/vllm/vllm/platforms/rocm.py", line 145, in get_attn_backend_cls
    from vllm.attention.backends.rocm_aiter_mla import (
  File "/mnt/vllm/vllm/attention/backends/rocm_aiter_mla.py", line 11, in <module>
    from vllm.attention.backends.mla.common import (MLACommonBackend,
  File "/mnt/vllm/vllm/attention/backends/mla/common.py", line 217, in <module>
    from vllm.vllm_flash_attn.fa_utils import get_flash_attn_version
  File "/mnt/vllm/vllm/vllm_flash_attn/__init__.py", line 11, in <module>
    from .flash_attn_interface import (fa_version_unsupported_reason,
ModuleNotFoundError: No module named 'vllm.vllm_flash_attn.flash_attn_interface'

The error does not occur after rewinding to the previous commit dc2ceca.
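
For context, the failing import is the unconditional one in vllm/vllm_flash_attn/__init__.py (the last frame of the traceback). A minimal sketch of a guarded import, assuming the goal is to degrade gracefully when the compiled extension is absent; this is illustrative only, not the fix that eventually landed:

# Illustrative sketch only (not vLLM's actual code): guard the import in
# vllm/vllm_flash_attn/__init__.py so that builds without the compiled
# extension (e.g. ROCm) do not fail at import time.
try:
    from .flash_attn_interface import fa_version_unsupported_reason
    HAS_FLASH_ATTN = True  # hypothetical flag, not an existing vLLM symbol
except ModuleNotFoundError:
    fa_version_unsupported_reason = None
    HAS_FLASH_ATTN = False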

DarkLight1337 (Member) commented:

cc @aarnphm

DarkLight1337 (Member) commented:

Do you still get this error after re-building vLLM?

jeejeelee (Collaborator) commented:

#17247 is not related to this issue.

DarkLight1337 (Member) commented:

Oh, sorry. Let me unlink that PR then.

aarnphm (Collaborator) commented Apr 27, 2025

Did you install vllm from source?

Because we copy all *.py files in both install routes (the precompiled wheel and the source build).
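
A generic way to check which files actually landed in the installed package (standard library only, not a vLLM helper):

import os
import vllm

# Print where vllm was installed from and what is inside vllm_flash_attn/;
# the path distinguishes a source checkout from an installed wheel, and the
# listing shows whether a compiled flash_attn_interface module is present.
pkg_dir = os.path.dirname(vllm.__file__)
print("vllm package dir:", pkg_dir)
print(sorted(os.listdir(os.path.join(pkg_dir, "vllm_flash_attn"))))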

aarnphm (Collaborator) commented Apr 27, 2025

Can you give a quick rundown of the ds.py file as well?

After installing with VLLM_USE_PRECOMPILED:

[screenshot]

And compiled from source:

[screenshot]

tywuAMD (Contributor, author) commented Apr 28, 2025

Thank you for following up, @aarnphm.
I built vLLM from source. With the hints you provided, I think the cause is that I was running with the ROCm stack on an AMD GPU, and vllm_flash_attn only gets built for CUDA:

vllm/CMakeLists.txt, lines 715 to 719 (at cb3f2d8):

# For CUDA we also build and ship some external projects.
if (VLLM_GPU_LANG STREQUAL "CUDA")
  include(cmake/external_projects/flashmla.cmake)
  include(cmake/external_projects/vllm_flash_attn.cmake)
endif ()
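
As a quick runtime cross-check that this environment is a HIP/ROCm build (so the CUDA-only block above is skipped), torch.version exposes both backends; the values noted in the comment are assumptions about this particular setup:

import torch

# On ROCm builds of PyTorch, torch.version.hip is a version string and
# torch.version.cuda is None; on CUDA builds it is the other way around.
print("HIP:", torch.version.hip)
print("CUDA:", torch.version.cuda)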

Accordingly, there is no compiled binary under vllm/vllm_flash_attn after a local ROCm build:

ll vllm/vllm_flash_attn/
total 28
drwxr-xr-x  3 root root 4096 Apr 28 02:55 ./
drwxr-xr-x 32 root root 4096 Apr 27 05:43 ../
-rw-r--r--  1 root root    0 Mar 27 01:16 .gitkeep
-rw-r--r--  1 root root  884 Apr 28 02:55 __init__.py
drwxr-xr-x  2 root root 4096 Apr 27 05:44 __pycache__/
-rw-r--r--  1 root root 2018 Apr  2 08:46 fa_utils.py
-rw-r--r--  1 root root 7994 Apr 28 02:55 flash_attn_interface.pyi
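
The failure can also be reproduced in isolation with a short standard-library check (a generic snippet, not part of vLLM):

import importlib

# Only the flash_attn_interface.pyi stub exists here, so importing the module
# (or its parent package, whose __init__ imports it) raises the same
# ModuleNotFoundError shown in the traceback above.
try:
    importlib.import_module("vllm.vllm_flash_attn.flash_attn_interface")
except ModuleNotFoundError as exc:
    print(exc)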

aarnphm (Collaborator) commented Apr 28, 2025

OK, can you try with the latest change on main?

tywuAMD (Contributor, author) commented Apr 28, 2025

Just did a quick verification and confirmed that this issue has been resolved by #17267. Thank you very much; I will close this issue.

tywuAMD closed this as completed on Apr 28, 2025.