
[Bug]: Cannot find flash_attn_interface after adding stub files #17246


Closed

tywuAMD opened this issue Apr 27, 2025 · 9 comments
Labels: bug (Something isn't working)
tywuAMD (Contributor) commented Apr 27, 2025

Your current environment

(The output of `python collect_env.py` was not provided.)

🐛 Describe the bug

The following traceback, showing that flash_attn_interface cannot be found, was observed after #17228 was merged:

Traceback (most recent call last):
  File "/mnt/vllm/benchmarks/./ds.py", line 3, in <module>
    llm = LLM(model="/mnt/model/DeepSeek-R1/DeepSeek-R1-UD-Q2_K_XL.gguf",
  File "/mnt/vllm/vllm/utils.py", line 1161, in inner
    return fn(*args, **kwargs)
  File "/mnt/vllm/vllm/entrypoints/llm.py", line 247, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/mnt/vllm/vllm/engine/llm_engine.py", line 516, in from_engine_args
    return engine_cls.from_vllm_config(
  File "/mnt/vllm/vllm/engine/llm_engine.py", line 492, in from_vllm_config
    return cls(
  File "/mnt/vllm/vllm/engine/llm_engine.py", line 281, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
  File "/mnt/vllm/vllm/executor/executor_base.py", line 286, in __init__
    super().__init__(*args, **kwargs)
  File "/mnt/vllm/vllm/executor/executor_base.py", line 52, in __init__
    self._init_executor()
  File "/mnt/vllm/vllm/executor/mp_distributed_executor.py", line 123, in _init_executor
    self._run_workers("init_worker", all_kwargs)
  File "/mnt/vllm/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers
    driver_worker_output = run_method(self.driver_worker, sent_method,
  File "/mnt/vllm/vllm/utils.py", line 2456, in run_method
    return func(*args, **kwargs)
  File "/mnt/vllm/vllm/worker/worker_base.py", line 594, in init_worker
    self.worker = worker_class(**kwargs)
  File "/mnt/vllm/vllm/worker/worker.py", line 82, in __init__
    self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
  File "/mnt/vllm/vllm/worker/model_runner.py", line 1071, in __init__
    self.attn_backend = get_attn_backend(
  File "/mnt/vllm/vllm/attention/selector.py", line 95, in get_attn_backend
    return _cached_get_attn_backend(
  File "/mnt/vllm/vllm/attention/selector.py", line 148, in _cached_get_attn_backend
    attention_cls = current_platform.get_attn_backend_cls(
  File "/mnt/vllm/vllm/platforms/rocm.py", line 145, in get_attn_backend_cls
    from vllm.attention.backends.rocm_aiter_mla import (
  File "/mnt/vllm/vllm/attention/backends/rocm_aiter_mla.py", line 11, in <module>
    from vllm.attention.backends.mla.common import (MLACommonBackend,
  File "/mnt/vllm/vllm/attention/backends/mla/common.py", line 217, in <module>
    from vllm.vllm_flash_attn.fa_utils import get_flash_attn_version
  File "/mnt/vllm/vllm/vllm_flash_attn/__init__.py", line 11, in <module>
    from .flash_attn_interface import (fa_version_unsupported_reason,
ModuleNotFoundError: No module named 'vllm.vllm_flash_attn.flash_attn_interface'

The error does not occur after rewinding to the previous commit dc2ceca.
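
For context, the failing import is the unconditional one in vllm/vllm_flash_attn/__init__.py (the last frame of the traceback). A minimal sketch of a guarded import, assuming the goal is to degrade gracefully when the compiled extension is absent; this is illustrative only, not the fix that eventually landed:

# Illustrative sketch only (not vLLM's actual code): guard the import in
# vllm/vllm_flash_attn/__init__.py so that builds without the compiled
# extension (e.g. ROCm) do not fail at import time.
try:
    from .flash_attn_interface import fa_version_unsupported_reason
    HAS_FLASH_ATTN = True  # hypothetical flag, not an existing vLLM symbol
except ModuleNotFoundError:
    fa_version_unsupported_reason = None
    HAS_FLASH_ATTN = False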

DarkLight1337 (Member) commented:

cc @aarnphm

DarkLight1337 (Member) commented:

Do you still get this error after re-building vLLM?

jeejeelee (Collaborator) commented:

#17247 is not related to this issue.

DarkLight1337 (Member) commented:

Oh, sorry. Let me unlink that PR then.

aarnphm (Collaborator) commented Apr 27, 2025

Did you install vllm from source?

Because we copy all *.py files in both install routes (the precompiled wheel and the source build).
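
A generic way to check which files actually landed in the installed package (standard library only, not a vLLM helper):

import os
import vllm

# Print where vllm was installed from and what is inside vllm_flash_attn/;
# the path distinguishes a source checkout from an installed wheel, and the
# listing shows whether a compiled flash_attn_interface module is present.
pkg_dir = os.path.dirname(vllm.__file__)
print("vllm package dir:", pkg_dir)
print(sorted(os.listdir(os.path.join(pkg_dir, "vllm_flash_attn"))))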

aarnphm (Collaborator) commented Apr 27, 2025

Can you give a quick rundown of the ds.py file as well?

After installing with VLLM_USE_PRECOMPILED:

[screenshot]

And compiled from source:

[screenshot]

tywuAMD (Contributor, author) commented Apr 28, 2025

Thank you for following up, @aarnphm.
I built vLLM from source. With the hints you provided, I think the cause is that I was running with the ROCm stack on an AMD GPU, and vllm_flash_attn only gets built for CUDA:

vllm/CMakeLists.txt, lines 715 to 719 (at cb3f2d8):

# For CUDA we also build and ship some external projects.
if (VLLM_GPU_LANG STREQUAL "CUDA")
  include(cmake/external_projects/flashmla.cmake)
  include(cmake/external_projects/vllm_flash_attn.cmake)
endif ()
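
As a quick runtime cross-check that this environment is a HIP/ROCm build (so the CUDA-only block above is skipped), torch.version exposes both backends; the values noted in the comment are assumptions about this particular setup:

import torch

# On ROCm builds of PyTorch, torch.version.hip is a version string and
# torch.version.cuda is None; on CUDA builds it is the other way around.
print("HIP:", torch.version.hip)
print("CUDA:", torch.version.cuda)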

Accordingly, there is no compiled binary under vllm/vllm_flash_attn after a local ROCm build:

ll vllm/vllm_flash_attn/
total 28
drwxr-xr-x  3 root root 4096 Apr 28 02:55 ./
drwxr-xr-x 32 root root 4096 Apr 27 05:43 ../
-rw-r--r--  1 root root    0 Mar 27 01:16 .gitkeep
-rw-r--r--  1 root root  884 Apr 28 02:55 __init__.py
drwxr-xr-x  2 root root 4096 Apr 27 05:44 __pycache__/
-rw-r--r--  1 root root 2018 Apr  2 08:46 fa_utils.py
-rw-r--r--  1 root root 7994 Apr 28 02:55 flash_attn_interface.pyi
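
The failure can also be reproduced in isolation with a short standard-library check (a generic snippet, not part of vLLM):

import importlib

# Only the flash_attn_interface.pyi stub exists here, so importing the module
# (or its parent package, whose __init__ imports it) raises the same
# ModuleNotFoundError shown in the traceback above.
try:
    importlib.import_module("vllm.vllm_flash_attn.flash_attn_interface")
except ModuleNotFoundError as exc:
    print(exc)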

aarnphm (Collaborator) commented Apr 28, 2025

OK, can you try with the latest change on main?

tywuAMD (Contributor, author) commented Apr 28, 2025

Just did a quick verification and confirmed that this issue has been resolved by #17267. Thank you very much; I will close this issue.

tywuAMD closed this as completed on Apr 28, 2025.