
Commit 8bed11f

vllmellm authored and frieda-huang committed
[Bugfix] Triton FA function takes no keyword arguments (vllm-project#16902)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
1 parent e606eed commit 8bed11f

File tree: 1 file changed (+8 −1 lines)

vllm/attention/backends/mla/common.py

Lines changed: 8 additions & 1 deletion
@@ -1091,7 +1091,14 @@ def _flash_attn_varlen_diff_headdims(self, q, k, v, softmax_scale,
                 q,
                 k,
                 maybe_padded_v,
-                **kwargs,
+                None,  # output
+                kwargs["cu_seqlens_q"],
+                kwargs["cu_seqlens_k"],
+                kwargs["max_seqlen_q"],
+                kwargs["max_seqlen_k"],
+                kwargs["causal"],
+                softmax_scale,
+                None,  # bias
             )
         if is_vllm_fa:
             attn_out = self.flash_attn_varlen_func(
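Note: the removed line expanded **kwargs directly, but per the commit title the Triton flash-attention entry point accepts no keyword arguments, so each value must be pulled out of the dict and passed positionally in the order the signature expects. The minimal Python sketch below (hypothetical function and argument names, not vLLM's or Triton's actual API) reproduces the failure mode under the assumption that the callee behaves like a positional-only callable:

# Hypothetical stand-in for a callable that accepts no keyword arguments,
# e.g. a kernel wrapper whose parameters are positional-only ("/" marker).
def triton_fa_like(q, k, v, output, cu_seqlens_q, cu_seqlens_k,
                   max_seqlen_q, max_seqlen_k, causal, softmax_scale,
                   bias, /):
    return q  # a real kernel would compute attention here

kwargs = {
    "cu_seqlens_q": [0, 4],
    "cu_seqlens_k": [0, 4],
    "max_seqlen_q": 4,
    "max_seqlen_k": 4,
    "causal": True,
}
q = k = v = [[1.0]]

# Broken call, equivalent to the removed "**kwargs" line: raises TypeError
# because positional-only parameters cannot be matched by keyword.
try:
    triton_fa_like(q, k, v, None, softmax_scale=0.5, bias=None, **kwargs)
except TypeError as exc:
    print(f"keyword call fails: {exc}")

# Fixed call, mirroring the added lines: every value is passed positionally,
# in signature order, with None placeholders commented for readability.
attn_out = triton_fa_like(
    q,
    k,
    v,
    None,  # output
    kwargs["cu_seqlens_q"],
    kwargs["cu_seqlens_k"],
    kwargs["max_seqlen_q"],
    kwargs["max_seqlen_k"],
    kwargs["causal"],
    0.5,   # softmax_scale
    None,  # bias
)

Positional expansion ties the call site to the exact parameter order, which is why the fix lists every argument explicitly rather than forwarding the dict.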
