
Commit 5b0368a

maleksan85, SageMoore, root, Aleksandr Malyshev, and qli88
authored and committed
[Kernel][ROCM] Upstream prefix prefill speed up for vLLM V1 (vllm-project#13305)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Signed-off-by: maleksan85 <maleksan@amd.com>
Signed-off-by: <>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: qli88 <qiang.li2@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
1 parent cb5161e, commit 5b0368a


2 files changed, 824 lines added, 816 lines removed

tests/core/block/e2e/test_correctness.py

+3 -3
@@ -195,15 +195,15 @@ def test_lookahead_greedy_equality_with_preemption(baseline_llm_generator,
     ])
 @pytest.mark.parametrize("per_test_common_llm_kwargs",
                          [{
-                             "block_size": 8,
+                             "block_size": 16,
                              "max_num_batched_tokens": 2,
                              "max_num_seqs": 2,
                          }, {
-                             "block_size": 8,
+                             "block_size": 16,
                              "max_num_batched_tokens": 3,
                              "max_num_seqs": 2,
                          }, {
-                             "block_size": 8,
+                             "block_size": 16,
                              "max_num_batched_tokens": 256,
                              "max_num_seqs": 10,
                          }])
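The only change in this test file raises block_size from 8 to 16 in all three parametrized configurations. As a rough illustration of what that means for a paged KV cache, the sketch below computes how many blocks a sequence occupies (ceil(tokens / block_size)). It also computes how many scheduler steps a single prompt would need, assuming the test runs with chunked prefill so that max_num_batched_tokens caps the tokens processed per step. The helpers blocks_needed and prefill_steps and the 20-token prompt are hypothetical illustrations, not vLLM APIs, and the step count is a simplified model rather than vLLM's actual scheduler.

```python
# Per-test configurations from the diff above, after the change from
# block_size 8 to block_size 16.
CONFIGS = [
    {"block_size": 16, "max_num_batched_tokens": 2, "max_num_seqs": 2},
    {"block_size": 16, "max_num_batched_tokens": 3, "max_num_seqs": 2},
    {"block_size": 16, "max_num_batched_tokens": 256, "max_num_seqs": 10},
]


def blocks_needed(num_tokens: int, block_size: int) -> int:
    """Hypothetical helper: paged KV-cache blocks needed for num_tokens.

    Uses integer ceiling division; the last block may be only partly filled.
    """
    return (num_tokens + block_size - 1) // block_size


def prefill_steps(prompt_len: int, max_num_batched_tokens: int) -> int:
    """Hypothetical helper: steps to prefill one prompt on its own.

    ASSUMES chunked prefill caps each step at max_num_batched_tokens tokens;
    this is a simplified model, not vLLM's scheduler.
    """
    return (prompt_len + max_num_batched_tokens - 1) // max_num_batched_tokens


if __name__ == "__main__":
    prompt_len = 20  # hypothetical prompt length, for illustration only
    for cfg in CONFIGS:
        old_blocks = blocks_needed(prompt_len, 8)
        new_blocks = blocks_needed(prompt_len, cfg["block_size"])
        steps = prefill_steps(prompt_len, cfg["max_num_batched_tokens"])
        print(f"{cfg}: blocks {old_blocks} -> {new_blocks}, "
              f"prefill steps {steps}")
```

Under this model a 20-token prompt spans 3 blocks at block_size 8 but 2 blocks at block_size 16, while the tiny max_num_batched_tokens values (2 and 3) force the prompt through many small steps, each of which only partly fills a 16-token block.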