We have been running perf benchmarks on MT-Bench so that end-to-end speedup and acceptance length (AL) are comparable with other setups and academic papers. Thanks to @luyuzhe111 and others for the discussion and for helping measure the gaps!
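To make the two metrics above concrete, here is a minimal sketch of how acceptance length and the resulting speedup are commonly computed in speculative-decoding papers. The counting convention (accepted draft tokens plus one bonus token per verification step) and the first-order speedup model are assumptions for illustration, not numbers from this thread.

```python
# Sketch: acceptance length (AL) and a rough speedup estimate for
# speculative decoding. Conventions here are assumptions, not from the thread.

def mean_acceptance_length(accepted_per_step):
    """AL = average tokens emitted per target-model forward pass.

    Each verification step emits all accepted draft tokens plus one
    token sampled from the target model itself (the "bonus" token).
    """
    return sum(n + 1 for n in accepted_per_step) / len(accepted_per_step)

def expected_speedup(al, draft_overhead):
    """First-order speedup model: one verification step costs
    (1 + draft_overhead) target-pass equivalents and emits `al` tokens.
    `draft_overhead` is the drafting cost as a fraction of one target pass.
    """
    return al / (1.0 + draft_overhead)

# Example: three verification steps accepting 2, 3, and 1 draft tokens.
al = mean_acceptance_length([2, 3, 1])
print(al)                         # -> 3.0
print(expected_speedup(al, 0.2))  # -> 2.5
```

In practice the measured end-to-end speedup falls below this estimate because of scheduling, kernel-launch, and memory overheads, which is exactly the gap being tracked in this thread.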
ekagra-ranjan changed the title from "[V1][Benchmark][Spec Decode][EAGLE] Tracking benchmark done for V1 EAGLE" to "[Benchmark][V1][Spec Decode][EAGLE] Tracking benchmark done for V1 EAGLE" on May 7, 2025.
ekagra-ranjan changed the title from "[Benchmark][V1][Spec Decode][EAGLE] Tracking benchmark done for V1 EAGLE" to "[Benchmark][V1][Spec Decode][EAGLE] Tracking benchmark for V1 EAGLE" on May 7, 2025.
Do you have a unified table/doc that keeps all the benchmark results up to date (including the gaps from the ideal condition)? I believe that would be very helpful.
I didn't get the time to do it. I believe the most recent comment would be the one to refer to.
Llama 3 8B
During model weight loading
During KV cache slot
Llama 3.1 8B
torch.compile & CUDA graph:
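For reference, a hedged sketch of how a setup like the ones labeled above might be launched in vLLM with EAGLE speculation while keeping torch.compile / CUDA graph capture enabled. The argument names and the EAGLE draft-model repo are assumptions based on vLLM's `LLM` entrypoint around the time of this thread; check them against your installed version before use.

```python
# Config sketch (assumptions, not from the thread): vLLM with an EAGLE
# draft head and CUDA graphs enabled. Verify argument names for your version.
from vllm import LLM

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    enforce_eager=False,  # leave CUDA graph capture / compilation enabled
    speculative_config={
        "method": "eagle",
        # Assumed EAGLE draft head for Llama 3 8B Instruct:
        "model": "yuhuili/EAGLE-LLaMA3-Instruct-8B",
        "num_speculative_tokens": 3,
    },
)
```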