[Benchmark][V1][Spec Decode][EAGLE] Tracking benchmark for V1 EAGLE #17812

Open
ekagra-ranjan opened this issue May 7, 2025 · 3 comments
Labels
performance Performance-related issues

Comments

@ekagra-ranjan
Contributor

ekagra-ranjan commented May 7, 2025

We have been running performance benchmarks on MTBench so that end-to-end (e2e) speedup and acceptance length (AL) are comparable with other setups and academic papers. Thanks to @luyuzhe111 and others for the discussion and for helping measure the gaps!
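For readers new to the two metrics, here is a minimal sketch of how they are commonly computed. It assumes AL is defined as one plus the accepted draft tokens per drafting step, and that e2e speedup is the ratio of baseline wall-clock time to speculative-decoding wall-clock time on the same prompts. The function and parameter names are hypothetical and not taken from the benchmark scripts used here.

```python
def acceptance_length(num_accepted_tokens: int, num_drafts: int) -> float:
    """Mean tokens emitted per target-model step (assumed definition).

    Each drafting step yields the accepted draft tokens plus one token
    that the target model itself always produces.
    """
    if num_drafts <= 0:
        raise ValueError("num_drafts must be positive")
    return 1.0 + num_accepted_tokens / num_drafts


def e2e_speedup(baseline_seconds: float, spec_decode_seconds: float) -> float:
    """Ratio of baseline latency to speculative-decoding latency."""
    if baseline_seconds <= 0 or spec_decode_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / spec_decode_seconds


if __name__ == "__main__":
    # Made-up counts and timings, only to show the call shape.
    print(acceptance_length(num_accepted_tokens=150, num_drafts=100))
    print(e2e_speedup(baseline_seconds=10.0, spec_decode_seconds=5.0))
```

Both numbers only compare cleanly across setups when the dataset, sampling settings, and number of speculative tokens match, which is the reason for standardizing on MTBench.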

llama 3 8b

During model weight loading

During KV Cache slot

llama 3.1 8b

torch compile & CUDA graph:

@ekagra-ranjan ekagra-ranjan added the performance Performance-related issues label May 7, 2025
@ekagra-ranjan ekagra-ranjan changed the title [V1][Benchmark][Spec Decode][EAGLE] Tracking benchmark done for V1 EAGLE [Benchmark][V1][Spec Decode][EAGLE] Tracking benchmark done for V1 EAGLE May 7, 2025
@ekagra-ranjan ekagra-ranjan changed the title [Benchmark][V1][Spec Decode][EAGLE] Tracking benchmark done for V1 EAGLE [Benchmark][V1][Spec Decode][EAGLE] Tracking benchmark for V1 EAGLE May 7, 2025
@wwl2755
Contributor

wwl2755 commented May 7, 2025

Great job, and thanks for collecting these!

Do you have a unified table or doc that tracks all the up-to-date benchmark results (and the gaps relative to ideal conditions)? I believe that would be very helpful.

@wwl2755
Contributor

wwl2755 commented May 7, 2025

Also, getting the metrics-related PRs (#16367 and #17010) finalized and merged would be great.

@ekagra-ranjan
Contributor Author

> Do you have a unified table or doc that tracks all the up-to-date benchmark results (and the gaps relative to ideal conditions)? I believe that would be very helpful.

I haven't had time to do that. I believe the most recent comment is the one to refer to.
