Extend the Llama-Nemotron-Nano-8B perf-integration-tests (cpp) #4195
Conversation
Pull Request Overview
This PR extends the performance integration tests for llama_v3.1_nemotron_nano_8b so that the test cases reflect the updated NIM benchmarking configurations.
- Removed previous torch and TRT backend tests
- Added 32 new test scenarios covering both C++ (cpp) and Python (pyt) backends with various precision, batch, and sequence length configurations
I'm locally seeing issues with the above configurations when using the pyt backend, hence limiting this PR to the cpp backend only. I don't want to check in tests that break.
Force-pushed 42fdc1f to cd3cbdc
/bot run
PR_Github #4892 [ run ] triggered by Bot
PR_Github #4892 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #4901 [ run ] triggered by Bot
/bot run --disable-fail-fast
PR_Github #4904 [ run ] triggered by Bot
PR_Github #4901 [ run ] completed with state
PR_Github #4904 [ run ] completed with state
Force-pushed cd3cbdc to 04514c0
@LarryXFly @kaiyux can this be merged?
Force-pushed 79c8384 to a860905
/bot run
PR_Github #5348 [ run ] triggered by Bot
PR_Github #5348 [ run ] completed with state
Force-pushed a860905 to 0aac105
- When validating the pytorch tests with the isl/osl/conc/quant settings (as is done for the cpp backend too), seeing hangs that need further debugging.
- Therefore, to avoid blocking this PR, removing them.

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Force-pushed 0aac105 to 74ff4d8
/bot run --disable-fail-fast
PR_Github #5531 [ run ] triggered by Bot
PR_Github #5531 [ run ] completed with state
/bot run
PR_Github #5545 [ run ] triggered by Bot
PR_Github #5545 [ run ] completed with state
/bot skip --comment "Irrelevant TOT failures"
PR_Github #5573 [ skip ] triggered by Bot
PR_Github #5573 [ skip ] completed with state
Description
Expand release-perf-regression-testing coverage of llama_v3.1_nemotron_nano_8b to map to the NIM benchmarking configs, for the cpp backend only (the PyT backend seems to have issues that need further debugging).

What's inside
Cross product of the following:
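For orientation, below is a minimal sketch of how such a matrix expands into individual test cases. The axis names follow the isl/osl/conc/quant settings mentioned earlier in this thread, but the concrete values are hypothetical placeholders, not this PR's actual configurations (those live in the updated perf test list):

```python
# Hypothetical sketch: the axis values below are placeholders, not the PR's
# real configs. It only illustrates how a cross product yields the test cases.
from itertools import product

MODEL = "llama_v3.1_nemotron_nano_8b"
BACKEND = "cpp"                                   # pyt cases were dropped from this PR
QUANTS = ["fp8", "bf16"]                          # hypothetical precision axis
ISL_OSL = [(128, 128), (512, 32), (1000, 1000), (5000, 500)]  # hypothetical isl/osl pairs
CONCURRENCY = [1, 250]                            # hypothetical concurrency axis

cases = [
    f"{MODEL}-{BACKEND}-{quant}-isl:{isl}-osl:{osl}-conc:{conc}"
    for quant, (isl, osl), conc in product(QUANTS, ISL_OSL, CONCURRENCY)
]
print(len(cases))  # 2 * 4 * 2 = 16 cases in this made-up example
print(cases[0])    # llama_v3.1_nemotron_nano_8b-cpp-fp8-isl:128-osl:128-conc:1
```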
📊 Performance Benchmark Summary (subset)
As a sanity check, below is a performance overview of a subset (8) of the 32 cases, spanning the TRT flow on 1× H100.
🧵 Observations
📂 Dataset Details
Each configuration used synthetic datasets generated with consistent parameters (512 sequences per run) using the script in TensorRT-LLM/benchmarks/cpp/prepare_dataset.py.
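For reproducibility, here is a sketch of one such dataset-generation call, wrapped in Python to capture the script's stdout. The token-norm-dist subcommand and its flags reflect my understanding of prepare_dataset.py's CLI and may differ across TensorRT-LLM versions; the tokenizer path and isl/osl means are hypothetical placeholders (only the 512-sequence count comes from the text above):

```python
# Sketch only: verify the flags against your TensorRT-LLM checkout before use.
import subprocess
from pathlib import Path

cmd = [
    "python", "benchmarks/cpp/prepare_dataset.py",
    "--stdout",                                    # emit the dataset on stdout
    "--tokenizer", "/path/to/nemotron-nano-8b",    # hypothetical tokenizer path
    "token-norm-dist",                             # synthetic, normally-distributed lengths
    "--num-requests", "512",                       # 512 sequences per run (from the PR text)
    "--input-mean", "128", "--input-stdev", "0",   # hypothetical isl
    "--output-mean", "128", "--output-stdev", "0", # hypothetical osl
]
result = subprocess.run(cmd, check=True, capture_output=True, text=True)
Path("synthetic_128_128.txt").write_text(result.stdout)
```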