
Extend the Llama-Nemotron-Nano-8B perf-integration-tests (cpp) #4195


Merged

Conversation

venkywonka
Collaborator

@venkywonka venkywonka commented May 9, 2025

Description

Expand release-perf-regression-testing coverage of llama_v3.1_nemotron_nano_8b to map to the NIM benchmarking configs, for the cpp backend only (the PyT backend appears to have issues that need further debugging).


What’s inside

Cross product of the following:

- model: llama_v3.1_nemotron_nano_8b
- runtime: bench
- backend: [trt]
- con: [1, 250] # concurrency
- input_output_len:
  - [5000, 500]
  - [500, 2000]
  - [1000, 1000]
  - [20000, 2000]
- quant: [none, fp8]
- dtype: [bfloat16]
- maxbs: [64]

Total: 16 newly added cpp-backend perf tests in tests/integration/test_lists/qa/trt_llm_release_perf_test.yml, replacing the previous cpp-backend perf tests while preserving the pyt-backend perf tests; the sketch below enumerates the combinations.
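
For illustration, a minimal sketch of how the cross product above expands into the 16 cases. The printed test IDs are hypothetical and do not reproduce the exact entry syntax used in trt_llm_release_perf_test.yml:

```python
# Illustrative only: enumerate the cross product of the perf-test parameters above.
# The test-ID format printed here is hypothetical, not the exact YAML entry syntax.
from itertools import product

model = "llama_v3.1_nemotron_nano_8b"
concurrencies = [1, 250]
input_output_lens = [(5000, 500), (500, 2000), (1000, 1000), (20000, 2000)]
quants = ["none", "fp8"]
dtype, maxbs, backend = "bfloat16", 64, "trt"

cases = list(product(concurrencies, input_output_lens, quants))
for con, (isl, osl), quant in cases:
    print(f"{model}-bench-{backend}-{dtype}-maxbs:{maxbs}-"
          f"input_output_len:{isl},{osl}-quant:{quant}-con:{con}")

print(len(cases))  # 2 concurrencies * 4 length pairs * 2 quants = 16
```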

📊 Performance Benchmark Summary (subset)

As a sanity check, below is a perf overview of a subset (8 of the 32 cases) that spans the TRT flow on 1x H100.

| Input/Output Length | Concurrency | Precision | Request Throughput (req/s) | Output TPS (tokens/s) | P50 Latency (ms) |
|---|---|---|---|---|---|
| 5000 / 500 | 1 | BF16 | 0.2312 | 115.60 | 4328.22 |
| 5000 / 500 | 1 | FP8 | 0.4163 | 208.13 | 2397.91 |
| 500 / 2000 | 1 | BF16 | 0.0616 | 123.26 | 16203.74 |
| 500 / 2000 | 1 | FP8 | 0.1051 | 210.18 | 9516.47 |
| 5000 / 500 | 250 | BF16 | 2.0712 | 1035.62 | 117121.77 |
| 5000 / 500 | 250 | FP8 | 3.6120 | 1805.98 | 67227.43 |
| 500 / 2000 | 250 | BF16 | 0.8633 | 1726.57 | 278001.38 |
| 500 / 2000 | 250 | FP8 | 1.4663 | 2932.65 | 163683.41 |
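
As a quick consistency check on the table above, at steady state the output token throughput should roughly equal request throughput times output length. A small sketch verifying this for the reported rows (numbers copied from the table):

```python
# Sanity check: output TPS ≈ request throughput (req/s) * output length (tokens/req).
rows = [
    # (output_len, req_per_s, output_tps)
    (500, 0.2312, 115.60),
    (500, 0.4163, 208.13),
    (2000, 0.0616, 123.26),
    (2000, 0.1051, 210.18),
    (500, 2.0712, 1035.62),
    (500, 3.6120, 1805.98),
    (2000, 0.8633, 1726.57),
    (2000, 1.4663, 2932.65),
]
for osl, req_s, tps in rows:
    assert abs(req_s * osl - tps) / tps < 0.01  # each row agrees within 1%
print("all rows consistent")
```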

🧵 Observations

  • FP8 consistently improves both request throughput and token throughput across all configurations, with up to 2× higher output TPS in high concurrency scenarios.
  • P50 latency drops significantly with FP8, especially under low concurrency, showing strong inference-time gains.
  • Concurrency scaling: Increasing concurrency to 250 leads to significantly higher overall throughput, though with expected increases in tail latencies.

📂 Dataset Details

Each configuration used synthetic datasets generated with consistent parameters (512 sequences per run) via the TensorRT-LLM/benchmarks/cpp/prepare_dataset.py script.
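
For reference, a minimal sketch of what generating such a synthetic dataset could look like. The record schema ("input_ids", "output_len") and field names below are assumptions for illustration only and do not reproduce the exact format emitted by prepare_dataset.py:

```python
# Illustrative sketch only: build 512 synthetic requests with fixed input/output lengths.
# The record schema ("input_ids", "output_len") is an assumption, not the exact
# format produced by benchmarks/cpp/prepare_dataset.py.
import json
import random

def make_synthetic_dataset(num_requests=512, input_len=5000, output_len=500,
                           vocab_size=128000, seed=0):
    rng = random.Random(seed)
    return [
        {
            "input_ids": [rng.randrange(vocab_size) for _ in range(input_len)],
            "output_len": output_len,
        }
        for _ in range(num_requests)
    ]

if __name__ == "__main__":
    dataset = make_synthetic_dataset()
    with open("synthetic_5000_500.json", "w") as f:
        json.dump(dataset, f)
```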

@venkywonka venkywonka requested a review from Copilot May 9, 2025 17:02

@Copilot Copilot AI left a comment


Pull Request Overview

This PR extends the performance integration tests for llama_v3.1_nemotron_nano_8b so that the test cases reflect the updated NIM benchmarking configurations.

  • Removed previous torch and TRT backend tests
  • Added 32 new test scenarios covering both C++ (cpp) and Python (pyt) backends with various precision, batch, and sequence length configurations

@venkywonka venkywonka marked this pull request as ready for review May 9, 2025 17:06
@venkywonka venkywonka changed the title from Extend the Llama-Nemotron-Nano-8B perf-integration-tests to Extend the Llama-Nemotron-Nano-8B perf-integration-tests (cpp) May 12, 2025
@venkywonka
Collaborator Author

I'm seeing issues locally with the above configurations when using the pyt backend, so I'm limiting this PR to the cpp backend only. I don't want to check in tests that break.

@venkywonka venkywonka force-pushed the user/venky/ll-nemo-nano-perf-test-ext branch from 42fdc1f to cd3cbdc Compare May 12, 2025 16:08
@venkywonka
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #4892 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #4892 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3544 completed with status: 'FAILURE'

@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #4901 [ run ] triggered by Bot

@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #4904 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #4901 [ run ] completed with state ABORTED

@tensorrt-cicd
Collaborator

PR_Github #4904 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3556 completed with status: 'FAILURE'

@venkywonka venkywonka force-pushed the user/venky/ll-nemo-nano-perf-test-ext branch from cd3cbdc to 04514c0 Compare May 13, 2025 01:50
@venkywonka venkywonka requested review from schetlur-nv and ruodil May 13, 2025 16:16
@venkywonka
Collaborator Author

@LarryXFly @kaiyux can this be merged?

@venkywonka venkywonka requested a review from kaiyux May 14, 2025 14:02
@venkywonka venkywonka force-pushed the user/venky/ll-nemo-nano-perf-test-ext branch 2 times, most recently from 79c8384 to a860905 Compare May 15, 2025 13:09
@venkywonka
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #5348 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5348 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3901 completed with status: 'FAILURE'

@venkywonka venkywonka force-pushed the user/venky/ll-nemo-nano-perf-test-ext branch from a860905 to 0aac105 Compare May 15, 2025 20:58
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
- When validating the PyTorch tests with the isl/osl/conc/quant settings (the same ones used for the cpp backend), I'm seeing hangs that need further debugging.
- Therefore, to avoid blocking this PR, I'm removing them.
- Seeing

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
@venkywonka venkywonka force-pushed the user/venky/ll-nemo-nano-perf-test-ext branch from 0aac105 to 74ff4d8 Compare May 16, 2025 19:10
@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #5531 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5531 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4032 completed with status: 'FAILURE'

@venkywonka
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #5545 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5545 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4046 completed with status: 'FAILURE'

@chzblych
Collaborator

/bot skip --comment "Irrelevant TOT failures"

@tensorrt-cicd
Collaborator

PR_Github #5573 [ skip ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #5573 [ skip ] completed with state SUCCESS
Skipping testing for commit 74ff4d8

@chzblych chzblych merged commit fb663b6 into NVIDIA:main May 17, 2025
3 checks passed