[TRTLLM-5050][feat] Enable per-request stats with PyT backend #4156

pcastonguay · 2025-05-08T13:21:26Z

Description

Adds support for per-request stats via the 'metrics' endpoint. Per-request stats can be obtained by enabling the following pytorch_backend_config parameters:

   enable_iter_perf_stats
   enable_iter_req_stats
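A minimal sketch of turning both flags on for a server run. It assumes that the flags are passed through an `--extra_llm_api_options` YAML file with a nested `pytorch_backend_config` section, as trtllm-serve documented for the PyTorch backend around this time, and that the metrics endpoint returns JSON. The helper names `render_extra_options`, `write_extra_options` and `fetch_metrics` are hypothetical and not part of TensorRT-LLM.

```python
import json
import urllib.request
from pathlib import Path

# Assumption: the PyTorch backend reads these flags from a nested
# pytorch_backend_config section of an extra_llm_api_options YAML file.
# Both flags are the ones listed in the PR description.
PER_REQUEST_STATS_FLAGS = {
    "enable_iter_perf_stats": True,
    "enable_iter_req_stats": True,
}


def render_extra_options(flags: dict) -> str:
    """Render the flags as a nested YAML mapping (no YAML library needed)."""
    lines = ["pytorch_backend_config:"]
    for key, value in flags.items():
        # YAML booleans are written in lowercase.
        lines.append(f"  {key}: {str(value).lower()}")
    return "\n".join(lines) + "\n"


def write_extra_options(path: Path, flags: dict = PER_REQUEST_STATS_FLAGS) -> Path:
    """Write the YAML file that is handed to the server at startup."""
    path.write_text(render_extra_options(flags))
    return path


def fetch_metrics(base_url: str, timeout: float = 5.0):
    """Poll the metrics endpoint (assumed to return a JSON document)."""
    url = f"{base_url.rstrip('/')}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Under the same assumptions, usage would be: write the file, start the server with something like `trtllm-serve <model> --backend pytorch --extra_llm_api_options extra.yml`, send at least one inference request, then call `fetch_metrics("http://<host>:<port>")`. Check the flag layout against the TensorRT-LLM version you run.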

Test Coverage

test_llm.py

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".
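To check a `/bot run` comment locally before posting it, the documented options can be mirrored with `argparse`. This is a hypothetical sketch only: it is not the bot's actual implementation, and it enforces no rules beyond the option names and argument shapes listed above (for example, it does not assume any of the multi-GPU flags are mutually exclusive).

```python
import argparse
import shlex


def build_bot_run_parser() -> argparse.ArgumentParser:
    """Hypothetical local mirror of the documented `/bot run` options."""
    parser = argparse.ArgumentParser(prog="/bot run", add_help=False)
    # Boolean switches.
    for flag in ("--disable-fail-fast", "--skip-test", "--add-multi-gpu-test",
                 "--only-multi-gpu-test", "--disable-multi-gpu-test", "--post-merge"):
        parser.add_argument(flag, action="store_true")
    # Options that take a quoted, comma-separated list.
    for opt in ("--stage-list", "--gpu-type", "--extra-stage"):
        parser.add_argument(opt, type=str)
    return parser


def parse_bot_run(comment: str) -> argparse.Namespace:
    """Parse a comment such as '/bot run --disable-fail-fast'."""
    tokens = shlex.split(comment)  # honours the double quotes around lists
    if tokens[:2] != ["/bot", "run"]:
        raise ValueError("not a /bot run command")
    args, unknown = build_bot_run_parser().parse_known_args(tokens[2:])
    if unknown:
        raise ValueError(f"unrecognized options: {unknown}")
    return args


def split_list(value):
    """Split a comma-separated option value such as --stage-list into names."""
    return [item.strip() for item in value.split(",")] if value else []
```

For example, `parse_bot_run('/bot run --disable-fail-fast')` sets `disable_fail_fast` and leaves every other switch off, which matches the command used on this PR.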

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

pcastonguay · 2025-05-08T16:25:48Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-08T16:32:27Z

PR_Github #4593 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-08T23:39:20Z

PR_Github #4593 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3307 completed with status: 'FAILURE'

pcastonguay · 2025-05-09T02:58:32Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-09T03:03:51Z

PR_Github #4631 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-09T04:50:36Z

PR_Github #4631 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3340 completed with status: 'FAILURE'

pcastonguay · 2025-05-09T13:44:37Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-09T13:50:24Z

PR_Github #4721 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-09T22:51:58Z

PR_Github #4721 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3406 completed with status: 'FAILURE'

pcastonguay · 2025-05-12T11:41:41Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-12T11:47:41Z

PR_Github #4874 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-13T01:10:09Z

PR_Github #4874 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3532 completed with status: 'SUCCESS'

SimengLiu-nv · 2025-05-20T21:38:06Z

tensorrt_llm/_torch/pyexecutor/py_executor.py

+            req_stat.stage = req.stage
+            req_stats.append(req_stat)
+
+        for req in list(self.request_queue.queue):


@pcastonguay I hit a bug with this function. The items iterated over in get_queued_req_stats are tuples, not LlmRequest objects, because requests are enqueued with self.request_queue.put((self.next_req_id, request)).
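To illustrate the reviewer's point, here is a minimal sketch using hypothetical stand-ins (`QueuedRequest`, `QueuedRequestStats`) rather than the real TensorRT-LLM classes: iterating `request_queue.queue` yields `(req_id, request)` tuples, so the loop has to unpack each item before using it. Labelling every queued entry with a `"QUEUED"` stage is an assumption of this sketch, not necessarily what the eventual fix does.

```python
import queue
from dataclasses import dataclass


@dataclass
class QueuedRequest:
    """Hypothetical stand-in for the request object placed on the queue."""
    prompt: str


@dataclass
class QueuedRequestStats:
    """Hypothetical stand-in for a per-request stats record."""
    id: int
    stage: str


def get_queued_req_stats(request_queue: queue.Queue) -> list:
    """Build stats for requests still waiting in the queue.

    Items are enqueued as request_queue.put((next_req_id, request)), so each
    element of request_queue.queue is a (req_id, request) tuple. Treating the
    tuple itself as a request (e.g. reading req.stage) raises AttributeError.
    """
    req_stats = []
    # Copy the underlying deque so the queue is not consumed; unpack each tuple.
    for req_id, _request in list(request_queue.queue):
        # Anything still in the queue has not been scheduled yet (assumed label).
        req_stats.append(QueuedRequestStats(id=req_id, stage="QUEUED"))
    return req_stats
```

With two requests enqueued as `(7, QueuedRequest("hello"))` and `(8, QueuedRequest("world"))`, the function returns stats with ids 7 and 8 and leaves both items in the queue.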

pcastonguay requested review from jgangani and Tabrizian May 8, 2025 13:22

Tabrizian approved these changes May 8, 2025

jgangani approved these changes May 8, 2025

pcastonguay force-pushed the llmapi_enable_req_stats branch from c95b6fa to 4049844 on May 9, 2025 02:58

pcastonguay force-pushed the llmapi_enable_req_stats branch from 4049844 to f79c3c9 on May 9, 2025 13:44

pcastonguay added 4 commits May 12, 2025 07:41

feat: Add per-request stats support with PyT backend

65d7a0d

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

Adding unit test

9104b75

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

Fixing stats unit test

45a3689

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

Fixing test with overlap

90ccbff

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

pcastonguay force-pushed the llmapi_enable_req_stats branch from f79c3c9 to 90ccbff on May 12, 2025 11:41

pcastonguay requested review from dongxuy04 and HuiGao-NV as code owners May 12, 2025 11:41

pcastonguay merged commit 9643be5 into NVIDIA:main May 13, 2025
3 checks passed

SimengLiu-nv reviewed May 20, 2025
