Pull requests: NVIDIA/TensorRT-LLM
Integrate trtllm-gen kernels for QKVGemm, FC13+swiGLU, and FC2 for Llama4 (#4201, opened May 10, 2025 by eopXD)
[TRTLLM-5188] fix: [AutoDeploy] update output shape of prepare_fused_mha_metadata_fake (#4199, opened May 9, 2025 by Fridah-nv)
Extend the Llama-Nemotron-Nano-8B perf-integration-tests (#4195, opened May 9, 2025 by venkywonka)
infra: [TRTLLM-325] Prepare for NGC release - multiplatform build (#4191, opened May 9, 2025 by MartinMarciniszyn)
[TRTQA-2802][fix]: add --host for mgmn serve examples script (#4175, opened May 9, 2025 by xinhe-nv)
Breaking change: perf: Enable scheduling overlap by default (#4174, opened May 9, 2025 by kaiyux)
[TRTLLM-5054][fix] Removing repeated loading of input processor (#4161, opened May 8, 2025 by rakib-hasan)
[TRTLLM-5050][feat] Enable per-request stats with PyT backend (#4156, opened May 8, 2025 by pcastonguay)
Feat: support exporting softmax statistics and update the kernel-selection heuristic (#4155, opened May 8, 2025 by PerkzZheng)