[BugFix] Fix Memory Leak #17567

robertgshaw2-redhat · 2025-05-02T00:00:30Z

SUMMARY:

A recent PR ([V1][PP] Optimization: continue scheduling prefill chunks #17080) introduced a memory leak.
This PR fixes it.

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

github-actions · 2025-05-02T00:00:39Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

comaniac

Thanks for locating the memory leak and the fix!

comaniac · 2025-05-02T00:13:02Z

vllm/v1/core/sched/scheduler.py

+            # NOTE(rob): since we free stopped reqs above, adding stopped reqs
+            # to _cached_reqs_data will cause a memory leak.
+            if req_data.req_id not in stopped_set:
+                self._cached_reqs_data[req_data.req_id].append(req_data)
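
The leak described in the NOTE above can be illustrated with a toy model. This is a minimal sketch, not the scheduler's actual code: `ToyScheduler`, `step`, `free`, and `skip_stopped` are hypothetical names, and it assumes the cache behaves like a `defaultdict` that creates an entry on first append.

```python
from collections import defaultdict, deque


class ToyScheduler:
    """Hypothetical stand-in for a scheduler that caches per-request data."""

    def __init__(self, skip_stopped: bool):
        self.skip_stopped = skip_stopped
        # Assumption: req_id -> reusable data objects, created on first append.
        self.cached = defaultdict(deque)

    def free(self, req_id: str) -> None:
        # Drop everything cached for a finished request.
        self.cached.pop(req_id, None)

    def step(self, scheduled_ids: list[str], stopped_ids: set[str]) -> None:
        # Stopped requests are freed first ...
        for req_id in stopped_ids:
            self.free(req_id)
        # ... then this step's data objects are returned to the cache.
        for req_id in scheduled_ids:
            if self.skip_stopped and req_id in stopped_ids:
                continue  # the guard: never re-add a request that was just freed
            self.cached[req_id].append(object())


leaky = ToyScheduler(skip_stopped=False)
fixed = ToyScheduler(skip_stopped=True)

# Every request is scheduled once and stops in the same step.
for i in range(1000):
    req_id = f"req-{i}"
    leaky.step([req_id], {req_id})
    fixed.step([req_id], {req_id})

leaked_entries = len(leaky.cached)
remaining_entries = len(fixed.cached)
```

Without the guard, each finished request leaves behind an entry that nothing frees again, so the cache grows with the number of requests served. With the guard, the cache stays empty in this scenario.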


Can we just do the following so that we don't need to introduce stopped_set?

    if req_data.req_id in self._cached_reqs_data:
        self._cached_reqs_data[req_data.req_id].append(req_data)

njhill

I don't think that would work unless line 541 is changed from `self._cached_reqs_data.get(request.request_id)` to `self._cached_reqs_data[request.request_id]`.

Alternatively, I think you could use `if req_data.req_id not in self.finished_req_ids` to avoid having a new set.
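
The point about `.get` versus indexing can be sketched as follows. This assumes `_cached_reqs_data` is a `collections.defaultdict`, which the thread does not state explicitly; `cache`, `guarded`, and `return_to_cache_guarded` are hypothetical names used only for illustration.

```python
from collections import defaultdict, deque

cache = defaultdict(deque)  # assumption: stand-in for _cached_reqs_data

# .get() is a plain lookup: it returns None and does not create the key.
looked_up = cache.get("req-0")
key_after_get = "req-0" in cache

# Indexing runs the default factory and inserts an empty deque.
indexed = cache["req-0"]
key_after_index = "req-0" in cache


def return_to_cache_guarded(cache, req_id, data):
    """Append only if the key already exists (the membership-check variant)."""
    if req_id in cache:
        cache[req_id].append(data)


# If the read path only ever uses .get(), the key is never created,
# so the guarded append never stores anything.
guarded = defaultdict(deque)
guarded.get("req-1")
return_to_cache_guarded(guarded, "req-1", object())
never_cached = "req-1" not in guarded
```

Under this assumption, a membership check on the cache itself only works if some earlier access has already created the entry by indexing.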

nightflight-dk · 2025-05-03T20:04:04Z

Any plans to fix this in V0? Batching with phi mini (3.5 and 4) and lora_enabled on V0 showed degraded performance yesterday, with responses showing signs of context from other prompts.

nFunctor · 2025-05-05T11:34:26Z

@nightflight-dk sorry for the sidetrack, but I wanted to clarify since I don't understand the issue well here: do you mean that chunked prefill, if enabled in V0, potentially leads to prompt mixing? There was a phenomenon like that for prefix caching due to the hash function's behaviour in Python 3.12 (#12621), but it got fixed for me in 0.7.2.

xiamuyingu · 2025-05-07T09:28:23Z

Does this mean that if I upgrade vLLM to v0.8.5.post1 and use the "--enable-prefix-caching" option when starting a model service, there will be no memory leak?

Syncing midstream NM fork to Upstream tag of [v0.8.5.post1](https://github.com/vllm-project/vllm/tree/v0.8.5.post1)

+ cherry pick of vllm-project@be633fb needed for benchmarks
+ [CP](neuralmagic/nm-vllm-ent@1fe447d) for compressed tensor bump
+ [CP](vllm-project#17677) for lora on AMD
+ [CP](vllm-project#17315) for llama4 w/ pure dense layers

```
commit 31c73ba (HEAD -> upstream-v0.8.5, nm-fork/upstream-v0.8.5)
Author: Chauncey <chaunceyjiang@gmail.com>
Date:   Wed Apr 30 15:11:04 2025 +0800

    [Bugfix] Fix AttributeError: 'State' object has no attribute 'engine_client' (vllm-project#17434)

    Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

commit f8db0bd
Author: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Date:   Fri May 2 14:01:38 2025 -0400

    [BugFix][Attention] Fix sliding window attention in V1 giving incorrect results (vllm-project#17574)

    Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>

commit e335c34
Author: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Date:   Fri May 2 04:07:03 2025 -0400

    [BugFix] Fix Memory Leak (vllm-project#17567)

    Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

commit cc463fe
Merge: 1e358ff ba41cc9
Author: Selbi Nuryyeva <selbi@redhat.com>
Date:   Tue Apr 29 12:34:57 2025 -0400

    Merge branch 'tag-upstream-v0.8.5' into upstream-v0.8.5

commit ba41cc9 (tag: v0.8.5, tag-upstream-v0.8.5)
Author: Michael Goin <mgoin64@gmail.com>
Date:   Mon Apr 28 16:20:24 2025 -0600

    [Model] Add tuned triton fused_moe configs for Qwen3Moe (vllm-project#17328)

    Signed-off-by: mgoin <mgoin64@gmail.com>

commit dcbac4c
Author: Simon Mo <simon.mo@hey.com>
Date:   Mon Apr 28 14:12:01 2025 -0700

    [Model] Qwen3 Dense FP8 Compat Fixes (vllm-project#17318)

    Signed-off-by: simon-mo <xmo@berkeley.edu>

[...]
```

Commands

```
git fetch upstream
git checkout -b upstream-v0.8.5
git merge upstream/v0.8.5
git cherry-pick be633fb
```

TEST PLAN

accept sync: https://github.com/neuralmagic/nm-cicd/actions/runs/14841223552
related PR in cicd: neuralmagic/nm-cicd#99
release workflow: https://github.com/neuralmagic/nm-cicd/actions/runs/14845693864

robertgshaw2-redhat added 2 commits May 1, 2025 23:33

updated

1e4e107

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

updated

1d9f88e

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

robertgshaw2-redhat requested review from WoosukKwon, njhill, ywang96, comaniac and alexm-redhat as code owners May 2, 2025 00:00

updated

c00f3e7

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

mergify bot added the v1 label May 2, 2025

comaniac reviewed May 2, 2025

address nicks comment

bc53d27

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

njhill approved these changes May 2, 2025

DarkLight1337 added the ready label May 2, 2025

vllm-bot merged commit c777df7 into vllm-project:main May 2, 2025
66 of 68 checks passed

radeksm pushed a commit to radeksm/vllm that referenced this pull request May 2, 2025

[BugFix] Fix Memory Leak (vllm-project#17567)

1434b45

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

simon-mo pushed a commit that referenced this pull request May 2, 2025

[BugFix] Fix Memory Leak (#17567)

edb5286

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025

[BugFix] Fix Memory Leak (vllm-project#17567)

7e318ba

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>

dtrifiro pushed a commit to red-hat-data-services/vllm that referenced this pull request May 13, 2025

[BugFix] Fix Memory Leak (vllm-project#17567)

e335c34

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025

[BugFix] Fix Memory Leak (vllm-project#17567)

0570648

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

ckhordiasma mentioned this pull request May 14, 2025

nm vllm ent 0.8.5 sync red-hat-data-services/vllm#139

Merged
