[Guide]: Usage on Graph mode #767

Open

MengqingCao opened this issue May 6, 2025 · 3 comments
Labels: guide

MengqingCao commented May 6, 2025

How to Use Graph Mode on vLLM Ascend

Graph mode is supported experimentally:

1. Graph mode for DeepSeek models:

  • Software:

    Software       Supported version
    vllm           main / v0.8.5 / v0.8.5.post1
    vllm-ascend    main / v0.8.5rc1
  • Usage:
    Set enable_graph_mode to True in additional_config to enable graph mode for DeepSeek models:

        additional_config={
            'enable_graph_mode': True,
        },

For example:

llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",
    additional_config={
         'enable_graph_mode': True,
    },
)

Note

enable_graph_mode should only be enabled when running inference with DeepSeek models; other models are not supported.
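
For online serving, the same config can be passed on the command line. The following is a minimal sketch; it assumes this vllm/vllm-ascend version exposes an --additional-config flag that accepts a JSON string (not verified against this exact release):

# Hedged sketch: --additional-config is assumed to take a JSON string here.
vllm serve deepseek-ai/DeepSeek-V2-Lite \
    --additional-config '{"enable_graph_mode": true}'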

2. Graph mode for dense models:

  • Software:

    Software       Supported version
    vllm           main
    vllm-ascend    main
  • Usage:

Step 1: Enable the V1 engine

export VLLM_USE_V1=1
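
Setting the variable from Python before vLLM is imported should be equivalent; the snippet below is a small sketch (it assumes vLLM reads VLLM_USE_V1 no earlier than engine construction):

import os

# Set the env var before importing vllm so the V1 engine selection sees it.
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM  # imported after the variable is set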

Step 2: Modify platform.py in vllm-ascend so that graph mode can take effect

diff --git a/vllm_ascend/platform.py b/vllm_ascend/platform.py
index 5d2c8ac..3f9f015 100644
--- a/vllm_ascend/platform.py
+++ b/vllm_ascend/platform.py
@@ -119,9 +119,7 @@ class NPUPlatform(Platform):
             enforce_eager = getattr(vllm_config.model_config, "enforce_eager",
                                     False)
 
-        # TODO(Yizhou): Override the value of enforce_eager to True before
-        # the CANN and torch_npu support NPU compilation.
-        enforce_eager = True
+
         logger.warning(
             "NPU compilation support pending. Will be available in future CANN and "
             "torch_npu releases. NPU graph mode is currently experimental and disabled "

Step 3: Use transfer_to_npu as in the following script

from vllm import LLM, SamplingParams
# Use transfer_to_npu to work around the hard-coded CUDA graph path; this will be removed once vLLM itself is adapted.
from torch_npu.contrib import transfer_to_npu

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(max_tokens=100, temperature=0.0)
# Create an LLM.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

umeiko commented May 14, 2025

It should be from torch_npu.contrib import transfer_to_npu, not from torch_npu.contirb import transfer_to_npu.

Also, when I comment out line 124 (enforce_eager = True) in /vllm-workspace/vllm-ascend/vllm_ascend/platform.py, it reports:

WARNING 05-14 02:54:07 [platform.py:125] NPU compilation support pending. Will be available in future CANN and torch_npu releases. NPU graph mode is currently experimental and disabled by default. You can just adopt additional_config={'enable_graph_mode': True} to serve deepseek models with NPU graph mode on vllm-ascend with V0 engine.
INFO 05-14 02:54:07 [platform.py:141] PIECEWISE compilation enabled on NPU. use_inductor not supported - using only ACL Graph mode
WARNING 05-14 02:54:07 [platform.py:179] Prefix caching is not supported for V1 now, disable prefix caching
INFO 05-14 02:54:08 [core.py:58] Initializing a V1 LLM engine (v0.8.5.post1) with config: model='/mnt/models/Qwen2.5-0.5B-Instruct', speculative_config=None, tokenizer='/mnt/models/Qwen2.5-0.5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=npu, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/mnt/models/Qwen2.5-0.5B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":3,"custom_ops":["all"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.unified_ascend_attention_with_output"],"use_inductor":false,"compile_sizes":[],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":512}
/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py:29: ResourceWarning: unclosed <socket.socket fd=19, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=0, laddr=('10.246.91.186', 45099), raddr=('8.8.8.8', 80)>
  get_ip(), get_open_port())
WARNING 05-14 02:54:10 [utils.py:2522] Methods add_lora,cache_config,determine_available_memory,determine_num_available_blocks,device_config,get_cache_block_size_bytes,list_loras,load_config,pin_lora,remove_lora,scheduler_config not implemented in <vllm_ascend.worker.worker_v1.NPUWorker object at 0xfffd0cfa2590>
[rank0]:[W514 02:54:15.919541630 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
INFO 05-14 02:54:15 [parallel_state.py:1004] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
INFO 05-14 02:54:16 [model_runner_v1.py:852] Starting to load model /mnt/models/Qwen2.5-0.5B-Instruct...
ERROR 05-14 02:54:18 [core.py:396] EngineCore failed to start.
ERROR 05-14 02:54:18 [core.py:396] Traceback (most recent call last):
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 387, in run_engine_core
ERROR 05-14 02:54:18 [core.py:396]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 329, in __init__
ERROR 05-14 02:54:18 [core.py:396]     super().__init__(vllm_config, executor_class, log_stats,
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 64, in __init__
ERROR 05-14 02:54:18 [core.py:396]     self.model_executor = executor_class(vllm_config)
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 05-14 02:54:18 [core.py:396]     self._init_executor()
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
ERROR 05-14 02:54:18 [core.py:396]     self.collective_rpc("load_model")
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 05-14 02:54:18 [core.py:396]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/utils.py", line 2456, in run_method
ERROR 05-14 02:54:18 [core.py:396]     return func(*args, **kwargs)
ERROR 05-14 02:54:18 [core.py:396]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 178, in load_model
ERROR 05-14 02:54:18 [core.py:396]     self.model_runner.load_model()
ERROR 05-14 02:54:18 [core.py:396]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 855, in load_model
ERROR 05-14 02:54:18 [core.py:396]     self.model = get_model(vllm_config=self.vllm_config)
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
ERROR 05-14 02:54:18 [core.py:396]     return loader.load_model(vllm_config=vllm_config)
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 452, in load_model
ERROR 05-14 02:54:18 [core.py:396]     model = _initialize_model(vllm_config=vllm_config)
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 133, in _initialize_model
ERROR 05-14 02:54:18 [core.py:396]     return model_class(vllm_config=vllm_config, prefix=prefix)
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/model_executor/models/qwen2.py", line 436, in __init__
ERROR 05-14 02:54:18 [core.py:396]     self.model = Qwen2Model(vllm_config=vllm_config,
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/compilation/decorators.py", line 162, in __init__
ERROR 05-14 02:54:18 [core.py:396]     TorchCompileWrapperWithCustomDispatcher.__init__(
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/compilation/wrapper.py", line 42, in __init__
ERROR 05-14 02:54:18 [core.py:396]     backend = vllm_config.compilation_config.init_backend(vllm_config)
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/config.py", line 3600, in init_backend
ERROR 05-14 02:54:18 [core.py:396]     from vllm.compilation.backends import VllmBackend
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/compilation/backends.py", line 20, in <module>
ERROR 05-14 02:54:18 [core.py:396]     from .compiler_interface import EagerAdaptor, InductorAdaptor
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/compilation/compiler_interface.py", line 11, in <module>
ERROR 05-14 02:54:18 [core.py:396]     import torch._inductor.compile_fx
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 72, in <module>
ERROR 05-14 02:54:18 [core.py:396]     from .fx_passes.joint_graph import joint_graph_passes
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/fx_passes/joint_graph.py", line 19, in <module>
ERROR 05-14 02:54:18 [core.py:396]     from ..pattern_matcher import (
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/pattern_matcher.py", line 96, in <module>
ERROR 05-14 02:54:18 [core.py:396]     from .lowering import fallback_node_due_to_unsupported_type
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 6430, in <module>
ERROR 05-14 02:54:18 [core.py:396]     from . import kernel
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/kernel/__init__.py", line 1, in <module>
ERROR 05-14 02:54:18 [core.py:396]     from . import mm, mm_common, mm_plus_mm, unpack_mixed_mm
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 16, in <module>
ERROR 05-14 02:54:18 [core.py:396]     from torch._inductor.codegen.cpp_gemm_template import CppPackedGemmTemplate
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/codegen/cpp_gemm_template.py", line 19, in <module>
ERROR 05-14 02:54:18 [core.py:396]     from .cpp_micro_gemm import CppMicroGemmAMX, create_micro_gemm, LayoutType
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/codegen/cpp_micro_gemm.py", line 16, in <module>
ERROR 05-14 02:54:18 [core.py:396]     from .cpp_template_kernel import CppTemplateKernel
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/codegen/cpp_template_kernel.py", line 20, in <module>
ERROR 05-14 02:54:18 [core.py:396]     from .cpp_wrapper_cpu import CppWrapperCpu
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/codegen/cpp_wrapper_cpu.py", line 30, in <module>
ERROR 05-14 02:54:18 [core.py:396]     from .wrapper import EnterSubgraphLine, ExitSubgraphLine, WrapperCodeGen
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/codegen/wrapper.py", line 46, in <module>
ERROR 05-14 02:54:18 [core.py:396]     from ..runtime import triton_heuristics
ERROR 05-14 02:54:18 [core.py:396]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 55, in <module>
ERROR 05-14 02:54:18 [core.py:396]     from triton import Config
ERROR 05-14 02:54:18 [core.py:396] ImportError: cannot import name 'Config' from 'triton' (unknown location)
Process EngineCore_0:
Traceback (most recent call last):
  File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 400, in run_engine_core
    raise e
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 387, in run_engine_core
    engine_core = EngineCoreProc(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 329, in __init__
    super().__init__(vllm_config, executor_class, log_stats,
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 64, in __init__
    self.model_executor = executor_class(vllm_config)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 52, in __init__
    self._init_executor()
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
    self.collective_rpc("load_model")
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/utils.py", line 2456, in run_method
    return func(*args, **kwargs)
  File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 178, in load_model
    self.model_runner.load_model()
  File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 855, in load_model
    self.model = get_model(vllm_config=self.vllm_config)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
    return loader.load_model(vllm_config=vllm_config)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 452, in load_model
    model = _initialize_model(vllm_config=vllm_config)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 133, in _initialize_model
    return model_class(vllm_config=vllm_config, prefix=prefix)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/model_executor/models/qwen2.py", line 436, in __init__
    self.model = Qwen2Model(vllm_config=vllm_config,
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/compilation/decorators.py", line 162, in __init__
    TorchCompileWrapperWithCustomDispatcher.__init__(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/compilation/wrapper.py", line 42, in __init__
    backend = vllm_config.compilation_config.init_backend(vllm_config)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/config.py", line 3600, in init_backend
    from vllm.compilation.backends import VllmBackend
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/compilation/backends.py", line 20, in <module>
    from .compiler_interface import EagerAdaptor, InductorAdaptor
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/compilation/compiler_interface.py", line 11, in <module>
    import torch._inductor.compile_fx
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 72, in <module>
    from .fx_passes.joint_graph import joint_graph_passes
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/fx_passes/joint_graph.py", line 19, in <module>
    from ..pattern_matcher import (
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/pattern_matcher.py", line 96, in <module>
    from .lowering import fallback_node_due_to_unsupported_type
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 6430, in <module>
    from . import kernel
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/kernel/__init__.py", line 1, in <module>
    from . import mm, mm_common, mm_plus_mm, unpack_mixed_mm
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 16, in <module>
    from torch._inductor.codegen.cpp_gemm_template import CppPackedGemmTemplate
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/codegen/cpp_gemm_template.py", line 19, in <module>
    from .cpp_micro_gemm import CppMicroGemmAMX, create_micro_gemm, LayoutType
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/codegen/cpp_micro_gemm.py", line 16, in <module>
    from .cpp_template_kernel import CppTemplateKernel
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/codegen/cpp_template_kernel.py", line 20, in <module>
    from .cpp_wrapper_cpu import CppWrapperCpu
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/codegen/cpp_wrapper_cpu.py", line 30, in <module>
    from .wrapper import EnterSubgraphLine, ExitSubgraphLine, WrapperCodeGen
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/codegen/wrapper.py", line 46, in <module>
    from ..runtime import triton_heuristics
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 55, in <module>
    from triton import Config
ImportError: cannot import name 'Config' from 'triton' (unknown location)
Traceback (most recent call last):
  File "/vllm-workspace/./simple_test.py", line 15, in <module>
    llm = LLM(model="/mnt/models/Qwen2.5-0.5B-Instruct")
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/utils.py", line 1161, in inner
    return fn(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 247, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/v1/engine/llm_engine.py", line 138, in from_engine_args
    return cls(vllm_config=vllm_config,
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/v1/engine/llm_engine.py", line 92, in __init__
    self.engine_core = EngineCoreClient.make_client(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 73, in make_client
    return SyncMPClient(vllm_config, executor_class, log_stats)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 494, in __init__
    super().__init__(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 398, in __init__
    self._wait_for_engine_startup()
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 430, in _wait_for_engine_startup
    raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above.
[ERROR] 2025-05-14-02:54:18 (PID:15115, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
/usr/local/python3.10.17/lib/python3.10/tempfile.py:869: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpso161dv_'>
  _warnings.warn(warn_message, ResourceWarning)

https://github.com/vllm-project/vllm-ascend/pull/854/files <- this PR fixed the problem

MengqingCao (Collaborator, Author) replied, quoting @umeiko:

should be from torch_npu.contrib import transfer_to_npu , not from torch_npu.contirb import transfer_to_npu

Thanks for pointing this out!

@Yikun Yikun changed the title [Usage]: Usage on Graph mode [Guide]: Usage on Graph mode May 15, 2025
@Yikun Yikun added the guide label May 15, 2025
JackWeiw commented:

@umeiko Hi, I hit the same problem as you. I built vllm-ascend from source at commit 7a325b2, which includes PR #854, and installed vllm with pip install vllm==0.8.5.post1. Why does it still fail?

[image attached]
