
Commit f3ee16c

nlzy authored and Mu Huai committed
[Bugfix] Temporarily disable gptq_bitblas on ROCm (vllm-project#17411)
Signed-off-by: Yan Cangang <nalanzeyu@gmail.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
1 parent bdbfd73 commit f3ee16c

File tree

2 files changed: +6 -1 lines changed


docs/source/features/quantization/supported_hardware.md (+1 -1)

One ✅︎ is removed from the quantization hardware-compatibility table, marking gptq_bitblas as unsupported on ROCm:

@@ -80,7 +80,7 @@ The table below shows the compatibility of various quantization implementations
 * ✅︎
 * ✅︎
 * ✅︎
-* ✅︎
+*
 *
 *
 *

vllm/model_executor/layers/quantization/gptq_bitblas.py (+5)

@@ -25,6 +25,7 @@
                                            PackedColumnParameter,
                                            PackedvLLMParameter,
                                            RowvLLMParameter)
+from vllm.platforms import current_platform
 from vllm.scalar_type import scalar_types
 
 logger = init_logger(__name__)
@@ -191,6 +192,10 @@ def is_gptq_bitblas_compatible(cls, quant_config: Dict[str, Any]):
         sym = quant_config.get("sym")
         desc_act = quant_config.get("desc_act")
 
+        # temporarily disable on ROCm platform
+        if not current_platform.is_cuda():
+            return False
+
         # If we cannot find the info needed in the config, cannot convert.
         if (num_bits is None or group_size is None or sym is None
             or desc_act is None):
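For readers tracing the change: the sketch below mirrors the gating pattern the patch introduces, in a self-contained, runnable form. `_StubPlatform` and the free-function signature are illustrative stand-ins (in vLLM the check is a classmethod that reads `vllm.platforms.current_platform` directly); only the `is_cuda()` predicate and the `sym`/`desc_act` config keys come from the diff, and the remaining compatibility checks are elided.

from typing import Any, Dict, Optional


# Illustrative stand-in for vllm.platforms.current_platform; the real
# object exposes the is_cuda() predicate used by the patch above.
class _StubPlatform:
    def __init__(self, cuda: bool) -> None:
        self._cuda = cuda

    def is_cuda(self) -> bool:
        return self._cuda


def is_gptq_bitblas_compatible(quant_config: Dict[str, Any],
                               platform: _StubPlatform) -> bool:
    """Simplified mirror of the patched check; keys as in the diff."""
    sym: Optional[bool] = quant_config.get("sym")
    desc_act: Optional[bool] = quant_config.get("desc_act")

    # The new guard short-circuits everything else: on any non-CUDA
    # platform (ROCm included) the BitBLAS backend is reported as
    # incompatible, so vLLM will not auto-select it.
    if not platform.is_cuda():
        return False

    # If we cannot find the info needed in the config, cannot convert.
    if sym is None or desc_act is None:
        return False

    return True  # remaining kernel checks elided


config = {"sym": True, "desc_act": False}
assert is_gptq_bitblas_compatible(config, _StubPlatform(cuda=True))
assert not is_gptq_bitblas_compatible(config, _StubPlatform(cuda=False))

A note on the design: guarding with `not current_platform.is_cuda()` rather than an explicit ROCm test reports the backend as incompatible on every non-CUDA platform, a deliberately conservative choice for a temporary disable.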
