
v0.3 pre release: error when using the AMX yaml #617

Open
cunfate opened this issue Feb 23, 2025 · 11 comments

@cunfate

cunfate commented Feb 23, 2025

Command:

python3 -m ktransformers.local_chat --model_path ~/ktransformers/deepseek-r1/ --gguf_path /models/unsloth --optimize_rule_path  /root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-amx.yaml --cpu_infer 92 --max_new_tokens 1000
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/local_chat.py", line 267, in <module>
    fire.Fire(local_chat)
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/local_chat.py", line 214, in local_chat
    optimize_and_load_gguf(model, optimize_rule_path, gguf_path, config)
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/optimize/optimize.py", line 129, in optimize_and_load_gguf
    load_weights(module, gguf_loader)
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 83, in load_weights
    load_weights(child, gguf_loader, prefix+name+".")
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 85, in load_weights
    module.load()
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/operators/base_operator.py", line 60, in load
    utils.load_weights(child, self.gguf_loader, self.key+".")
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 83, in load_weights
    load_weights(child, gguf_loader, prefix+name+".")
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 83, in load_weights
    load_weights(child, gguf_loader, prefix+name+".")
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 83, in load_weights
    load_weights(child, gguf_loader, prefix+name+".")
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 85, in load_weights
    module.load()
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/operators/base_operator.py", line 60, in load
    utils.load_weights(child, self.gguf_loader, self.key+".")
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 83, in load_weights
    load_weights(child, gguf_loader, prefix+name+".")
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 85, in load_weights
    module.load()
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/operators/experts.py", line 520, in load
    self.generate_experts.load(w)
  File "/root/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/operators/experts.py", line 201, in load
    assert self.gate_type == GGMLQuantizationType.BF16
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
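The failing assertion is in the AMX expert loader, which evidently accepts only BF16 gate tensors, while a Q4_K_M GGUF carries 4-bit quantized tensors. As a minimal sketch (the shard path is a placeholder, not one from this issue), the gguf-py package can list each tensor's quantization type to confirm what a file actually contains:

import sys
from gguf import GGUFReader  # pip install gguf

# Print the GGML quantization type of every tensor in a GGUF file.
# If the expert gate tensors are not BF16, the AMX loader's
# assert self.gate_type == GGMLQuantizationType.BF16 will fire.
reader = GGUFReader(sys.argv[1])  # e.g. path to a DeepSeek-R1 GGUF shard
for tensor in reader.tensors:
    print(f"{tensor.name}: {tensor.tensor_type.name}")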
@moneyisallyouneed

What model are you loading? I have the same problem.

@cunfate
Author

cunfate commented Feb 23, 2025

What model are you loading? I have the same problem.

@moneyisallyouneed deepseek-r1-671B, 4-bit quantized

@ubergarm
Contributor

@cunfate

deepseek-r1-671B, 4-bit quantized

I assume you mean exactly https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-Q4_K_M ???

It may be that the mysterious v0.3 build does not work with pre-quantized GGUF files and instead uses Intel AMX support to do online quantization from full BF16 into int8 (CPU) and int4 (GPU).

Model: DeepseekV3-BF16 (online quant into int8 for CPU and int4 for GPU), per the tutorial.
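For intuition, "online quantization" here would mean the loader reads full BF16 weights and quantizes them at load time instead of consuming pre-quantized GGUF tensors. A rough illustrative sketch of per-row symmetric int8 quantization (the function name and scheme are hypothetical, not ktransformers' actual AMX kernel):

import torch

def quantize_int8_per_row(w_bf16: torch.Tensor):
    # Symmetric per-row int8: scale each row so its max magnitude
    # maps to 127, then round. The int8 payload would feed the AMX
    # tile matmuls, with the per-row scales applied to the outputs.
    w = w_bf16.float()
    scales = w.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 127.0
    q = torch.clamp(torch.round(w / scales), -128, 127).to(torch.int8)
    return q, scales.squeeze(1)

w = torch.randn(4, 8, dtype=torch.bfloat16)  # stand-in for an expert weight
q, s = quantize_int8_per_row(w)
print(q.dtype, s.shape)  # torch.int8 torch.Size([4])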

@cunfate
Author

cunfate commented Feb 24, 2025

@ubergarm thank you for your reply! I'll try this

@txg1550759

Build and one-click run instructions for the v0.3 Docker image are freshly out. Welcome to test:
https://github.com/txg1550759/ktransformers-v0.3-docker.git

@montagetao

@ubergarm thank you for your reply! I'll try this

Does it work now? I have the same problem.

@cunfate
Author

cunfate commented Feb 24, 2025

@ubergarm thank you for your reply! I'll try this

Does it work now? I have the same problem.

@montagetao Still downloading the BF16 model, but the BF16 GGUF is huge (about 1.2 TB); still grinding through the download, orz

@ubergarm
Contributor

@cunfate

still grinding through the download

Man, you're a hero! We're all waiting to hear what you find!

@txg1550759

Yes, @cunfate may also need this approach: enabling AMX by preloading a dynamic-link library that requests permission, as described in #320 and detailed in your guide (see the sketch after this comment).

I like your enthusiasm!
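For reference, on Linux a process must explicitly request permission for the AMX tile state before executing AMX instructions; that is what the preloaded library in #320 arranges. A minimal Python sketch of the same request via arch_prctl (the constants are the kernel's x86-64 values; this only runs on AMX-capable Linux systems):

import ctypes

ARCH_REQ_XCOMP_PERM = 0x1023  # arch_prctl: request an extended CPU state component
XFEATURE_XTILEDATA = 18       # the AMX tile-data state component
SYS_arch_prctl = 158          # x86-64 syscall number for arch_prctl

libc = ctypes.CDLL(None, use_errno=True)
ret = libc.syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA)
if ret != 0:
    raise OSError(ctypes.get_errno(), "AMX tile-data permission request failed")
print("AMX tile-data permission granted")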

@ubergarm
Contributor

@cunfate ---> ggml-org/llama.cpp#12088

@ymodo

ymodo commented Feb 27, 2025

Where did v0.3 come from? I don't see any v0.3 release in the repo...

@montagetao

Where did v0.3 come from? I don't see any v0.3 release in the repo...

It ships as a wheel; the source code hasn't been published yet.
