sycl : Fixes broken build and test-backend-ops #10257

Merged
merged 3 commits into ggml-org:master from Alcpz/sycl-backend-build-fix
Nov 13, 2024

Conversation

Alcpz
Collaborator

@Alcpz Alcpz commented Nov 11, 2024


Tests confirmed passing on an NVIDIA A100 and an Intel Data Center GPU Max 1100.

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels Nov 11, 2024
@Alcpz Alcpz force-pushed the Alcpz/sycl-backend-build-fix branch from 06cb3c6 to f6ea8b7 on November 11, 2024 21:41
@Alcpz
Collaborator Author

Alcpz commented Nov 11, 2024

@airMeng I understand you were fixing the unsupported permuted MUL_MAT in #10041, but since there are some issues with the SYCL CI and it seems that it could take longer, can we merge this?

@Alcpz Alcpz requested a review from airMeng November 11, 2024 21:54
@airMeng
Collaborator

airMeng commented Nov 12, 2024

Could you cherry-pick the norm-related cases from #10041 too? They only crash with debug builds.
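
As a side note on why this class of bug hides in release binaries: checks written with the standard assert macro are compiled out when NDEBUG is defined, which most release builds set. A minimal illustrative sketch (hypothetical check, not the actual ggml norm code):

#include <cassert>

// With -DNDEBUG (typical of release builds) this assert is compiled out
// and a bad group count flows through silently; a debug build aborts here.
void norm_check(int n_channels, int n_groups) {
    assert(n_groups > 0 && n_channels % n_groups == 0);
}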

@Alcpz
Collaborator Author

Alcpz commented Nov 12, 2024

Added the changes

@Alcpz Alcpz requested a review from NeoZhangJianyu November 12, 2024 10:53
Collaborator

@Rbiessy Rbiessy left a comment

The oneMKL changes look good to me.

@Alcpz Alcpz merged commit 2e82ffa into ggml-org:master Nov 13, 2024
53 checks passed
@easyfab

easyfab commented Nov 14, 2024

These commits negatively affect Intel GPUs. Is this expected?

For example:
Before:

ggml_sycl_init: GGML_SYCL_FORCE_MMQ:   no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                 Intel Iris Xe Graphics|    1.6|     96|     512|   32| 31604M|            1.3.31441|
| qwen2 1.5B Q5_K - Medium       |   1.22 GiB |     1.78 B | SYCL       |  99 |         pp512 |        358.62 ± 8.26 |
| qwen2 1.5B Q5_K - Medium       |   1.22 GiB |     1.78 B | SYCL       |  99 |         tg128 |         13.10 ± 0.34 |

build: 80dd7ff2 (4068)

After:

ggml_sycl_init: GGML_SYCL_FORCE_MMQ:   no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                 Intel Iris Xe Graphics|    1.6|     96|     512|   32| 31604M|            1.3.31441|
| qwen2 1.5B Q5_K - Medium       |   1.22 GiB |     1.78 B | SYCL       |  99 |         pp512 |       276.80 ± 13.68 |
| qwen2 1.5B Q5_K - Medium       |   1.22 GiB |     1.78 B | SYCL       |  99 |         tg128 |         10.64 ± 0.25 |

build: 2e82ffa4 (4069)

Reverting it on top of master restores the performance.

@Alcpz
Collaborator Author

Alcpz commented Nov 15, 2024

I'm currently investigating another performance regression in the SYCL backend, though this seems to be a separate issue. Reverting the change would break the build for non-Intel backends, but we also want to avoid this performance loss. I will look into it.

Edit: Sorry for the inconvenience.

@Alcpz
Collaborator Author

Alcpz commented Nov 15, 2024

The issue is caused by the mul_mats marked as unsupported. I've also found a new issue with a specific test case:

MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): [MUL_MAT] NMSE = 2.227304559 > 0.000500000 FAIL
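
For context: test-backend-ops compares each backend result against a reference implementation using normalized mean squared error (NMSE). A minimal sketch of that metric under the usual definition (not the exact harness code):

#include <vector>

// NMSE as reported in the FAIL line above: squared error normalized by
// the squared magnitude of the reference output. Values near zero mean
// the backend matches the reference; 2.227 is far outside the 5e-4
// tolerance, so the permuted MUL_MAT result is badly wrong.
double nmse(const std::vector<float> & ref, const std::vector<float> & out) {
    double err = 0.0, norm = 0.0;
    for (size_t i = 0; i < ref.size(); ++i) {
        const double d = (double) out[i] - (double) ref[i];
        err  += d * d;
        norm += (double) ref[i] * (double) ref[i];
    }
    return err / norm;
}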

The following is a temporary patch that fixes the regression and keeps test-backend-ops from crashing. I'm still investigating the test case above.

diff --git a/ggml/src/ggml-sycl/ggml-sycl.cpp b/ggml/src/ggml-sycl/ggml-sycl.cpp
index 2dba15d2..72a94a50 100644
--- a/ggml/src/ggml-sycl/ggml-sycl.cpp
+++ b/ggml/src/ggml-sycl/ggml-sycl.cpp
@@ -4350,9 +4350,10 @@ static bool ggml_backend_sycl_device_supports_op(ggml_backend_dev_t dev, const g
                 if (op->op == GGML_OP_MUL_MAT) {
                     a = op->src[0];
                     b = op->src[1];
-                    if (ggml_is_permuted(a) || ggml_is_permuted(b)) {
+                    if (ggml_is_permuted(a)) {
                         // TODO: fix like https://github.com/ggerganov/llama.cpp/pull/10021
-                        return false;
+                        if (a->nb[0] <= a->nb[1] && a->nb[3] <= a->nb[2]) return false; // 0,1,3,2 Unsupported
+                        if (b->type != GGML_TYPE_F32) return false;
                     }
                 } else {
                     a = op->src[2];
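
To unpack the stride checks above: ggml encodes a permutation purely through the nb[] byte strides, so the [0,1,3,2] layout the patch rejects can be recognized by comparing adjacent strides. A minimal sketch of the idea, with a hypothetical helper name:

#include "ggml.h"

// Hypothetical helper mirroring the stride test in the patch. A
// non-permuted tensor has non-decreasing byte strides, i.e.
// nb[0] <= nb[1] <= nb[2] <= nb[3]; swapping dims 2 and 3 keeps the
// first pair ordered but inverts the last pair, which is the pattern
// the patch treats as unsupported.
static bool is_permuted_0132(const struct ggml_tensor * t) {
    return t->nb[0] <= t->nb[1] && t->nb[3] <= t->nb[2];
}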

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* Fixes broken build for the SYCL CUDA backend caused by a non-explicit gemm call in outprod (merged in with RWKV6 in "Optimize RWKV6 Operator Naming and Implement Multi-core CPU/SYCL Acceleration", ggml-org#10133)

* Marks permuted MUL_MAT as unsupported to be able to run test-backend-ops

* Fixes asserts in norm to fix debug builds.
@NeoZhangJianyu
Collaborator

@Alcpz
This PR leads to a 50% performance reduction on Intel GPUs.
I don't know the status on other GPUs.
I want to revert this PR.

What do you think?

Thank you!

@Alcpz
Collaborator Author

Alcpz commented Nov 18, 2024

Yes, let's revert until we get a proper fix for the test-backend-ops.

Edit: I can't do it automatically. I will submit a new PR reverting just the problematic changes. CI is going to fail for SYCL though, so we may need to have a conversation there.

@Rbiessy Rbiessy mentioned this pull request Jan 3, 2025