sycl : Fixes broken build and test-backend-ops #10257

Merged
merged 3 commits into ggml-org:master from Alcpz/sycl-backend-build-fix
Nov 13, 2024

Conversation

Alcpz
Collaborator

@Alcpz Alcpz commented Nov 11, 2024


Tests confirmed passing on an NVIDIA A100 and an Intel Data Center GPU Max 1100.

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels Nov 11, 2024
@Alcpz Alcpz force-pushed the Alcpz/sycl-backend-build-fix branch from 06cb3c6 to f6ea8b7 on November 11, 2024 21:41
@Alcpz
Collaborator Author

Alcpz commented Nov 11, 2024

@airMeng I understand you were fixing the unsupported permuted MUL_MAT in #10041, but since there are some issues with the SYCL CI and it seems that it could take longer, can we merge this?

@Alcpz Alcpz requested a review from airMeng November 11, 2024 21:54
@airMeng
Collaborator

airMeng commented Nov 12, 2024

Could you cherry-pick the norm-related cases from #10041 too? They only crash with debug builds.
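
As a side note on why this class of bug hides in release binaries: checks written with the standard assert macro are compiled out when NDEBUG is defined, which most release builds set. A minimal illustrative sketch (hypothetical check, not the actual ggml norm code):

#include <cassert>

// With -DNDEBUG (typical of release builds) this assert is compiled out
// and a bad group count flows through silently; a debug build aborts here.
void norm_check(int n_channels, int n_groups) {
    assert(n_groups > 0 && n_channels % n_groups == 0);
}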

@Alcpz
Collaborator Author

Alcpz commented Nov 12, 2024

Added the changes

@Alcpz Alcpz requested a review from NeoZhangJianyu November 12, 2024 10:53
Collaborator

@Rbiessy Rbiessy left a comment

The oneMKL changes look good to me.

@Alcpz Alcpz merged commit 2e82ffa into ggml-org:master Nov 13, 2024
53 checks passed
@easyfab

easyfab commented Nov 14, 2024

These commits negatively affect Intel GPUs. Is this expected?

For example:
Before:

ggml_sycl_init: GGML_SYCL_FORCE_MMQ:   no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                 Intel Iris Xe Graphics|    1.6|     96|     512|   32| 31604M|            1.3.31441|
| qwen2 1.5B Q5_K - Medium       |   1.22 GiB |     1.78 B | SYCL       |  99 |         pp512 |        358.62 ± 8.26 |
| qwen2 1.5B Q5_K - Medium       |   1.22 GiB |     1.78 B | SYCL       |  99 |         tg128 |         13.10 ± 0.34 |

build: 80dd7ff2 (4068)

After:

ggml_sycl_init: GGML_SYCL_FORCE_MMQ:   no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                 Intel Iris Xe Graphics|    1.6|     96|     512|   32| 31604M|            1.3.31441|
| qwen2 1.5B Q5_K - Medium       |   1.22 GiB |     1.78 B | SYCL       |  99 |         pp512 |       276.80 ± 13.68 |
| qwen2 1.5B Q5_K - Medium       |   1.22 GiB |     1.78 B | SYCL       |  99 |         tg128 |         10.64 ± 0.25 |

build: 2e82ffa4 (4069)

Reverting it on top of master restores the performance.

@Alcpz
Collaborator Author

Alcpz commented Nov 15, 2024

I'm currently investigating another performance regression in the SYCL backend, though this seems to be a separate issue. Reverting the change would break the build for non-Intel backends, but we also want to avoid this performance loss. I will look into it.

Edit: Sorry for the inconvenience.

@Alcpz
Collaborator Author

Alcpz commented Nov 15, 2024

The issue is caused by the mul_mats marked as unsupported. I've also found a new issue with a specific test case:

MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): [MUL_MAT] NMSE = 2.227304559 > 0.000500000 FAIL
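
For context: test-backend-ops compares each backend result against a reference implementation using normalized mean squared error (NMSE). A minimal sketch of that metric under the usual definition (not the exact harness code):

#include <vector>

// NMSE as reported in the FAIL line above: squared error normalized by
// the squared magnitude of the reference output. Values near zero mean
// the backend matches the reference; 2.227 is far outside the 5e-4
// tolerance, so the permuted MUL_MAT result is badly wrong.
double nmse(const std::vector<float> & ref, const std::vector<float> & out) {
    double err = 0.0, norm = 0.0;
    for (size_t i = 0; i < ref.size(); ++i) {
        const double d = (double) out[i] - (double) ref[i];
        err  += d * d;
        norm += (double) ref[i] * (double) ref[i];
    }
    return err / norm;
}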

The following is a temporary patch that fixes the regression and keeps test-backend-ops from crashing. I'm still investigating the test case above.

diff --git a/ggml/src/ggml-sycl/ggml-sycl.cpp b/ggml/src/ggml-sycl/ggml-sycl.cpp
index 2dba15d2..72a94a50 100644
--- a/ggml/src/ggml-sycl/ggml-sycl.cpp
+++ b/ggml/src/ggml-sycl/ggml-sycl.cpp
@@ -4350,9 +4350,10 @@ static bool ggml_backend_sycl_device_supports_op(ggml_backend_dev_t dev, const g
                 if (op->op == GGML_OP_MUL_MAT) {
                     a = op->src[0];
                     b = op->src[1];
-                    if (ggml_is_permuted(a) || ggml_is_permuted(b)) {
+                    if (ggml_is_permuted(a)) {
                         // TODO: fix like https://github.com/ggerganov/llama.cpp/pull/10021
-                        return false;
+                        if (a->nb[0] <= a->nb[1] && a->nb[3] <= a->nb[2]) return false; // 0,1,3,2 Unsupported
+                        if (b->type != GGML_TYPE_F32) return false;
                     }
                 } else {
                     a = op->src[2];
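
To unpack the stride checks above: ggml encodes a permutation purely through the nb[] byte strides, so the [0,1,3,2] layout the patch rejects can be recognized by comparing adjacent strides. A minimal sketch of the idea, with a hypothetical helper name:

#include "ggml.h"

// Hypothetical helper mirroring the stride test in the patch. A
// non-permuted tensor has non-decreasing byte strides, i.e.
// nb[0] <= nb[1] <= nb[2] <= nb[3]; swapping dims 2 and 3 keeps the
// first pair ordered but inverts the last pair, which is the pattern
// the patch treats as unsupported.
static bool is_permuted_0132(const struct ggml_tensor * t) {
    return t->nb[0] <= t->nb[1] && t->nb[3] <= t->nb[2];
}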

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* Fixes broken build for the SYCL CUDA backend caused by a non-explicit gemm call in outprod (merged in with RWKV6 in "Optimize RWKV6 Operator Naming and Implement Multi-core CPU/SYCL Acceleration", ggml-org#10133)

* Marks permuted MUL_MAT as unsupported to be able to run test-backend-ops

* Fixes asserts in norm to fix debug builds.
@NeoZhangJianyu
Collaborator

@Alcpz
This PR leads to a 50% performance reduction on Intel GPUs.
I don't know the status on other GPUs.
I want to revert this PR.

What do you think?

Thank you!

@Alcpz
Collaborator Author

Alcpz commented Nov 18, 2024

Yes, let's revert until we get a proper fix for the test-backend-ops.

Edit: I can't do it automatically. I will submit a new PR reverting just the problematic changes. CI is going to fail for SYCL though, so we may need to have a conversation there.

@Rbiessy Rbiessy mentioned this pull request Jan 3, 2025