ggml-ci: add run.sh #2877

Merged: ggerganov merged 1 commit into ggml-org:master on Mar 14, 2025

Conversation

redraskal (Collaborator)

Closes #2787

I created a new CI script (ci/run.sh) modeled after the one in llama.cpp.

  • Builds debug and release configurations and runs ctest (currently no tests are found, since they are commented out)
  • Runs benchmarks with output similar to scripts/bench-all.sh
  • Models can be selected with the GGML_TEST_MODELS env variable as a comma-separated list (GGML_TEST_MODELS="tiny,base,..."); otherwise all models are used
  • gg_run can take additional arguments to reduce redundant logic (the debug and release ctest logic is mostly the same, for example); see the sketch below
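
To illustrate that last point, here is a minimal sketch of an argument-forwarding gg_run, modeled on the dispatcher in llama.cpp's ci/run.sh; the body shown is an illustration under those assumptions, not the merged code:

```
# Sketch: run a named check, log its output, record its exit code, and
# forward any extra arguments so one check function can serve several
# variants. Assumes $OUT and ret=0 were set up earlier in the script.
function gg_run {
    ci=$1
    shift # remaining args (e.g. "debug" or "release") are passed through

    set -o pipefail
    set -x

    gg_run_$ci "$@" | tee $OUT/$ci.log
    cur=$?
    echo "$cur" > $OUT/$ci.exit

    set +x
    set +o pipefail

    gg_sum_$ci

    ret=$((ret | cur))
}

# usage (names illustrative):
#   gg_run ctest debug
#   gg_run ctest release
```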

Added a GG_BUILD_LOW_PERF env var that limits the models to "tiny", "base", and "small" for faster CI on low-performance systems (the cutoff may need adjustment); a sketch of the selection logic follows.
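
For illustration, the selection could be wired up roughly as follows; MODELS and MODELS_ALL are hypothetical names, and only GGML_TEST_MODELS and GG_BUILD_LOW_PERF come from the script:

```
# Sketch: build the model list from GGML_TEST_MODELS (comma-separated),
# defaulting to all models
MODELS_ALL="tiny base small medium large-v3"

if [ -n "${GGML_TEST_MODELS}" ]; then
    MODELS=$(echo "${GGML_TEST_MODELS}" | tr ',' ' ')
else
    MODELS="${MODELS_ALL}"
fi

# Sketch: cap the list on low-performance CI nodes
if [ -n "${GG_BUILD_LOW_PERF}" ]; then
    MODELS="tiny base small"
fi
```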

What else should be added? Running quantize or verifying binding generation?

The script downloads the required models if they don't already exist, storing them in $MNT/models/; a rough sketch of that step is shown below.
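
The sketch assumes whisper.cpp's models/download-ggml-model.sh helper accepts a destination directory as its second argument:

```
# Sketch: fetch each selected model into $MNT/models/ only when missing
mkdir -p "${MNT}/models"

for model in ${MODELS}; do
    if [ ! -f "${MNT}/models/ggml-${model}.bin" ]; then
        ./models/download-ggml-model.sh "${model}" "${MNT}/models"
    fi
done
```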

Example output:

GGML_TEST_MODELS="tiny,base" ./ci/run.sh ./tmp/results ./tmp/mnt

/tmp/results/README.md:

### ctest_debug

Runs ctest in debug mode
- status: 0
```
+ ctest --output-on-failure -L main -E test-opt
Test project /mnt/e/code/whisper.cpp/build-ci-debug
No tests were found!!!

real	0m0.031s
user	0m0.009s
sys	0m0.000s
```
### ctest_release

Runs ctest in release mode
- status: 0
```
+ ctest --output-on-failure -L main -E test-opt
Test project /mnt/e/code/whisper.cpp/build-ci-release
No tests were found!!!

real	0m0.031s
user	0m0.009s
sys	0m0.000s
```
### bench

Whisper Benchmark Results
- status: 0
#### memcpy Benchmark

```
memcpy:   10.68 GB/s (heat-up)
memcpy:   10.57 GB/s ( 1 thread)
memcpy:   10.94 GB/s ( 1 thread)
memcpy:   10.71 GB/s ( 2 thread)
memcpy:   10.91 GB/s ( 3 thread)
memcpy:   10.75 GB/s ( 4 thread)
sum:    -3071998456.000000
```

#### ggml_mul_mat Benchmark

```
  64 x   64: Q4_0    25.3 GFLOPS (128 runs) | Q4_1    24.9 GFLOPS (128 runs)
  64 x   64: Q5_0    24.8 GFLOPS (128 runs) | Q5_1    23.3 GFLOPS (128 runs) | Q8_0    26.9 GFLOPS (128 runs)
  64 x   64: F16     24.5 GFLOPS (128 runs) | F32     10.2 GFLOPS (128 runs)
 128 x  128: Q4_0    34.6 GFLOPS (128 runs) | Q4_1    31.4 GFLOPS (128 runs)
 128 x  128: Q5_0    30.8 GFLOPS (128 runs) | Q5_1    29.3 GFLOPS (128 runs) | Q8_0    47.2 GFLOPS (128 runs)
 128 x  128: F16     40.2 GFLOPS (128 runs) | F32     26.1 GFLOPS (128 runs)
 256 x  256: Q4_0    66.7 GFLOPS (128 runs) | Q4_1    55.9 GFLOPS (128 runs)
 256 x  256: Q5_0    55.9 GFLOPS (128 runs) | Q5_1    49.3 GFLOPS (128 runs) | Q8_0    68.7 GFLOPS (128 runs)
 256 x  256: F16     81.3 GFLOPS (128 runs) | F32     45.1 GFLOPS (128 runs)
 512 x  512: Q4_0    89.2 GFLOPS (128 runs) | Q4_1    85.6 GFLOPS (128 runs)
 512 x  512: Q5_0    79.7 GFLOPS (128 runs) | Q5_1    61.6 GFLOPS (128 runs) | Q8_0   110.5 GFLOPS (128 runs)
 512 x  512: F16     85.9 GFLOPS (128 runs) | F32     47.0 GFLOPS (128 runs)
1024 x 1024: Q4_0   104.1 GFLOPS ( 49 runs) | Q4_1    87.8 GFLOPS ( 41 runs)
1024 x 1024: Q5_0    86.3 GFLOPS ( 41 runs) | Q5_1    87.3 GFLOPS ( 41 runs) | Q8_0   127.5 GFLOPS ( 60 runs)
1024 x 1024: F16    105.2 GFLOPS ( 50 runs) | F32     50.0 GFLOPS ( 24 runs)
2048 x 2048: Q4_0   103.7 GFLOPS (  7 runs) | Q4_1   113.3 GFLOPS (  7 runs)
2048 x 2048: Q5_0    93.9 GFLOPS (  6 runs) | Q5_1    90.2 GFLOPS (  6 runs) | Q8_0   143.7 GFLOPS (  9 runs)
2048 x 2048: F16     96.5 GFLOPS (  6 runs) | F32     47.5 GFLOPS (  3 runs)
4096 x 4096: Q4_0   112.3 GFLOPS (  3 runs) | Q4_1    95.4 GFLOPS (  3 runs)
4096 x 4096: Q5_0    91.5 GFLOPS (  3 runs) | Q5_1    88.9 GFLOPS (  3 runs) | Q8_0   137.2 GFLOPS (  3 runs)
4096 x 4096: F16     96.2 GFLOPS (  3 runs) | F32     44.8 GFLOPS (  3 runs)
```

#### Model Benchmarks

|           Config |         Model |  Th |  FA |    Enc. |    Dec. |    Bch5 |      PP |  Commit |
|              --- |           --- | --- | --- |     --- |     --- |     --- |     --- |     --- |
|             AVX2 |          tiny |   4 |   0 |  782.14 |    3.06 |    1.27 |    0.96 | b39aebf |
|             AVX2 |          base |   4 |   0 | 1494.50 |    5.78 |    2.20 |    1.72 | b39aebf |
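
For context, the three sections in the output above map onto whisper.cpp's bench tool (-w selects what to benchmark in the upstream tool); the binary path, model path, and thread count below are assumptions, not the script's exact invocation:

```
# Sketch: the benchmark sections above, run against the release build
./build-ci-release/bin/bench -w 1                                  # memcpy bandwidth
./build-ci-release/bin/bench -w 2                                  # ggml_mul_mat GFLOPS
./build-ci-release/bin/bench -m "${MNT}/models/ggml-tiny.bin" -t 4 # per-model timings
```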


@ggerganov (Member)

Excellent! Let me add whisper.cpp to the ggml-ci nodes and merge this to see how it goes.

@redraskal Sending you a collaborator invite, which will allow you to push branches in this repo in order to refine the ggml-ci. I think the next step would be to add some sort of accuracy tests (see #2454) in order to keep track of any potential regressions in quality.

@ggerganov ggerganov merged commit f11de0e into ggml-org:master Mar 14, 2025
@ggerganov (Member)

This is the change in the ggml-ci repo to start monitoring whisper.cpp:

ggml-org/ci@8a1d8d5

The workflows are now running on the master branch:

Here is one of the runs:

https://github.com/ggml-org/ci/tree/results/whisper.cpp/f1/1de0e73c661efe4799e090be7caedbd9e193f1/ggml-100-mac-m4

@redraskal (Collaborator, Author)

Cool, I'll take a look at #2454.


ci/run.sh:

```
CMAKE_EXTRA="-DWHISPER_FATAL_WARNINGS=ON"

if [ ! -z ${GGML_CUDA} ]; then
```
@ggerganov (Member), Mar 14, 2025:

These conditions should check for the GG_BUILD_... environment variables (see the llama.cpp script).

For example, this is the environment on the CUDA node:

https://github.com/ggml-org/ci/tree/results/whisper.cpp/f1/1de0e73c661efe4799e090be7caedbd9e193f1/ggml-4-x86-cuda-v100#environment

So we have to check for GG_BUILD_CUDA here instead of GGML_CUDA.
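
A sketch of the suggested change, mirroring llama.cpp's ci/run.sh (the exact cmake flag appended is an assumption):

```
# Sketch: gate CUDA on the CI node's GG_BUILD_CUDA variable, not GGML_CUDA
if [ ! -z ${GG_BUILD_CUDA} ]; then
    CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_CUDA=ON"
fi
```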

@redraskal (Collaborator, Author):

Right, I see what you mean.

buxuku pushed a commit to buxuku/whisper.cpp that referenced this pull request on Mar 26, 2025.
Linked issue closed by this pull request: ggml-ci : add workflows (#2787)