ggml : use atomic_flag for critical section #7598

Merged: 2 commits, May 29, 2024

@slaren (Member) commented May 28, 2024

Should fix a possible deadlock in quantize.

Fixes #7597

@fairydreaming Can you check if this fixes the issue in your system?
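
For readers unfamiliar with the technique named in the title, here is a minimal sketch of a critical section built on a C11 `atomic_flag`. It is an illustration only, not the diff from this pull request: the `example_*` names are hypothetical, and the pure busy-wait is an assumption (a real implementation may yield the CPU while waiting). The idea is that `atomic_flag_test_and_set` sets the flag and returns its previous value in one atomic step, so exactly one thread can observe "was clear" and enter.

```c
#include <stdatomic.h>
#include <stdbool.h>

// Hypothetical names for illustration; not taken from the ggml source.
static atomic_flag example_lock = ATOMIC_FLAG_INIT;

// Non-blocking attempt: returns true if the caller now owns the section.
static bool example_critical_section_try_enter(void) {
    // test_and_set returns the previous value of the flag:
    // false means it was clear and this call has just set it.
    return !atomic_flag_test_and_set_explicit(&example_lock, memory_order_acquire);
}

// Blocking entry: spin until the flag is acquired.
static void example_critical_section_enter(void) {
    while (!example_critical_section_try_enter()) {
        // busy-wait; assumption: a real implementation might yield here
    }
}

// Release the section so another thread can enter.
static void example_critical_section_leave(void) {
    atomic_flag_clear_explicit(&example_lock, memory_order_release);
}
```

A caller would wrap the protected code between `example_critical_section_enter()` and `example_critical_section_leave()`. Because acquisition is a single atomic operation, there is no window in which two threads can both believe they hold the section.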

@fairydreaming
Collaborator

Ok, I will test this tomorrow and let you know.

@bartowski1182
Contributor

I was actually having an almost identical issue just now, so I will see if this fixes it.

@bartowski1182
Contributor

@slaren this worked flawlessly for me

I was having an issue that was 100% repeatable and with this change it went right through where it used to get stuck

Thank you @fairydreaming for reporting and you for fixing!

@kunnis kunnis mentioned this pull request May 28, 2024
@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label May 28, 2024
Contributor

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 559 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8383.76ms p(95)=19400.65ms fails=, finish reason: stop=509 truncated=50
  • Prompt processing (pp): avg=94.2tk/s p(95)=430.61tk/s
  • Token generation (tg): avg=73.23tk/s p(95)=47.97tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=sl/fix-ggml-crit-sect commit=19db321a61f90af094cd8284b70c7774a4749d00

Charts (bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 559 iterations; x-axis 1716935250 --> 1716935880):
  • llamacpp:prompt_tokens_seconds
  • llamacpp:predicted_tokens_seconds
  • llamacpp:kv_cache_usage_ratio
  • llamacpp:requests_processing

@mofosyne mofosyne added the Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix label May 29, 2024
@fairydreaming
Collaborator

@slaren I ran the modified quantize several times and it never hung. Looks like the problem is fixed now.

@slaren slaren merged commit 87bdf2a into master May 29, 2024
72 of 73 checks passed
@slaren slaren deleted the sl/fix-ggml-crit-sect branch May 29, 2024 11:36
Linked issue: Bug: quantize command randomly hangs on CPUs with lots of cores
5 participants