memory leak when use the server mode #2605

wjzsuperman · 2024-12-04T08:50:10Z

Hello, I am seeing what looks like a memory leak when I use the server mode.
Monitoring shows that after each call to the server, the container's memory does not return to its level before the call.
My audio files are no larger than 30 MB, but each call leaks about 350 MB.
When I use the main mode instead, memory returns to its normal level afterwards.
The figures below show the memory usage: the first is the server mode, the second is the main mode.

I am not a C++ developer, but I used valgrind to analyze the memory. Please help me check it. The output is as follows:
^C==354==
==354== Process terminating with default action of signal 2 (SIGINT)
==354== at 0x4B9C3CA: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==354== by 0x15D8CF: ggml_graph_compute_thread (in /asr/bin/server)
==354== by 0x4B9986D: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==354== by 0x4BE4608: start_thread (pthread_create.c:477)
==354== by 0x4D20352: clone (clone.S:95)
==354==
==354== HEAP SUMMARY:
==354== in use at exit: 2,166,601,182 bytes in 108,817 blocks
==354== total heap usage: 127,213 allocs, 18,396 frees, 2,315,489,188 bytes allocated
==354==
==354== 304 bytes in 1 blocks are possibly lost in loss record 160 of 507
==354== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==354== by 0x40149DA: allocate_dtv (dl-tls.c:286)
==354== by 0x40149DA: _dl_allocate_tls (dl-tls.c:532)
==354== by 0x4BE5322: allocate_stack (allocatestack.c:622)
==354== by 0x4BE5322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==354== by 0x49250C9: std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_deletestd::thread::_State >, void ()()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==354== by 0x1D2E8A: whisper_full_parallel (in /asr/bin/server)
==354== by 0x21A699: main::{lambda(httplib::Request const&, httplib::Response&)#3}::operator()(httplib::Request const&, httplib::Response&) const (in /asr/bin/server)
==354== by 0x223D78: httplib::Server::dispatch_request(httplib::Request&, httplib::Response&, std::vector<std::pair<std::unique_ptr<httplib::detail::MatcherBase, std::default_deletehttplib::detail::MatcherBase >, std::function<void (httplib::Request const&, httplib::Response&)> >, std::allocator<std::pair<std::unique_ptr<httplib::detail::MatcherBase, std::default_deletehttplib::detail::MatcherBase >, std::function<void (httplib::Request const&, httplib::Response&)> > > > const&) (in /asr/bin/server)
==354== by 0x24400B: httplib::Server::routing(httplib::Request&, httplib::Response&, httplib::Stream&) (in /asr/bin/server)
==354== by 0x244FE9: httplib::Server::process_request(httplib::Stream&, bool, bool&, std::function<void (httplib::Request&)> const&) (in /asr/bin/server)
==354== by 0x245B87: httplib::Server::process_and_close_socket(int) (in /asr/bin/server)
==354== by 0x21F9FC: std::thread::_State_impl<std::thread::_Invoker<std::tuplehttplib::ThreadPool::worker > >::_M_run() (in /asr/bin/server)
==354== by 0x4924DF3: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==354==
==354== 912 bytes in 3 blocks are possibly lost in loss record 263 of 507
==354== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==354== by 0x40149DA: allocate_dtv (dl-tls.c:286)
==354== by 0x40149DA: _dl_allocate_tls (dl-tls.c:532)
==354== by 0x4BE5322: allocate_stack (allocatestack.c:622)
==354== by 0x4BE5322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==354== by 0x4B99ECA: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==354== by 0x4B918E0: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==354== by 0x161613: ggml_graph_compute (in /asr/bin/server)
==354== by 0x16E456: ggml_backend_cpu_graph_compute(ggml_backend, ggml_cgraph*) (in /asr/bin/server)
==354== by 0x1738BA: ggml_backend_sched_graph_compute_async (in /asr/bin/server)
==354== by 0x1739B2: ggml_backend_sched_graph_compute (in /asr/bin/server)
==354== by 0x1B841C: whisper_encode_internal(whisper_context&, whisper_state&, int, int, bool ()(void), void*) (in /asr/bin/server)
==354== by 0x1B85E2: whisper_encode_with_state (in /asr/bin/server)
==354== by 0x1BCCD5: whisper_lang_auto_detect_with_state (in /asr/bin/server)
==354==
==354== 912 bytes in 3 blocks are possibly lost in loss record 264 of 507
==354== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==354== by 0x40149DA: allocate_dtv (dl-tls.c:286)
==354== by 0x40149DA: _dl_allocate_tls (dl-tls.c:532)
==354== by 0x4BE5322: allocate_stack (allocatestack.c:622)
==354== by 0x4BE5322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==354== by 0x49250C9: std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_deletestd::thread::_State >, void ()()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==354== by 0x1B93B7: log_mel_spectrogram(whisper_state&, float const, int, int, int, int, int, int, whisper_filters const&, bool, whisper_mel&) [clone .constprop.0] (in /asr/bin/server)
==354== by 0x1BA71D: whisper_pcm_to_mel_with_state (in /asr/bin/server)
==354== by 0x1CDB3B: whisper_full_with_state (in /asr/bin/server)
==354== by 0x1D2FA4: whisper_full_parallel (in /asr/bin/server)
==354== by 0x21A699: main::{lambda(httplib::Request const&, httplib::Response&)#3}::operator()(httplib::Request const&, httplib::Response&) const (in /asr/bin/server)
==354== by 0x223D78: httplib::Server::dispatch_request(httplib::Request&, httplib::Response&, std::vector<std::pair<std::unique_ptr<httplib::detail::MatcherBase, std::default_deletehttplib::detail::MatcherBase >, std::function<void (httplib::Request const&, httplib::Response&)> >, std::allocator<std::pair<std::unique_ptr<httplib::detail::MatcherBase, std::default_deletehttplib::detail::MatcherBase >, std::function<void (httplib::Request const&, httplib::Response&)> > > > const&) (in /asr/bin/server)
==354== by 0x24400B: httplib::Server::routing(httplib::Request&, httplib::Response&, httplib::Stream&) (in /asr/bin/server)
==354== by 0x244FE9: httplib::Server::process_request(httplib::Stream&, bool, bool&, std::function<void (httplib::Request&)> const&) (in /asr/bin/server)
==354==
==354== 2,128 bytes in 7 blocks are possibly lost in loss record 292 of 507
==354== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==354== by 0x40149DA: allocate_dtv (dl-tls.c:286)
==354== by 0x40149DA: _dl_allocate_tls (dl-tls.c:532)
==354== by 0x4BE5322: allocate_stack (allocatestack.c:622)
==354== by 0x4BE5322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==354== by 0x49250C9: std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_deletestd::thread::_State >, void ()()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==354== by 0x2265F0: void std::vector<std::thread, std::allocatorstd::thread >::_M_realloc_inserthttplib::ThreadPool::worker(__gnu_cxx::__normal_iterator<std::thread, std::vector<std::thread, std::allocatorstd::thread > >, httplib::ThreadPool::worker&&) (in /asr/bin/server)
==354== by 0x2268A9: std::_Function_handler<httplib::TaskQueue* (), httplib::Server::Server()::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /asr/bin/server)
==354== by 0x223785: httplib::Server::listen_internal() (in /asr/bin/server)
==354== by 0x1221E9: main (in /asr/bin/server)
==354==
==354== 2,432 bytes in 8 blocks are possibly lost in loss record 297 of 507
==354== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==354== by 0x40149DA: allocate_dtv (dl-tls.c:286)
==354== by 0x40149DA: _dl_allocate_tls (dl-tls.c:532)
==354== by 0x4BE5322: allocate_stack (allocatestack.c:622)
==354== by 0x4BE5322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==354== by 0x49250C9: std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_deletestd::thread::_State >, void ()()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==354== by 0x1B93B7: log_mel_spectrogram(whisper_state&, float const, int, int, int, int, int, int, whisper_filters const&, bool, whisper_mel&) [clone .constprop.0] (in /asr/bin/server)
==354== by 0x1BA71D: whisper_pcm_to_mel_with_state (in /asr/bin/server)
==354== by 0x1CDB3B: whisper_full_with_state (in /asr/bin/server)
==354== by 0x1D3B41: std::thread::_State_impl<std::thread::_Invoker<std::tuple<int ()(whisper_context, whisper_state*, whisper_full_params, float const*, int), whisper_context*, whisper_state*, whisper_full_params, float const*, int> > >::_M_run() (in /asr/bin/server)
==354== by 0x4924DF3: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==354== by 0x4BE4608: start_thread (pthread_create.c:477)
==354== by 0x4D20352: clone (clone.S:95)
==354==
==354== 17,024 bytes in 56 blocks are possibly lost in loss record 361 of 507
==354== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==354== by 0x40149DA: allocate_dtv (dl-tls.c:286)
==354== by 0x40149DA: _dl_allocate_tls (dl-tls.c:532)
==354== by 0x4BE5322: allocate_stack (allocatestack.c:622)
==354== by 0x4BE5322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==354== by 0x49250C9: std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_deletestd::thread::_State >, void ()()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==354== by 0x22686C: std::_Function_handler<httplib::TaskQueue (), httplib::Server::Server()::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /asr/bin/server)
==354== by 0x223785: httplib::Server::listen_internal() (in /asr/bin/server)
==354== by 0x1221E9: main (in /asr/bin/server)
==354==
==354== LEAK SUMMARY:
==354== definitely lost: 0 bytes in 0 blocks
==354== indirectly lost: 0 bytes in 0 blocks
==354== possibly lost: 23,712 bytes in 78 blocks
==354== still reachable: 2,166,577,470 bytes in 108,739 blocks
==354== suppressed: 0 bytes in 0 blocks
==354== Reachable blocks (those to which a pointer was found) are not shown.
==354== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==354==
==354== For lists of detected and suppressed errors, rerun with: -s
==354== ERROR SUMMARY: 6 errors from 6 contexts (suppressed: 0 from 0)

I also analyzed the memory of the main mode with valgrind; perhaps the same cause triggers the problem there.
valgrind --leak-check=full ./main -osrt -m /root/.cache/models/ggml-base.bin -f test.wav
==19064== Memcheck, a memory error detector
==19064== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==19064== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==19064== Command: ./main -osrt -m /root/.cache/models/ggml-base.bin -f test.wav
==19064==
whisper_init_from_file_with_params_no_state: loading model from '/root/.cache/models/ggml-base.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_init_state: kv self size = 6.29 MB
whisper_init_state: kv cross size = 18.87 MB
whisper_init_state: kv pad size = 3.15 MB
whisper_init_state: compute buffer (conv) = 16.26 MB
whisper_init_state: compute buffer (encode) = 85.86 MB
whisper_init_state: compute buffer (cross) = 4.65 MB
whisper_init_state: compute buffer (decode) = 96.35 MB

main: processing 'test.wav' (5587296 samples, 349.2 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

^C==19064==
==19064== Process terminating with default action of signal 2 (SIGINT)
==19064== at 0x120AD0: ggml_vec_dot_f16 (in /asr/bin/main)
==19064== by 0x1346E2: ggml_compute_forward_mul_mat (in /asr/bin/main)
==19064== by 0x1592E1: ggml_graph_compute_thread (in /asr/bin/main)
==19064== by 0x4B918E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==19064== by 0x15D053: ggml_graph_compute (in /asr/bin/main)
==19064== by 0x169E96: ggml_backend_cpu_graph_compute(ggml_backend*, ggml_cgraph*) (in /asr/bin/main)
==19064== by 0x16F2FA: ggml_backend_sched_graph_compute_async (in /asr/bin/main)
==19064== by 0x16F3F2: ggml_backend_sched_graph_compute (in /asr/bin/main)
==19064== by 0x1B3F31: whisper_encode_internal(whisper_context&, whisper_state&, int, int, bool ()(void), void*) (in /asr/bin/main)
==19064== by 0x1C9BA9: whisper_full_with_state (in /asr/bin/main)
==19064== by 0x1CF04A: whisper_full_parallel (in /asr/bin/main)
==19064== by 0x11C477: main (in /asr/bin/main)
==19064==
==19064== HEAP SUMMARY:
==19064== in use at exit: 675,274,316 bytes in 105,230 blocks
==19064== total heap usage: 108,955 allocs, 3,725 frees, 783,778,742 bytes allocated
==19064==
==19064== 912 bytes in 3 blocks are possibly lost in loss record 167 of 277
==19064== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==19064== by 0x40149DA: allocate_dtv (dl-tls.c:286)
==19064== by 0x40149DA: _dl_allocate_tls (dl-tls.c:532)
==19064== by 0x4BE5322: allocate_stack (allocatestack.c:622)
==19064== by 0x4BE5322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==19064== by 0x49250C9: std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_deletestd::thread::_State >, void ()()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==19064== by 0x1B4DF7: log_mel_spectrogram(whisper_state&, float const, int, int, int, int, int, int, whisper_filters const&, bool, whisper_mel&) [clone .constprop.0] (in /asr/bin/main)
==19064== by 0x1B615D: whisper_pcm_to_mel_with_state (in /asr/bin/main)
==19064== by 0x1C957B: whisper_full_with_state (in /asr/bin/main)
==19064== by 0x1CF04A: whisper_full_parallel (in /asr/bin/main)
==19064== by 0x11C477: main (in /asr/bin/main)
==19064==
==19064== LEAK SUMMARY:
==19064== definitely lost: 0 bytes in 0 blocks
==19064== indirectly lost: 0 bytes in 0 blocks
==19064== possibly lost: 912 bytes in 3 blocks
==19064== still reachable: 675,273,404 bytes in 105,227 blocks
==19064== suppressed: 0 bytes in 0 blocks
==19064== Reachable blocks (those to which a pointer was found) are not shown.
==19064== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==19064==
==19064== For lists of detected and suppressed errors, rerun with: -s
==19064== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
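
To compare valgrind runs like the two above, the LEAK SUMMARY lines can be reduced to per-category byte counts. The sketch below is only an illustration: `parse_leak_summary` is a hypothetical helper, not part of valgrind or whisper.cpp, and the sample text is copied from the server log above. It shows that nearly all of the heap in use at exit is "still reachable" rather than lost. Both runs were interrupted with Ctrl-C, so buffers the program would normally free on a clean exit may be counted in that category, and the summary alone may not isolate the per-call growth.

```python
import re

# Matches lines such as
# "==354== still reachable: 2,166,577,470 bytes in 108,739 blocks".
_SUMMARY_LINE = re.compile(
    r"==\d+==\s+(definitely lost|indirectly lost|possibly lost|still reachable|suppressed):"
    r"\s+([\d,]+) bytes in ([\d,]+) blocks"
)


def parse_leak_summary(log_text):
    """Return {category: (bytes, blocks)} for each LEAK SUMMARY line found."""
    summary = {}
    for match in _SUMMARY_LINE.finditer(log_text):
        category, n_bytes, n_blocks = match.groups()
        summary[category] = (
            int(n_bytes.replace(",", "")),
            int(n_blocks.replace(",", "")),
        )
    return summary


# Copied from the server-mode valgrind output above.
SERVER_LOG = """\
==354== LEAK SUMMARY:
==354== definitely lost: 0 bytes in 0 blocks
==354== indirectly lost: 0 bytes in 0 blocks
==354== possibly lost: 23,712 bytes in 78 blocks
==354== still reachable: 2,166,577,470 bytes in 108,739 blocks
==354== suppressed: 0 bytes in 0 blocks
"""

server = parse_leak_summary(SERVER_LOG)
total = sum(n_bytes for n_bytes, _ in server.values())
for category, (n_bytes, n_blocks) in server.items():
    share = n_bytes / total
    print(f"{category:>16}: {n_bytes:>13,} bytes ({share:.4%}) in {n_blocks:,} blocks")
```

The category totals add up to the "in use at exit" figure from the HEAP SUMMARY (2,166,601,182 bytes). Rerunning with `--leak-check=full --show-leak-kinds=all`, as valgrind itself suggests in the output, would list where the still-reachable blocks were allocated.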

davens · 2025-01-02T17:53:24Z

Hello.

TLDR: I don't use the server myself, but I have a suggestion after looking at the server code: call the "load" model endpoint and see whether that stops or reduces the leak. In the server code, the load endpoint calls whisper_free, which should clean up the context and state. You could call it after every request, or whenever memory usage hits a certain point.
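
A minimal sketch of the "reload when memory usage hits a certain point" idea. It assumes a Linux host where the server's resident memory can be read from `/proc/<pid>/status`, and it assumes the load endpoint accepts a multipart form field named `model` holding the model path (check the server README for your version). The helper names, URL, and threshold are illustrative and not part of whisper.cpp.

```python
import re
import urllib.request
import uuid


def parse_vm_rss_kib(status_text):
    """Extract VmRSS (resident set size, in KiB) from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS not found in status text")
    return int(match.group(1))


def should_reload(rss_kib, threshold_kib):
    """Reload once resident memory reaches the chosen threshold."""
    return rss_kib >= threshold_kib


def build_multipart(field, value, boundary):
    """Build a one-field multipart/form-data body and its Content-Type header."""
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"\r\n\r\n'
        f"{value}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def reload_model(load_url, model_path, timeout=120):
    """POST the model path to the load endpoint (form field name is an assumption)."""
    body, content_type = build_multipart("model", model_path, uuid.uuid4().hex)
    request = urllib.request.Request(
        load_url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def check_and_reload(pid, load_url, model_path, threshold_kib):
    """Read the server's RSS and trigger a reload if it is over the threshold."""
    with open(f"/proc/{pid}/status") as status_file:
        rss_kib = parse_vm_rss_kib(status_file.read())
    if should_reload(rss_kib, threshold_kib):
        reload_model(load_url, model_path)
        return True
    return False
```

`check_and_reload` could be run from a cron job or a small loop between requests, for example with a hypothetical `load_url` of `http://127.0.0.1:8080/load`. Reloading while a transcription is in flight may interrupt it, so it is probably safest to reload only when the server is idle.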

For devs to consider: I've noticed a couple of memory issues with whisper.cpp:

1. After inference, whisper retains its context and then leaks on subsequent inferences (but whisper_free cleans up).
2. Whisper can't run inference on more than 30 seconds at a go, so whisper.cpp must be stitching together 30-second inferences. Unfortunately, it appears to allocate all the context required for the whole audio passed in, rather than reusing what is needed for 30 seconds. Memory really blows up, which compounds problem (1). I have got around this by calling whisper.cpp on only 30 seconds of audio at a time, overlapping the audio windows, and using prompt tokens.
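
The workaround described above (calling whisper.cpp on 30 seconds at a time with overlapping windows) can be sketched as a windowing step. This is an illustration, not the code I actually use: it assumes 16 kHz mono samples, which matches the main log above (5587296 samples for 349.2 sec), and the 5-second overlap and the function name are arbitrary choices.

```python
SAMPLE_RATE = 16_000  # whisper.cpp works on 16 kHz audio


def overlapping_windows(n_samples, window_s=30.0, overlap_s=5.0, sample_rate=SAMPLE_RATE):
    """Return (start, end) sample-index pairs that cover [0, n_samples).

    Consecutive windows share `overlap_s` seconds of audio; the last
    window is shortened so it ends exactly at n_samples.
    """
    window = int(window_s * sample_rate)
    step = window - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")

    windows = []
    start = 0
    while start < n_samples:
        end = min(start + window, n_samples)
        windows.append((start, end))
        if end == n_samples:
            break
        start += step
    return windows


# The 349.2-second test.wav from the log above.
windows = overlapping_windows(5_587_296)
print(len(windows), windows[0], windows[1], windows[-1])
```

Each window would then be transcribed on its own, with tokens from the previous window's result passed in as the prompt (in whisper.cpp, via the `prompt_tokens` and `prompt_n_tokens` fields of `whisper_full_params`). Removing the text duplicated in the overlapping region is not handled in this sketch.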
