new WER script #2824

harvestingmoon · 2025-02-17T13:28:17Z

WER testing based on speaker 6097 of the Hifi-TTS dataset. The included audio is ~10 MB and consists of dozens of short clips of about 10 seconds each. WER_Script.py then calculates the WER via dynamic programming (DP).

foldl · 2025-02-17T15:32:00Z

Are those Java class files redundant?

harvestingmoon · 2025-02-17T16:03:26Z

Ah yes, I think those Gradle files are redundant, please ignore them.

Only use the files that are in wer_testing

foldl · 2025-02-18T02:17:11Z

test.py

This file is not used.

Fixed under new script

wer_testing/Readme.md

wer_testing/WER_Script.py

harvestingmoon · 2025-02-18T13:16:10Z

I have also added a preview to the CLI output, which is printed on every iteration of the audio loop:

../build/bin/whisper-cli -m ../models/ggml-base.en.bin -t 4 -p 1 -f ./6097_5_mins/audio/presentpictureofnsw_02_mann_0083.wav
Word transcribed is : ['and all boats to be moored within the hospital waff and hulk.']
Actual word is: and all boats to be moored within the hospital wharf and hulk
wer for audio/presentpictureofnsw_02_mann_0083.wav is 0.17
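The 0.17 reported above is consistent with a plain word-level edit distance: assuming the words are split on whitespace without stripping punctuation, both "waff" and "hulk." count as substitutions against the 12-word reference (2/12 ≈ 0.17). Here is a minimal sketch of that DP calculation. The names `wer` and `normalize` are hypothetical and are not taken from WER_Script.py.

```python
import string


def normalize(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def wer(reference: list[str], hypothesis: list[str]) -> float:
    """Word error rate = word-level Levenshtein distance / reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match / substitution
        prev = curr
    return prev[-1] / len(reference)


ref = "and all boats to be moored within the hospital wharf and hulk"
hyp = "and all boats to be moored within the hospital waff and hulk."

raw = wer(ref.split(), hyp.split())
clean = wer(normalize(ref), normalize(hyp))
print(f"raw: {raw:.2f}, normalized: {clean:.2f}")
```

With punctuation stripped, only the "waff"/"wharf" substitution remains (1/12), so normalizing both sides before scoring may be worth considering.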

ggerganov · 2025-02-18T18:06:29Z

Don't push changes to the bindings
The audio files should not be committed in the repo, but should be downloaded instead
The script should not parse stdout - use existing file output options
Should handle short audio inputs (i.e. less than 1s)
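One way to avoid parsing stdout is to have whisper-cli write the transcript to a file and read that back. The sketch below assumes the `-otxt` (text output), `-of` (output path without extension), and `-np` (suppress extra prints) options of whisper-cli. `build_command`, `read_transcript`, and `transcribe` are hypothetical helper names, not part of the PR.

```python
import subprocess
from pathlib import Path


def build_command(cli: str, model: str, wav: Path, out_base: Path) -> list[str]:
    """Build a whisper-cli call that writes <out_base>.txt instead of relying on stdout."""
    return [
        cli,
        "-m", model,
        "-f", str(wav),
        "-otxt",               # assumed flag: write the transcript as plain text
        "-of", str(out_base),  # assumed flag: output path without file extension
        "-np",                 # assumed flag: suppress non-result prints
    ]


def read_transcript(out_base: Path) -> str:
    """Read the transcript file and collapse it to one whitespace-normalized line."""
    txt_path = Path(str(out_base) + ".txt")
    return " ".join(txt_path.read_text(encoding="utf-8").split())


def transcribe(cli: str, model: str, wav: Path, out_dir: Path) -> str:
    """Run whisper-cli on one file and return the transcript read from disk."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_base = out_dir / wav.stem
    subprocess.run(build_command(cli, model, wav, out_base), check=True)
    return read_transcript(out_base)
```

Writing each transcript to its own file also leaves an artifact on disk that can be inspected after a failed run.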

foldl · 2025-02-19T02:45:09Z

The audio files should not be committed in the repo, but should be downloaded instead

Downloadable datasets are huge, with several hundred hours of audio, whereas I think this work item is about creating some lightweight WER tests that can be integrated into GitHub workflows.

If a tiny dataset is included in the repo, WER benchmarking will work out of the box and on the fly. A carefully selected tiny dataset can also cover English and non-English speech, clean and noisy, which would be handy. For example, if some sort of noise cancellation is added, some noisy audio files can be added to the dataset and benchmarked quickly and easily.

Anyway, I was the one who suggested adding audio files to this repo. My apologies to @harvestingmoon.

harvestingmoon · 2025-02-19T02:55:26Z

Should handle short audio inputs (i.e. less than 1s)

I have tried short audio inputs using the Google Speech Commands dataset, which contains audio files of approximately 1s each. The problem is that whisper.cpp is unable to capture any words at all (I believe because the audio inputs are just too short), which makes it difficult to calculate WER. Hence, I switched over to the Hifi-TTS dataset.
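An empty transcription does not necessarily block the calculation. With a non-empty reference, every reference word counts as a deletion, so the clip scores a WER of 1.0, and pooling errors over the whole set keeps a few very short clips from dominating the result. The sketch below illustrates this under those assumptions. `word_errors` and `corpus_wer` are hypothetical names, and the sample pairs are toy data, not results from any dataset.

```python
def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Word-level Levenshtein distance, computed row by row."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            curr.append(min(prev[j] + 1,                             # deletion
                            curr[j - 1] + 1,                         # insertion
                            prev[j - 1] + (ref_word != hyp_word)))   # match / substitution
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Pooled WER: total word errors divided by total reference words."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.lower().split()
        hyp_words = hypothesis.lower().split()  # "" becomes [] when nothing was transcribed
        total_errors += word_errors(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("no reference words to score against")
    return total_errors / total_words


# Toy data for illustration only.
pairs = [
    ("yes", ""),                                    # nothing transcribed: 1 deletion
    ("turn the lights off", "turn the light off"),  # 1 substitution
]
print(corpus_wer(pairs))
```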

No worries @foldl! I am glad to help and contribute 😄

I can continue slowly developing the script if given the green-light 👍🏼

harvestingmoon added 3 commits February 16, 2025 19:37

wer testing

c32cf90

add more wer

239878a

new wer testing

2d76a65

harvestingmoon mentioned this pull request Feb 17, 2025

tests : add WER benchmarks #2454

Open

foldl reviewed Feb 18, 2025

cleaned up filepath & redir

13750f3

ggerganov closed this Feb 18, 2025
