tests : add WER benchmarks · ggml-org/whisper.cpp#2454

(26 comments) (0 reactions) (0 assignees) · C++ (49,693 stars) (5,535 forks)

Labels: help wanted, high priority, research🔬, roadmap

Description

It would be nice to start measuring the word error rate (WER) of whisper.cpp across some representative datasets:

short audio
long audio
English
non-English
etc.
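
For reference, WER is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model output, divided by the number of reference words. Below is a minimal sketch of that computation. The function name `word_error_rate` is hypothetical and is not part of whisper.cpp, and the normalization here (lowercasing and whitespace splitting only) is an assumption; a real benchmark would likely also normalize punctuation and numerals before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Normalization is deliberately minimal (lowercase + whitespace split);
    this is an assumption for illustration, not an agreed methodology.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it should not be treated as a percentage bounded at 100%.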

This will help us catch regressions in the future. I'm not familiar with what is typically used for speech recognition (ASR) WER benchmarks, so I'm looking for help from the community.
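
One way the regression check could work, sketched under assumptions: store a baseline WER per dataset category and flag any category whose WER rises by more than a tolerance. The function name `find_regressions`, the category keys, and the default tolerance are all hypothetical; nothing here reflects an existing whisper.cpp test harness.

```python
def find_regressions(baseline, current, tolerance=0.005):
    """Return {category: (baseline_wer, current_wer)} for every category
    whose WER increased by more than `tolerance` (absolute).

    Categories missing from `current` are skipped rather than flagged;
    that choice is an assumption of this sketch.
    """
    regressions = {}
    for category, base_wer in baseline.items():
        cur_wer = current.get(category)
        if cur_wer is None:
            continue  # no measurement for this category in the current run
        if cur_wer - base_wer > tolerance:
            regressions[category] = (base_wer, cur_wer)
    return regressions


# Illustrative numbers only, not measured results.
baseline = {"short-en": 0.050, "long-en": 0.080}
current = {"short-en": 0.051, "long-en": 0.100}
flagged = find_regressions(baseline, current)  # only "long-en" exceeds the tolerance
```

An absolute tolerance is used here because run-to-run differences in decoding can shift WER slightly; whether an absolute or relative threshold is more appropriate would need to be decided alongside the dataset choice.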

Contributor guide

Tech stack: C, C++
Areas: testing, machine learning, performance
Issue type: feature
Difficulty: 3
Estimated time: 3-5 days
Activity: active
Clarity: mostly clear
Prerequisites: Basic understanding of whisper.cpp; familiarity with word error rate (WER); experience with audio datasets
Beginner friendliness: 40
Investigation approach: The issue requests adding WER benchmarks to whisper.cpp. Start by researching standard ASR benchmark datasets (e.g., LibriSpeech, Common Voice) that include short and long audio as well as English and non-English samples. Examine the existing test infrastructure in the repository (likely in the tests/ directory or the Makefile) to understand how to integrate new benchmarks. Review the comments on the issue for community suggestions on dataset selection and evaluation methodology. Coordinate with maintainers to agree on a concrete set of datasets and metrics before implementing.