tests : add WER benchmarks · ggml-org/whisper.cpp#2454

26 comments · 0 reactions · 0 assignees · C++ · 49,693 stars · 5,535 forks

Labels: help wanted, high priority, research🔬, roadmap

Description

It would be nice to start measuring the word error rate (WER) of whisper.cpp across a representative set of datasets:

- short audio
- long audio
- English
- non-English
- etc.

This will help us catch regressions in the future. I'm not familiar with what is typically used for ASR (speech-to-text) WER benchmarks, so I'm looking for help from the community.
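For reference, WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, which is the word-level Levenshtein distance normalized by reference length. The following is a minimal sketch of that computation, not code from whisper.cpp. The function name `word_error_rate` is hypothetical, and the normalization (lowercasing and whitespace splitting only) is an assumption; a real benchmark would likely also need to agree on punctuation and number handling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it should not be treated as a percentage capped at 100.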

Contributor guide

Tech stack: C, C++
Domain: testing, machine learning, performance
Issue type: feature
Difficulty: 3
Estimated time: 3-5 days
Activity status: active
Clarity: mostly clear
Prerequisites: Basic understanding of whisper.cpp; familiarity with word error rate (WER); experience with audio datasets
Beginner friendliness: 40
Research direction: The issue requests adding WER benchmarks to whisper.cpp. Start by researching standard ASR benchmark datasets (e.g., LibriSpeech, Common Voice) that include short/long audio and English/non-English samples. Examine the existing test infrastructure in the repository (likely in the tests/ directory or the Makefile) to understand how to integrate new benchmarks. Review the comments on the issue for community suggestions on dataset selection and evaluation methodology. Coordinate with maintainers to agree on a concrete set of datasets and metrics before implementing.
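To make the regression-catching goal concrete, here is one possible shape for the evaluation step: aggregate a corpus-level WER per category (total word errors over total reference words) and compare it against a stored baseline. This is only a sketch under stated assumptions. The category names, sample transcripts, baseline values, tolerance, and the helpers `wer_by_category` and `find_regressions` are all hypothetical, and the hypotheses are hard-coded rather than produced by running whisper.cpp.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                         # deletion
                curr[j - 1] + 1,                     # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer_by_category(samples):
    """samples: iterable of (category, reference, hypothesis) tuples."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for category, reference, hypothesis in samples:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors[category] += edit_distance(ref, hyp)
        words[category] += len(ref)
    # Corpus-level WER: total errors over total reference words per category.
    return {c: errors[c] / words[c] for c in words if words[c] > 0}


def find_regressions(current, baseline, tolerance=0.01):
    """Categories whose WER exceeds the baseline by more than the tolerance."""
    return sorted(
        c for c, wer in current.items()
        if c in baseline and wer > baseline[c] + tolerance
    )


# Hypothetical data: in a real benchmark the hypotheses would be model output.
samples = [
    ("english-short", "hello world", "hello word"),
    ("english-short", "good morning everyone", "good morning everyone"),
    ("non-english-short", "guten morgen zusammen", "guten morgen"),
]
baseline = {"english-short": 0.10, "non-english-short": 0.40}

current = wer_by_category(samples)
print(current)
print(find_regressions(current, baseline))
```

Summing errors and words before dividing weights long utterances more heavily than averaging per-utterance WER would, so whichever aggregation the maintainers prefer should be fixed as part of the agreed methodology.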