help wanted
Description
Currently, Arabic numerals and symbols in the Whisper transcript cannot be aligned; the text needs to be spelled out in phonetic (alphabetic) form first.
We need to perform the inverse of the normalization in https://github.com/m-bain/whisperX/blob/main/whisperx/normalizers/english.py
so that numbers and currencies are converted to their spoken word form.
E.g. "$300" -> "three hundred dollars"
The spelled-out text can then be used for wav2vec alignment.
After alignment, convert the words back to symbol form and assign the timestamps to the original symbols.
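
A rough sketch of the round trip described above, in case it helps whoever picks this up. Nothing here is existing whisperX code: `int_to_words`, `verbalize`, `expand_transcript` and `collapse_timestamps` are hypothetical helpers, the number speller only handles integers below one million and a leading `$`, and `word_times` stands in for whatever per-word `(start, end)` pairs the wav2vec aligner returns. Tokens with attached punctuation (e.g. "$300,") are passed through unchanged.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def int_to_words(n: int) -> str:
    """Spell out a non-negative integer below one million (hypothetical helper)."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("" if rest == 0 else " " + ONES[rest])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + ("" if rest == 0 else " " + int_to_words(rest))
    if n < 1_000_000:
        thousands, rest = divmod(n, 1000)
        return int_to_words(thousands) + " thousand" + ("" if rest == 0 else " " + int_to_words(rest))
    raise ValueError("only integers below one million are supported in this sketch")


def verbalize(token: str) -> list[str]:
    """Turn "300" or "$300" into spoken words; return other tokens unchanged."""
    m = re.fullmatch(r"(\$?)(\d+)", token)
    if m is None:
        return [token]
    value = int(m.group(2))
    words = int_to_words(value).split()
    if m.group(1):
        words.append("dollar" if value == 1 else "dollars")
    return words


def expand_transcript(text: str):
    """Return (tokens, spoken, owners); owners[i] is the index of the
    original token that produced spoken[i]."""
    tokens = text.split()
    spoken, owners = [], []
    for idx, tok in enumerate(tokens):
        for word in verbalize(tok):
            spoken.append(word)
            owners.append(idx)
    return tokens, spoken, owners


def collapse_timestamps(tokens, owners, word_times):
    """Map per-spoken-word (start, end) pairs back onto the original tokens.
    A symbol token spans from its first spoken word to its last."""
    result = []
    for idx, tok in enumerate(tokens):
        spans = [word_times[i] for i, owner in enumerate(owners) if owner == idx]
        result.append({"word": tok, "start": spans[0][0], "end": spans[-1][1]})
    return result


tokens, spoken, owners = expand_transcript("it costs $300 today")
# Made-up timings standing in for the aligner output, one pair per spoken word.
word_times = [(0.0, 0.2), (0.2, 0.6), (0.6, 0.9), (0.9, 1.3), (1.3, 1.8), (1.8, 2.2)]
aligned = collapse_timestamps(tokens, owners, word_times)
```

Keeping the `owners` index list alongside the spoken words means the conversion back to symbol form is a lookup rather than a second normalization pass, so "$300" ends up with the start of "three" and the end of "dollars".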