mozilla/DeepSpeech

Method to replace incomplete/wrong words in sentence ?!

Open

#795 opened on Aug 29, 2017

help wanted

Description

Hello all. I'm looking for the best method for replacing incomplete words in a sentence.

An idea: use SequenceMatcher from Python's difflib, which reports a similarity ratio between 0 and 1.

Example in French: intelligen (the final 't' is not heard)

    ratio = difflib.SequenceMatcher(None, 'intelligen', 'intelligent').ratio()
    # ratio -> 0.9523809523809523

- if the ratio == 1.0: pass (the word is good)
- if the ratio >= defined_value: change to the dictionary word
- if it is under defined_value: pass, report an error, or return None

Or, in my case, add a marker to the result:

src = tu es intelligent
res = tu es intailigen (ratio = 0.7619047619047619 for intailigen, under the needed ratio)
corrected_result = u'tu es intailigen,bad'

My bot reads the sentence, says that it heard a bad question, and saves the sentence to a log for later model corrections.
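The word-level rule described above could be sketched roughly as follows. This is only an illustration: the function names `correct_word` and `correct_sentence`, the `threshold` default of 0.9 (standing in for defined_value), and the small vocabulary list are all assumptions, not anything provided by DeepSpeech.

```python
import difflib


def correct_word(word, vocabulary, threshold=0.9):
    """Return (word, ok): the best dictionary word if it is close enough.

    `threshold` plays the role of defined_value; 0.9 is an assumed value.
    """
    if word in vocabulary:
        return word, True  # ratio would be 1.0, nothing to fix
    best, best_ratio = None, 0.0
    for candidate in vocabulary:
        ratio = difflib.SequenceMatcher(None, word, candidate).ratio()
        if ratio > best_ratio:
            best, best_ratio = candidate, ratio
    if best_ratio >= threshold:
        return best, True  # close enough: use the dictionary word
    return word, False  # too far from every dictionary word


def correct_sentence(sentence, vocabulary, threshold=0.9):
    """Correct each word; append ',bad' if any word could not be fixed."""
    fixed_words, all_ok = [], True
    for word in sentence.split():
        fixed, ok = correct_word(word, vocabulary, threshold)
        fixed_words.append(fixed)
        all_ok = all_ok and ok
    result = " ".join(fixed_words)
    return result if all_ok else result + ",bad"


# Hypothetical tiny vocabulary, for illustration only.
vocabulary = ["tu", "es", "intelligent"]
print(correct_sentence("tu es intelligen", vocabulary))
print(correct_sentence("tu es intailigen", vocabulary))
```

With this vocabulary, 'intelligen' (ratio about 0.95) is replaced, while 'intailigen' (ratio about 0.76) stays as heard and the sentence gets the ',bad' marker. The loop compares each unknown word against the whole vocabulary, so the cost grows with dictionary size.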

I could also compare complete sentences: the ratio would be very different, and I could completely replace the whole sentence with the dictionary one. Example:

src = alfred diriges toi vers le salon
res = dirige toi ver le salo alfre
ratio = 0.7333333333333333

Of course, this is only possible for a limited corpus, with all possible sentences written in words.txt.
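For the whole-sentence case, difflib also provides `get_close_matches`, which ranks candidates by the same SequenceMatcher ratio and so avoids writing the loop by hand. A minimal sketch, assuming the corpus is a plain list of sentences (in practice it might be read from words.txt, one sentence per line); the function name `match_sentence`, the second corpus sentence, and the `cutoff` of 0.7 are illustrative assumptions.

```python
import difflib


def match_sentence(heard, corpus, cutoff=0.7):
    """Return the closest known sentence, or None if nothing is close enough."""
    matches = difflib.get_close_matches(heard, corpus, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical limited corpus, for illustration only.
corpus = [
    "alfred diriges toi vers le salon",
    "alfred allume la lumiere",
]

print(match_sentence("dirige toi ver le salo alfre", corpus))
print(match_sentence("bonjour", corpus))
```

Here the garbled sentence maps back to 'alfred diriges toi vers le salon', and an unrelated input returns None, which could be logged as a bad question. The cutoff needs tuning per corpus: too low and unrelated sentences get replaced, too high and small recognition errors are rejected.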

Looping over every word seems a bit heavy, but Python handles this well...

Is there a better method (a deep learning one?) that I could learn? (In Python!)

Thanks all
