Arabic regex matches parts of words that must not be matched
#131 opened on Dec 24, 2017
Description
In some Arabic dimensions, the regex does not match whole words only. Sometimes it matches part of a word that must not be matched at all, which produces seriously wrong output. I noticed this problem in the following dimensions:
1- ordinal
2- number
3- time
4- duration
For example, if the input text is "وأحدث ذلك مشكلة كبيرة", which in English means "And that made a big problem",
Duckling matches the word "احد" (which means "Sunday") from its built-in regex against part of the word "وأحدث" (which means "made") in the input text! (See the following screenshot of the runtime output.)
This was in the "time" dimension.
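A minimal Python sketch of the failure mode, assuming a hypothetical "Sunday" pattern `[اأ]حد` (Duckling's real rule is written in Haskell and may look different). An unanchored search finds "أحد" inside "وأحدث". The bounded variant uses lookarounds to reject a match that touches another word character, but explicitly allows the optional clitics و ("and") and ال ("the"), because Arabic attaches them directly to the word. The names `NAIVE_SUNDAY`, `BOUNDED_SUNDAY` and `find_all` are illustrative only.

```python
import re

# Hypothetical "Sunday" pattern; not Duckling's actual rule.
NAIVE_SUNDAY = re.compile(r"[اأ]حد")

# Same core pattern, but the whole match may not be preceded or followed
# by a word character (Python's \w covers Arabic letters in str patterns).
# Optional attached prefixes و ("and") and ال ("the") are allowed.
BOUNDED_SUNDAY = re.compile(r"(?<!\w)(?:و)?(?:ال)?[اأ]حد(?!\w)")


def find_all(pattern, text):
    """Return the matched substrings of `pattern` in `text`."""
    return [m.group(0) for m in pattern.finditer(text)]


sentence = "وأحدث ذلك مشكلة كبيرة"  # "And that made a big problem"

print(find_all(NAIVE_SUNDAY, sentence))            # ['أحد'] (false positive inside "وأحدث")
print(find_all(BOUNDED_SUNDAY, sentence))          # []
print(find_all(BOUNDED_SUNDAY, "نلتقي يوم الأحد"))  # ['الأحد']
```

A plain `\b[اأ]حد\b` would also reject "وأحدث", but it would reject "والأحد" ("and Sunday") as well, which is why the sketch lists the prefixes explicitly. A real fix would likely need the full set of Arabic attached prefixes (for example ب, ل, ف), not just the two shown here.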
Examples for the other dimensions are also attached:
a- "ordinal" dimension:
b- "number" dimension:
This example shows bugs in more than one dimension; the number dimension is one of them.