Arabic regex matches parts of words that must not be matched
#131 opened on December 24, 2017
Description
In some Arabic dimensions, the regex does not restrict itself to whole words. It sometimes matches a fragment inside a longer word that should not be matched at all, which produces seriously wrong output. I noticed this problem in the following dimensions: 1- ordinal 2- number 3- time 4- duration
For example, take the input text "وأحدث ذلك مشكلة كبيرة", which in English means "And that made a big problem".
Duckling matches the word "احد" (which means "Sunday") from its built-in regex against part of the word "وأحدث" (which means "made") in the input text (see the following screenshot of the runtime output).
This was in the "time" dimension.
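The behaviour can be reproduced outside Duckling with a small sketch. This is only an illustration under assumptions: the pattern `[أا]حد` is a hypothetical stand-in for a day-name rule, not Duckling's actual Arabic regex, and Python's `re` module is used here instead of Duckling's own regex engine. It shows that an unanchored pattern matches inside "وأحدث", while the same pattern wrapped in `\b` word boundaries does not.

```python
import re

# Hypothetical stand-in for a day-name pattern; NOT Duckling's actual rule.
NAIVE = re.compile(r"[أا]حد")
# Same pattern, but only allowed to match a whole word.
BOUNDED = re.compile(r"\b[أا]حد\b")

text = "وأحدث ذلك مشكلة كبيرة"

# The unanchored pattern finds "أحد" inside the longer word "وأحدث".
naive_hit = NAIVE.search(text)

# With word boundaries, the fragment inside "وأحدث" is rejected.
bounded_hit = BOUNDED.search(text)

# A standalone occurrence of the word is still matched.
standalone_hit = BOUNDED.search("يوم أحد")

print("naive match:", naive_hit is not None)
print("bounded match:", bounded_hit is not None)
print("standalone match:", standalone_hit is not None)
```

Whether plain `\b` is the right fix inside Duckling itself is an open question, since Arabic prefixes such as "و" and "ال" attach directly to words and may need to be handled explicitly by the rules.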
Examples for the other dimensions are also attached:
a- "ordinal" dimension:
b- "number" dimension:
This example shows bugs in more than one dimension; the number dimension is one of them.