facebook/duckling

Arabic regexes match parts of words that must not be matched

Open

Opened on 24 Dec 2017

bug, help wanted

Description

In some Arabic dimensions, the regex does not match whole words only; it sometimes matches part of a word that should not be matched at all, which produces seriously buggy output. I noticed this problem in the following dimensions: 1- ordinal, 2- number, 3- time, 4- duration.

For example, take the input text "وأحدث ذلك مشكلة كبيرة", which in English means "And that made a big problem".

Duckling matches the word "احد" (which means "Sunday") from its built-in regex against the word "وأحدث" (which means "made") in the input text (see the attached screenshot of the run: "time bug").

This was in the "time" dimension.
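
The partial match described above can be reproduced outside Duckling. The following is a minimal sketch using Python's `re` module, not Duckling's Haskell code: the pattern `SUNDAY` is a hypothetical stand-in for the built-in rule (the real rule may differ), and wrapping it in `\b` word boundaries is only one possible fix, assumed here for illustration. Other regex engines may treat `\b` differently for non-ASCII text.

```python
import re

# Hypothetical pattern for "Sunday" with an optional definite article.
# This is an assumption for illustration, not copied from Duckling's rules.
SUNDAY = "(?:ال)?[أا]حد"

# Sentence from the report: "And that made a big problem".
text = "وأحدث ذلك مشكلة كبيرة"

# Unanchored: free to match anywhere, including inside a longer word.
unanchored = re.compile(SUNDAY)

# Anchored: the same pattern, but it must start and end on a word boundary.
anchored = re.compile(r"\b(?:" + SUNDAY + r")\b")


def find(pattern, s):
    """Return the matched substring, or None if the pattern does not match."""
    m = pattern.search(s)
    return m.group(0) if m else None


loose_hit = find(unanchored, text)            # hits inside the first word
strict_hit = find(anchored, text)             # no whole-word match in this sentence
standalone_hit = find(anchored, "يوم الأحد")  # "Sunday" as a separate word

print("unanchored:", loose_hit)
print("anchored:", strict_hit)
print("anchored, standalone word:", standalone_hit)
```

With the boundaries in place, the pattern no longer fires inside "وأحدث" but still matches a standalone "الأحد". One trade-off of this approach: Arabic attaches particles such as "و" directly to the following word, so a strict boundary would also skip forms like "والأحد" unless the pattern allows for such prefixes.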

Examples for the other dimensions are also attached:

a- "ordinal" dimension: (screenshot: "ordinal bug")

b- "number" dimension: this example shows bugs in more than one dimension; the number dimension is one of them. (screenshot: "number bug")
