facebook/duckling

Arabic regex matches some parts of words that mustn't be matched

Open

#131 opened on Dec 24, 2017

Labels: bug, help wanted

Description

In some Arabic dimensions, the regexes do not match whole words only; they sometimes match part of a word that should not be matched at all, which produces seriously buggy output. I noticed this problem in the following dimensions: 1- ordinal, 2- number, 3- time, 4- duration.

For example, take the input text "وأحدث ذلك مشكلة كبيرة", which in English means "And that made a big problem".

Duckling matches the word "احد" (which means "Sunday") from its built-in regex against part of the word "وأحدث" (which means "made") in the input text (see the following screenshot of the run). [image: time bug]

This was in the "time" dimension.
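The mismatch can be reproduced outside Duckling. The following is a minimal Python sketch, not Duckling's actual Haskell rule: the pattern `(?:ال)?[اأ]حد` is a hypothetical stand-in for the built-in "Sunday" regex, and the lookaround guards are only one possible way to require a whole-word match.

```python
import re

# Input text from this issue: "And that made a big problem"
TEXT = "وأحدث ذلك مشكلة كبيرة"

# Hypothetical stand-in for the "Sunday" rule (assumption, not Duckling's regex).
# LOOSE has no boundary checks, so it can match inside a longer word.
LOOSE = re.compile(r"(?:ال)?[اأ]حد")

# STRICT rejects a match that is preceded or followed by a word character.
# In Python 3, \w is Unicode-aware for str patterns and covers Arabic letters.
STRICT = re.compile(r"(?<!\w)(?:ال)?[اأ]حد(?!\w)")


def find(pattern, text):
    """Return the first matched substring, or None if there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(find(LOOSE, TEXT))          # أحد  (matched inside "وأحدث": the bug)
print(find(STRICT, TEXT))         # None (no whole word matches)
print(find(STRICT, "يوم الأحد"))  # الأحد (a real "Sunday" still matches)
```

One caveat with this approach: Arabic attaches some particles directly to the following word, for example the conjunction "و" ("and"), so a strict guard like this would also reject "والأحد" ("and Sunday"). A real fix would probably need to allow a small set of such prefixes.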

Examples for the other dimensions are also attached:

a- "ordinal" dimension: [image: ordinal bug]

b- "number" dimension: this example shows bugs in more than one dimension; the number dimension is one of them. [image: number bug]
