facebook/duckling

Arabic regex match some parts of words that mustn't be matched

Open

#131 created on December 24, 2017

(4 comments) (5 reactions) (0 assignees) Haskell (4,282 stars) (737 forks) batch import
bug, help wanted

Description

In some Arabic dimensions, the regexes do not match whole words only; they sometimes match part of a word that should not be matched at all, which produces seriously wrong output. I noticed this problem in the following dimensions: 1- ordinal, 2- number, 3- time, 4- duration.

For example, take the input text "وأحدث ذلك مشكلة كبيرة", which in English means "And that made a big problem".

Duckling matches the word "احد" (which means "Sunday") from its built-in regex against the word "وأحدث" (which means "made") in the input text (see the attached screenshot of the run-time output: "time bug").

This was in the "time" dimension.
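
The behaviour described above can be reproduced outside Duckling with a minimal sketch. It uses Python's `re` module rather than Duckling's own Haskell rules, and the pattern `(?:ال)?[اأ]حد` is a hypothetical stand-in for the "Sunday" rule, not a copy of the real one. The sketch only shows why an unanchored pattern fires inside "وأحدث" and how requiring a non-word character on both sides suppresses that match.

```python
import re

# Hypothetical stand-in for the "Sunday" rule; Duckling's real pattern may differ.
SUNDAY_UNBOUNDED = re.compile(r"(?:ال)?[اأ]حد")

# Same pattern, but the match may not touch another word character on either side.
SUNDAY_BOUNDED = re.compile(r"(?<!\w)(?:ال)?[اأ]حد(?!\w)")

text_from_issue = "وأحدث ذلك مشكلة كبيرة"  # "And that made a big problem"
text_with_sunday = "يوم الأحد"  # contains the standalone word for "Sunday"


def find(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(0) if match else None


# The unbounded pattern picks up three letters from the middle of "وأحدث".
print(find(SUNDAY_UNBOUNDED, text_from_issue))

# The bounded pattern rejects that partial match but still finds the real word.
print(find(SUNDAY_BOUNDED, text_from_issue))
print(find(SUNDAY_BOUNDED, text_with_sunday))
```

A strict boundary like this is only one possible approach. It would also reject legitimately prefixed forms such as "والأحد" ("and Sunday"), so an actual fix would likely need to allow specific prefixes explicitly.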

Examples for the other dimensions are also attached:

a- "ordinal" dimension: (screenshot: "ordinal bug")

b- "number" dimension: this example shows bugs in more than one dimension; the number dimension is one of them. (screenshot: "number bug")
