facebook/duckling

Support for parser combinators

Open

#27 opened on May 24, 2017

enhancement, help wanted

Description

Many of the regexes have reached the point where a parser combinator library would be a much better option. A prime example is the URL matcher: a parser combinator could define it precisely, whereas the current regex is fairly ad hoc and loses a lot of information. For instance, the pattern doesn't work for URLs that contain usernames and passwords, which users might want to match on in order to forbid such URLs or to warn people who are posting URLs they shouldn't:

ruleURL :: Rule
ruleURL = Rule
  { name = "url"
  , pattern =
    [ regex "((([a-zA-Z]+)://)?(w{2,3}[0-9]*\\.)?(([\\w_-]+\\.)+[a-z]{2,4})(:(\\d+))?(/[^?\\s#]*)?(\\?[^\\s#]+)?)"
    ]
  , prod = \tokens -> case tokens of
      (Token RegexMatch (GroupMatch (m:_:_protocol:_:domain:_:_:_port:_path:_query:_)):
       _) -> Just . Token Url $ url m domain
      _ -> Nothing
  }
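
For comparison, here is a minimal sketch of what a combinator version could look like. It uses `Text.ParserCombinators.ReadP` only because it ships with `base`; that is an assumption made for illustration, not a library recommendation. `UrlParts`, `urlP`, `parseUrl`, and `optionMaybe` are hypothetical names, none of them exist in Duckling, and the grammar is deliberately simplified (no fragments, no IPv6 hosts, no percent-decoding). Unlike the regex, it parses the whole input string rather than finding a match inside running text.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit, isSpace)
import Text.ParserCombinators.ReadP
  (ReadP, (<++), char, eof, munch, munch1, readP_to_S, string)

-- Hypothetical record (not a Duckling type) that keeps every URL component.
data UrlParts = UrlParts
  { upScheme   :: Maybe String
  , upUserInfo :: Maybe (String, Maybe String)  -- (user, optional password)
  , upHost     :: String
  , upPort     :: Maybe Int
  , upPath     :: String
  , upQuery    :: Maybe String
  } deriving (Eq, Show)

-- Left-biased optional parser: try p, fall back to Nothing without consuming input.
optionMaybe :: ReadP a -> ReadP (Maybe a)
optionMaybe p = (Just <$> p) <++ pure Nothing

-- "user" or "user:password"; the trailing '@' is consumed by the caller.
userInfoP :: ReadP (String, Maybe String)
userInfoP = do
  user <- munch1 (\c -> isAlphaNum c || c `elem` "-._~")
  pass <- optionMaybe (char ':' *> munch1 (\c -> not (isSpace c) && c `notElem` "@/"))
  pure (user, pass)

-- Simplified URL grammar: [scheme://][userinfo@]host[:port][/path][?query]
urlP :: ReadP UrlParts
urlP = do
  scheme   <- optionMaybe (munch1 isAlpha <* string "://")
  userInfo <- optionMaybe (userInfoP <* char '@')
  host     <- munch1 (\c -> isAlphaNum c || c `elem` "-._")
  port     <- optionMaybe (char ':' *> (read <$> munch1 isDigit))
  path     <- ((:) <$> char '/' <*> munch (\c -> not (isSpace c) && c `notElem` "?#"))
                <++ pure ""
  query    <- optionMaybe (char '?' *> munch1 (\c -> not (isSpace c) && c /= '#'))
  pure (UrlParts scheme userInfo host port path query)

-- Succeeds only when the whole input is a URL under the grammar above.
parseUrl :: String -> Maybe UrlParts
parseUrl s = case readP_to_S (urlP <* eof) s of
  [(parts, "")] -> Just parts
  _             -> Nothing
```

With this shape, a rule could inspect `upUserInfo` directly instead of counting regex capture groups, which is the information the current pattern drops.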

(For this specific example, the Network.URI module from the network-uri package already provides parseURI :: String -> Maybe URI)
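
As a rough illustration of that route, the sketch below uses `parseURI` to detect embedded credentials. It assumes the `network-uri` package is available as a dependency; `hasCredentials` is a hypothetical helper, not something Duckling provides. Note that `parseURI` only accepts absolute URIs, so scheme-less input such as `example.com` is rejected, which the current regex would accept.

```haskell
import Network.URI (URIAuth (..), parseURI, uriAuthority)

-- Hypothetical helper: True when the URI carries a "user[:password]@" part.
-- uriUserInfo is the empty string when no credentials are present.
hasCredentials :: String -> Bool
hasCredentials s = case parseURI s >>= uriAuthority of
  Just auth -> not (null (uriUserInfo auth))
  Nothing   -> False
```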

I don't have an implementation for this yet (nor a preference for a particular combinator library) because I don't fully understand how all of Duckling fits together, so I wanted to open this issue to start a discussion about it.
