chakra-core/ChakraCore

[RegExp] Unicode-mode RegExp incorrectly matches lone surrogates

Open

#98 opened on Jan 14, 2016

Bug, Severity: 2, help wanted

Description

I was able to observe this in Edge 25.10586.0.0:

/[\ud800-\ud805]+/u.exec("\u{10000}\ud801\ud802") should return ["\ud801\ud802"] but instead returns ["\ud800"], which is the lead surrogate (the first half) of "\u{10000}".

In Unicode mode (the u flag), the spec requires the input string to be interpreted as a sequence of code points, i.e. surrogate pairs must be combined into a single code point. Matching only the lead surrogate of "\u{10000}" is therefore incorrect.
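
To illustrate the code point interpretation, here is a minimal sketch that uses the string iterator, which pairs surrogates in the same way. It is not ChakraCore's implementation, and the helper name codePoints is an assumption introduced only for this example.

```javascript
// Hypothetical helper (not from ChakraCore): list a string's code points as hex.
// Array.from uses the string iterator, which combines valid surrogate pairs
// and yields lone surrogates as individual elements.
function codePoints(s) {
  return Array.from(s, (ch) => ch.codePointAt(0).toString(16));
}

const input = "\u{10000}\ud801\ud802";

console.log(input.length);      // 4 code units: d800 dc00 d801 d802
console.log(codePoints(input)); // 3 code points: 10000, d801, d802
```

Under this view the input contains no standalone \ud800, so only the two trailing lone surrogates can match the character class.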

A more reduced case: /\ud800+/u.exec("\u{10000}") should return null, but returns ["\ud800"] instead.
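
Both cases can be checked with a small script. This is a sketch for reproducing the report, not an existing ChakraCore test; the helper toCodeUnits is a hypothetical name used only to print matches readably.

```javascript
// Hypothetical helper: render a string as \uXXXX escapes, one per code unit,
// so lone surrogates are visible in the output.
function toCodeUnits(s) {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    out += "\\u" + s.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

// Case 1: the class should skip the pair for U+10000 and match the lone surrogates.
const m1 = /[\ud800-\ud805]+/u.exec("\u{10000}\ud801\ud802");

// Case 2: a lone-surrogate pattern should not match half of a surrogate pair.
const m2 = /\ud800+/u.exec("\u{10000}");

console.log(m1 === null ? null : toCodeUnits(m1[0])); // expected per spec: \ud801\ud802
console.log(m2);                                      // expected per spec: null
```

An engine with the bug described here would print \ud800 for both cases instead.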
