chakra-core/ChakraCore
View on GitHub[RegExp] Unicode-mode RegExp incorrectly matches lone surrogates
Open
#98 opened on Jan 14, 2016
BugSeverity: 2help wanted
Description
I was able to observe this in Edge 25.10586.0.0:
/[\ud800-\ud805]+/u.exec("\u{10000}\ud801\ud802") should return ["\ud801\ud802] but instead returns["\ud800"], which is the first half of "\u{10000}".
The spec requires the input string to be interpreted as a sequence of code points, i.e. surrogate pairs to be combined. So matching the lead surrogate of "\u{10000}" is incorrect.
A bit more reduced would be: /\ud800+/u.exec("\u{10000}") should return null, but returns ["\ud800"] instead.