TextDecoder does not error correctly for legacy byte sequences · nodejs/node#40091

12 comments · 4 reactions · 0 assignees · JavaScript · 35,535 forks

Labels: encoding, good first issue

Repository metrics

Stars: 117,218
PR merge metrics: average time to merge 13 days 4 hours; 233 PRs merged in the last 30 days

Description

Version

v16.9.1

Platform

Microsoft Windows NT 10.0.19043.0 x64

Subsystem

encoding

What steps will reproduce the bug?

Enter the following in the REPL:

new TextDecoder("Big5").decode(new Uint8Array([0x83, 0x5C])).charCodeAt(0).toString(16)

as well as

new TextDecoder("Big5").decode(new Uint8Array([0x83, 0x5C])).charCodeAt(1).toString(16)
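
For running outside the REPL, the two one-liners can be combined into a small script. This is only a sketch: `toHexCodeUnits` and `decodeReportedBytes` are hypothetical helper names, not part of Node.js, and the script assumes a build whose ICU data includes Big5.

```javascript
// Format every UTF-16 code unit of a string as lowercase hex.
function toHexCodeUnits(str) {
  return Array.from({ length: str.length }, (_, i) =>
    str.charCodeAt(i).toString(16)
  );
}

// Decode the byte pair from this report and return its code units.
function decodeReportedBytes() {
  const bytes = new Uint8Array([0x83, 0x5c]);
  return toHexCodeUnits(new TextDecoder("Big5").decode(bytes));
}

try {
  console.log(decodeReportedBytes());
} catch (err) {
  // Builds without Big5 support throw when the decoder is constructed.
  console.error("Big5 is not available in this build:", err.message);
}
```

Printing all code units at once also shows how long the decoded string is, which the two separate `charCodeAt` calls only reveal indirectly.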

How often does it reproduce? Is there a required condition?

Every time

What is the expected behavior?

fffd for the first, and 5c for the second (as in Firefox and Chrome, and per the WHATWG Encoding Standard)
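
The expected result follows from the pointer step of the Big5 decoder in the Encoding Standard. The sketch below reproduces only that arithmetic; `big5Pointer` is a hypothetical name, and the index lookup itself is left out. As far as I can tell, the standard's Big5 index has no entry for a pointer this low, so the decoder reports an error (U+FFFD in the default non-fatal mode) and, because 0x5C is an ASCII byte, puts it back to be decoded on its own as U+005C.

```javascript
// Pointer computation from the Big5 decoder in the WHATWG Encoding Standard.
// Returns null when the trail byte is outside the two valid ranges.
function big5Pointer(lead, trail) {
  const inLowRange = trail >= 0x40 && trail <= 0x7e;
  const inHighRange = trail >= 0xa1 && trail <= 0xfe;
  if (!inLowRange && !inHighRange) return null;
  const offset = trail < 0x7f ? 0x40 : 0x62;
  return (lead - 0x81) * 157 + (trail - offset);
}

console.log(big5Pointer(0x83, 0x5c)); // 342
```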

What do you see instead?

f00e and NaN
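
Read together, these two values suggest that both bytes were consumed as a single character: `charCodeAt` returns NaN for an out-of-range index, so the decoded string apparently has length 1, and its only code unit, U+F00E, lies in the Private Use Area. A minimal illustration, using a literal string in place of the decoder output:

```javascript
// Stand-in for the decoder output reported above (assumed to be one code unit).
const observed = "\uf00e";

console.log(observed.length);                      // 1
console.log(observed.charCodeAt(0).toString(16));  // f00e
console.log(Number.isNaN(observed.charCodeAt(1))); // true
```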

Additional information

I suspect this has to do with you using ICU as-is, instead of properly patching it to match the Encoding Standard. There are probably more bugs like this.

@inexorabletash may be able to point to where in the Chromium source tree we keep our ICU encoding patches.
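
One way to look for more cases like this is to probe neighbouring byte pairs. The sketch below rests on an assumption that should be checked against the standard's Big5 index: that lead bytes 0x81 through 0x86 have no index entries at all, so each of them followed by an ASCII trail byte should decode to U+FFFD plus that ASCII character. `expectedForUnmappedLead` and `findMismatches` are hypothetical helper names.

```javascript
// Expected output for an unmapped lead byte followed by an ASCII trail byte:
// U+FFFD for the lead, then the ASCII character decoded on its own.
function expectedForUnmappedLead(trail) {
  return "\ufffd" + String.fromCharCode(trail);
}

// Try lead bytes 0x81..0x86 (assumed unmapped) with trail bytes 0x40..0x7E
// and collect every pair whose decoded output differs from the expectation.
function findMismatches(decoder) {
  const mismatches = [];
  for (let lead = 0x81; lead <= 0x86; lead++) {
    for (let trail = 0x40; trail <= 0x7e; trail++) {
      const actual = decoder.decode(new Uint8Array([lead, trail]));
      if (actual !== expectedForUnmappedLead(trail)) {
        mismatches.push([lead, trail]);
      }
    }
  }
  return mismatches;
}

try {
  const mismatches = findMismatches(new TextDecoder("Big5"));
  console.log(`${mismatches.length} mismatching byte pairs`);
} catch (err) {
  // Builds without Big5 support throw when the decoder is constructed.
  console.error("Big5 is not available in this build:", err.message);
}
```

Passing the decoder in as an argument keeps the probe reusable for other legacy encodings, although the byte ranges here are specific to Big5.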

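Contributor guide

Research direction: Inspect the Big5 TextDecoder implementation in the Node.js encoding subsystem. Compare the behavior expected under the WHATWG Encoding Standard with the actual output. Look for differences between Node.js's use of ICU and Chromium's patched ICU. Consider patching options that would bring the decoder into conformance with the standard.
Tech stack: JavaScript
Domain: backend
Issue type: bug
Difficulty: 3
Estimated time: 1-3 hours
Activity status: active
Clarity: clear
Prerequisites: JavaScript, Node.js
Beginner friendliness: 40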