TextDecoder does not error incorrectly for legacy byte sequences · nodejs/node#40091

(12 comments) (4 reactions) (0 assignees) JavaScript (35,535 forks)

encoding, good first issue

Repository metrics

Stars: 117,218
PR merge metrics: average time to merge 13 days 4 hours; 233 PRs merged in the last 30 days

Description

Version

v16.9.1

Platform

Microsoft Windows NT 10.0.19043.0 x64

Subsystem

encoding

What steps will reproduce the bug?

Enter the following in the REPL:

new TextDecoder("Big5").decode(new Uint8Array([0x83, 0x5C])).charCodeAt(0).toString(16)

as well as

new TextDecoder("Big5").decode(new Uint8Array([0x83, 0x5C])).charCodeAt(1).toString(16)

How often does it reproduce? Is there a required condition?

Every time

What is the expected behavior?

fffd for the first, and 5c for the second (as in Firefox and Chrome, and per the WHATWG Encoding Standard)
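For reference, the expected result follows from how the Encoding Standard's Big5 decoder treats a lead byte followed by a trail byte that has no index entry: it emits U+FFFD and, when the trail byte is ASCII, puts that byte back so it is decoded on its own. The following is a minimal sketch of that logic as I read the standard, not Node's or ICU's implementation. `decodeBig5Sketch` and its `lookup` parameter are hypothetical names; `lookup` stands in for index Big5 and defaults to "no entry", and the standard's special-cased pointers that map to two code points are omitted.

```javascript
// Sketch of the Big5 decoder's byte handling (assumption: my reading of the
// WHATWG Encoding Standard). `lookup` is a hypothetical stand-in for index Big5:
// it maps a pointer to a code point, or returns null when there is no entry.
function decodeBig5Sketch(bytes, lookup = () => null) {
  let out = "";
  let lead = 0x00;
  for (let i = 0; i < bytes.length; i++) {
    const byte = bytes[i];
    if (lead !== 0x00) {
      const first = lead;
      lead = 0x00;
      let pointer = null;
      const offset = byte < 0x7f ? 0x40 : 0x62;
      if ((byte >= 0x40 && byte <= 0x7e) || (byte >= 0xa1 && byte <= 0xfe)) {
        pointer = (first - 0x81) * 157 + (byte - offset);
      }
      const codePoint = pointer === null ? null : (lookup(pointer) ?? null);
      if (codePoint !== null) {
        out += String.fromCodePoint(codePoint);
        continue;
      }
      // Unmapped pair: emit U+FFFD. An ASCII trail byte is put back so the
      // next loop iteration decodes it on its own.
      out += "\ufffd";
      if (byte <= 0x7f) i--;
      continue;
    }
    if (byte <= 0x7f) out += String.fromCharCode(byte);
    else if (byte >= 0x81 && byte <= 0xfe) lead = byte;
    else out += "\ufffd";
  }
  // A dangling lead byte at the end of input is also an error.
  if (lead !== 0x00) out += "\ufffd";
  return out;
}

const sketch = decodeBig5Sketch(new Uint8Array([0x83, 0x5c]));
console.log(sketch.charCodeAt(0).toString(16), sketch.charCodeAt(1).toString(16));
```

With the default empty `lookup`, the pair 0x83 0x5C computes pointer (0x83 - 0x81) × 157 + (0x5C - 0x40) = 342, finds no entry, and yields U+FFFD followed by the restored 0x5C.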

What do you see instead?

f00e and NaN
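Note that `charCodeAt` returns NaN for an out-of-range index, so the second result suggests the decoded string is only one code unit long: both bytes were consumed and mapped to U+F00E, a Private Use Area code point. A small helper that lists every code unit avoids the out-of-range indexing and makes this easier to see. This is only a sketch; `hexCodeUnits` and `decodeToHex` are hypothetical helper names, and the `RangeError` branch assumes a build that does not support the requested label.

```javascript
// List every UTF-16 code unit of a string as lowercase hex.
function hexCodeUnits(str) {
  return Array.from({ length: str.length }, (_, i) => str.charCodeAt(i).toString(16));
}

// Decode `bytes` with the encoding `label` and return the code units as hex.
// Returns null when this runtime does not support the label.
function decodeToHex(label, bytes) {
  let decoder;
  try {
    decoder = new TextDecoder(label);
  } catch (err) {
    if (err instanceof RangeError) return null; // unsupported encoding label
    throw err;
  }
  return hexCodeUnits(decoder.decode(new Uint8Array(bytes)));
}

// Expected per the report: [ 'fffd', '5c' ]. A single 'f00e' entry would match
// the behavior described above.
console.log(decodeToHex("Big5", [0x83, 0x5c]));
```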

Additional information

I suspect this has to do with you using ICU as-is, instead of properly patching it to match the Encoding Standard. There are probably more bugs like this.
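One way to look for similar cases is to scan a range of two-byte sequences and flag any that do not decode to a string starting with U+FFFD. The sketch below does that for lead bytes 0x81 through 0x86, on the assumption (worth verifying against the standard's index file) that index Big5 has no entries for those leads. `findNonErroringPairs` is a hypothetical helper; it takes the decode function as a parameter so it can be pointed at any decoder.

```javascript
// Collect [lead, trail] pairs whose decoded text does not start with U+FFFD.
// `decode` is any function from Uint8Array to string. The default lead range
// is an assumption: leads for which index Big5 is believed to have no entries.
function findNonErroringPairs(decode, leadFrom = 0x81, leadTo = 0x86) {
  const hits = [];
  for (let lead = leadFrom; lead <= leadTo; lead++) {
    for (let trail = 0x00; trail <= 0xff; trail++) {
      const text = decode(new Uint8Array([lead, trail]));
      if (!text.startsWith("\ufffd")) hits.push([lead, trail]);
    }
  }
  return hits;
}

// Usage against the runtime's own decoder, skipped if Big5 is unavailable.
try {
  const big5 = new TextDecoder("Big5");
  const hits = findNonErroringPairs((bytes) => big5.decode(bytes));
  console.log(`${hits.length} pair(s) did not produce a leading U+FFFD`);
} catch (err) {
  if (!(err instanceof RangeError)) throw err;
  console.log("Big5 is not supported by this build");
}
```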

@inexorabletash may be able to point to where in the Chromium source tree we keep our ICU encoding patches.

Contributor guide

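Research direction: Examine the TextDecoder implementation for Big5 in the Node.js encoding subsystem. Compare the behavior expected under the WHATWG Encoding Standard with the actual output. Look for differences between Node.js's use of ICU and Chromium's patched ICU. Consider patching options to conform to the standard.
Tech stack: javascript
Domain: backend
Issue type: bug
Difficulty: 3
Estimated time: 1-3 hours
Activity status: active
Clarity: clear
Prerequisites: JavaScript, Node.js
Beginner friendliness: 40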