TextDecoder does not error incorrectly for legacy byte sequences · nodejs/node#40091

(12 comments) (4 reactions) (0 assignees) JavaScript (35,535 forks)

encoding, good first issue

Repository metrics

Stars: (117,218 stars)
PR merge metrics: (average merge time 13d 4h) (233 PRs merged in 30 days)

Description

Version

v16.9.1

Platform

Microsoft Windows NT 10.0.19043.0 x64

Subsystem

encoding

What steps will reproduce the bug?

Enter the following in the REPL:

new TextDecoder("Big5").decode(new Uint8Array([0x83, 0x5C])).charCodeAt(0).toString(16)

as well as

new TextDecoder("Big5").decode(new Uint8Array([0x83, 0x5C])).charCodeAt(1).toString(16)
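The two REPL expressions can also be folded into one script that prints every UTF-16 code unit of the decoded string. This is only a convenience sketch: `codeUnitsHex` is a hypothetical helper name, and the `try`/`catch` assumes that a runtime built without full ICU may reject the "Big5" label.

```javascript
// Return the UTF-16 code units of a string as lowercase hex strings.
function codeUnitsHex(str) {
  return Array.from({ length: str.length }, (_, i) =>
    str.charCodeAt(i).toString(16)
  );
}

try {
  // Same input as in the report: lead byte 0x83 followed by 0x5C.
  const decoded = new TextDecoder("Big5").decode(new Uint8Array([0x83, 0x5c]));
  console.log(codeUnitsHex(decoded));
} catch (err) {
  // Assumption: builds without Big5 support throw when constructing the decoder.
  console.log("Big5 decoding is unavailable in this runtime:", err.message);
}
```

Printing the whole array also shows the string length, which the two separate `charCodeAt` calls only reveal indirectly.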

How often does it reproduce? Is there a required condition?

Every time

What is the expected behavior?

fffd for the first, and 5c for the second (as in Firefox and Chrome, and per the WHATWG Encoding Standard)
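The expected result follows from how the Encoding Standard's Big5 decoder treats a lead byte whose trail byte does not produce a mapped pointer: it emits an error (U+FFFD in non-fatal mode) and, if the trail byte is ASCII, puts that byte back so it is decoded on its own. Below is a simplified sketch of those steps, not Node's or ICU's implementation. `decodeBig5Sketch` and `indexLookup` are hypothetical names, the default lookup is an empty stand-in for the real Big5 index, and the standard's four special two-code-point pointers are omitted.

```javascript
// Simplified sketch of the Big5 decoder steps (non-fatal mode).
// indexLookup(pointer) stands in for the standard's "index Big5" table
// and returns a code point or null. The default maps nothing (assumption).
function decodeBig5Sketch(bytes, indexLookup = () => null) {
  let out = "";
  let lead = 0; // pending lead byte, 0 when none
  for (let i = 0; i < bytes.length; i++) {
    const byte = bytes[i];
    if (lead !== 0) {
      const leadByte = lead;
      lead = 0;
      let pointer = null;
      const offset = byte < 0x7f ? 0x40 : 0x62;
      if ((byte >= 0x40 && byte <= 0x7e) || (byte >= 0xa1 && byte <= 0xfe)) {
        pointer = (leadByte - 0x81) * 157 + (byte - offset);
      }
      const codePoint = pointer === null ? null : indexLookup(pointer);
      if (codePoint != null) {
        out += String.fromCodePoint(codePoint);
        continue;
      }
      // No mapping: an ASCII trail byte is restored and decoded again.
      if (byte <= 0x7f) i--;
      out += "\ufffd";
      continue;
    }
    if (byte <= 0x7f) {
      out += String.fromCharCode(byte);
    } else if (byte >= 0x81 && byte <= 0xfe) {
      lead = byte;
    } else {
      out += "\ufffd";
    }
  }
  if (lead !== 0) out += "\ufffd"; // input ended in the middle of a sequence
  return out;
}

const sketchResult = decodeBig5Sketch([0x83, 0x5c]);
console.log(sketchResult.charCodeAt(0).toString(16)); // "fffd"
console.log(sketchResult.charCodeAt(1).toString(16)); // "5c"
```

With the real index plugged in, the result for this input is the same only if pointer 342 (computed from 0x83 and 0x5C) has no entry, which is what the expected output in this report implies.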

What do you see instead?

f00e and NaN
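The `NaN` is informative in itself: `charCodeAt` returns `NaN` for an index past the end of the string, so the decoded string must be one code unit long. In other words, both input bytes were consumed to produce the single private-use code point U+F00E. The snippet below illustrates this with string literals standing in for the decoder output (`observedOutput` and `expectedOutput` are illustrative names).

```javascript
// Literal stand-ins for the decoder output described in this report.
const observedOutput = "\uf00e"; // what Node returned
const expectedOutput = "\ufffd\\"; // what the Encoding Standard requires

console.log(observedOutput.length); // 1
console.log(observedOutput.charCodeAt(1)); // NaN (index out of range)
console.log(observedOutput.charCodeAt(1).toString(16)); // "NaN"

console.log(expectedOutput.length); // 2
console.log(expectedOutput.charCodeAt(1).toString(16)); // "5c"
```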

Additional information

I suspect this has to do with you using ICU as-is, instead of properly patching it to match the Encoding Standard. There are probably more bugs like this.
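One way to look for related discrepancies without needing the Big5 index is to probe trail bytes that can never form a valid pointer. For a lead byte in 0x81 to 0xFE followed by an ASCII byte outside 0x40 to 0x7E, the Encoding Standard's decoder always yields U+FFFD followed by that ASCII character. The sketch below checks a decoder against that rule. `scanInvalidAsciiTrails` is a hypothetical helper, and the scan is a probe only: it does not cover the 0x83 0x5C case from this report, which depends on the index.

```javascript
// Collect [lead, trail] pairs whose decoded output deviates from
// "U+FFFD followed by the ASCII trail byte".
// decode: function taking a Uint8Array and returning a string.
function scanInvalidAsciiTrails(decode) {
  const mismatches = [];
  for (let lead = 0x81; lead <= 0xfe; lead++) {
    for (let trail = 0x00; trail <= 0x7f; trail++) {
      // Skip ASCII trails that can form a pointer; those depend on the index.
      if (trail >= 0x40 && trail <= 0x7e) continue;
      const expected = "\ufffd" + String.fromCharCode(trail);
      const actual = decode(new Uint8Array([lead, trail]));
      if (actual !== expected) mismatches.push({ lead, trail, actual });
    }
  }
  return mismatches;
}

try {
  const big5 = new TextDecoder("big5");
  const found = scanInvalidAsciiTrails((bytes) => big5.decode(bytes));
  console.log(`${found.length} mismatching byte pairs`);
} catch (err) {
  // Assumption: builds without Big5 support throw when constructing the decoder.
  console.log("Big5 decoding is unavailable in this runtime:", err.message);
}
```

A non-empty result would point at further places where the decoder swallows an ASCII byte after a lead byte. An empty result says nothing about index-dependent cases such as the one reported here.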

@inexorabletash may be able to point to where in the Chromium source tree we keep our ICU encoding patches.

Contributor guide

Research direction: Examine the TextDecoder implementation for Big5 in the Node.js encoding subsystem. Compare the behavior expected under the WHATWG Encoding Standard with the actual output. Look for differences between Node.js's use of ICU and Chromium's patched ICU. Consider patching options to align with the standard.
Tech stack: JavaScript
Domain: backend
Issue type: Bug
Difficulty: 3
Estimated time: 1-3 hours
Activity status: Active
Clarity: Clear
Prerequisites: JavaScript, Node.js
Beginner-friendly: 40