TextDecoder does not error incorrectly for legacy byte sequences · nodejs/node#40091

(12 comments) (4 reactions) (0 assignees) JavaScript (35,535 forks)

encoding, good first issue

Repository metrics

Stars: (117,218 stars)
PR merge metrics: (average merge time 13d 4h) (233 PRs merged in 30 days)

Description

Version

v16.9.1

Platform

Microsoft Windows NT 10.0.19043.0 x64

Subsystem

encoding

What steps will reproduce the bug?

Enter the following in the REPL:

new TextDecoder("Big5").decode(new Uint8Array([0x83, 0x5C])).charCodeAt(0).toString(16)

as well as

new TextDecoder("Big5").decode(new Uint8Array([0x83, 0x5C])).charCodeAt(1).toString(16)

How often does it reproduce? Is there a required condition?

Every time

What is the expected behavior?

fffd for the first, and 5c for the second (as in Firefox and Chrome, and per the WHATWG Encoding Standard)
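To illustrate why the expected result is U+FFFD followed by 0x5C, here is a minimal sketch of the trail-byte handling in the Encoding Standard's Big5 decoder, as I read it. It is a simplification, not Node's or Chromium's implementation: `decodeBig5Sketch` and `lookup` are hypothetical names, `lookup` stands in for the real index Big5 table, and the four special pointers that map to two code points are omitted.

```javascript
// Simplified sketch of the Big5 decoder steps from the Encoding Standard.
// `lookup` is a hypothetical stand-in for index Big5: it takes a pointer and
// returns a code point, or null when the index has no entry for that pointer.
function decodeBig5Sketch(bytes, lookup) {
  let out = "";
  let lead = 0;
  for (let i = 0; i < bytes.length; i++) {
    const byte = bytes[i];
    if (lead !== 0) {
      const leadByte = lead;
      lead = 0;
      let codePoint = null;
      // Only trail bytes in 0x40-0x7E or 0xA1-0xFE produce a pointer.
      if ((byte >= 0x40 && byte <= 0x7e) || (byte >= 0xa1 && byte <= 0xfe)) {
        const offset = byte < 0x7f ? 0x40 : 0x62;
        const pointer = (leadByte - 0x81) * 157 + (byte - offset);
        codePoint = lookup(pointer);
      }
      if (codePoint !== null) {
        out += String.fromCodePoint(codePoint);
        continue;
      }
      // Error case: emit U+FFFD. If the trail byte is ASCII, put it back so
      // it is decoded on its own in the next iteration.
      out += "\uFFFD";
      if (byte <= 0x7f) i--;
      continue;
    }
    if (byte <= 0x7f) {
      out += String.fromCharCode(byte);
    } else if (byte >= 0x81 && byte <= 0xfe) {
      lead = byte;
    } else {
      out += "\uFFFD"; // 0x80 and 0xFF are never valid
    }
  }
  if (lead !== 0) out += "\uFFFD"; // stream ended after a lead byte
  return out;
}
```

For the bytes in this report, the pointer is (0x83 - 0x81) * 157 + (0x5C - 0x40) = 342. Assuming the index has no entry for that pointer, which is what the expected output implies, calling `decodeBig5Sketch(new Uint8Array([0x83, 0x5c]), () => null)` yields U+FFFD followed by a backslash (0x5C).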

What do you see instead?

f00e and NaN
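The NaN is consistent with the decoded string being only one code unit long: `charCodeAt(1)` returns NaN for an out-of-range index. A small script makes the comparison easier to repeat than two REPL one-liners. This is only a sketch; `codeUnitsHex` and `checkBig5Repro` are hypothetical helper names, and it assumes a Node.js build whose ICU data includes Big5.

```javascript
// Hypothetical helper: list every UTF-16 code unit of a string as lowercase hex.
function codeUnitsHex(str) {
  return Array.from({ length: str.length }, (_, i) =>
    str.charCodeAt(i).toString(16)
  );
}

// Decode the two bytes from this report and compare the result with the
// output expected from Firefox, Chrome, and the Encoding Standard.
function checkBig5Repro() {
  const decoded = new TextDecoder("Big5").decode(new Uint8Array([0x83, 0x5c]));
  const actual = codeUnitsHex(decoded);
  const expected = ["fffd", "5c"];
  return { actual, expected, conforms: actual.join() === expected.join() };
}
```

Calling `console.log(checkBig5Repro())` prints both lists side by side, so a one-element `actual` array shows at a glance that the 0x5C byte was consumed rather than decoded on its own.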

Additional information

I suspect this has to do with you using ICU as-is, instead of properly patching it to match the Encoding Standard. There are probably more bugs like this.

@inexorabletash may be able to point to where in the Chromium source tree we keep our ICU encoding patches.

Contributor guide

Research direction: Examine the TextDecoder implementation for Big5 in the Node.js encoding subsystem. Compare the behavior expected under the WHATWG standard with the actual output. Look for differences between Node.js's ICU usage and Chromium's patched ICU. Consider fix options that bring the decoder into conformance with the standard.
Tech stack: javascript
Domain: backend
Issue type: Bug
Difficulty: 3
Estimated time: 1-3 hours
Activity status: Active
Clarity: Clear
Prerequisites: JavaScript, Node.js
Beginner accessibility: 40