drewnoakes/metadata-extractor

NUL bytes in Apple Multi-language Profile Name in ICC profile of OS X Screenshot

Open

#396 opened on Apr 10, 2019

Labels: format-icc, help wanted, image-queue

Description

I can reproduce this issue with PNG screenshots taken on OS X. Specifically, I'm running OS X 10.12.6 on a mid-2015 MacBook Pro, although I've also seen it on other versions (10.13.6, 10.14.2). I'm attaching an example. I did not modify the file in any way: it is the exact PNG that OS X produced when I pressed Command-Shift-3.

It's related to the Apple Multi-language Profile Name tag in the PNG's ICC profile. The relevant part of the metadata-extractor output is:

[ICC Profile] - Apple Multi-language Profile Name = 34 hrHR(LCD u boji) koKR(컬러 LCD) nbNO(Farge-LCD) id??(LCD Warna) huHU(Színes LCD) csCZ(Barevný LCD) daDK(LCD-farveskærm) ukUA(Кольоровий LCD) ar??(‏LCD ملونة) itIT(LCD colori) roRO(LCD color) nlNL(Kleuren-LCD) heIL(‏LCD צבעוני) esES(LCD color) fiFI(Väri-LCD) zhTW(彩色 LCD) viVN(LCD Màu) skSK(Farebný LCD) zhCN(彩色 LCD) ruRU(Цветной ЖК-дисплей) frFR(LCD couleur) ms??(Warna LCD) caES(LCD en color) thTH(LCD สี) esXL(LCD color) deDE(Farb-LCD) enUS(Color LCD) ptBR(LCD Colorido) plPL(Kolor LCD) elGR(Έγχρωμη οθόνη LCD) svSE(Färg-LCD) trTR(Renkli LCD) jaJP(カラーLCD) ptPT(LCD a Cores)

Each "?" above is actually a NUL byte (I couldn't get GitHub's Markdown to display the square symbol my terminal shows for it).

According to the ICC profile specification, section 10.13 ("multiLocalizedUnicodeType"), each record should start with a 2-byte ISO 639-1 language code followed by a 2-byte ISO 3166-1 country code. As the output above shows, for a few languages (here Indonesian, Malay and Arabic; I have no idea why) the 2-byte country code is instead two NUL bytes.
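To make the record layout concrete, here is a minimal Java sketch of an mluc parser that flags records like the Indonesian, Malay and Arabic ones. It assumes the standard mluc layout: a 4-byte "mluc" signature, 4 reserved bytes, a 4-byte record count and a 4-byte record size, then 12-byte records (language, country, string length, string offset from the start of the tag), with UTF-16BE strings. MlucParser, MlucRecord and hasLetterCountryCode are hypothetical names, not metadata-extractor's API. The country check only tests the shape of the code (two uppercase ASCII letters, as in every record above), so Apple's esXL still passes even though XL is not an assigned ISO 3166-1 code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a multiLocalizedUnicodeType ('mluc') tag parser.
// Assumed layout: 'mluc' signature (4 bytes), reserved (4), record count (4),
// record size (4, normally 12), then the records. Each record holds a 2-byte
// language code, 2-byte country code, 4-byte string length and 4-byte string
// offset measured from the start of the tag. Strings are UTF-16BE.
// No bounds checking beyond what the JDK does: malformed offsets will throw.
final class MlucParser {

    static final class MlucRecord {
        final String language; // e.g. "en"
        final byte[] country;  // raw 2 bytes; {0, 0} for the problem records
        final String text;

        MlucRecord(String language, byte[] country, String text) {
            this.language = language;
            this.country = country;
            this.text = text;
        }

        // True only when both country bytes are uppercase ASCII letters.
        // This checks the shape of the code, not whether it is an assigned one.
        boolean hasLetterCountryCode() {
            for (byte b : country) {
                if (b < 'A' || b > 'Z') {
                    return false;
                }
            }
            return true;
        }
    }

    static List<MlucRecord> parse(byte[] tag) {
        ByteBuffer buf = ByteBuffer.wrap(tag); // big-endian by default, as ICC requires
        String signature = new String(tag, 0, 4, StandardCharsets.US_ASCII);
        if (!signature.equals("mluc")) {
            throw new IllegalArgumentException("not an mluc tag: " + signature);
        }
        int count = buf.getInt(8);
        int recordSize = buf.getInt(12);
        List<MlucRecord> records = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int base = 16 + i * recordSize;
            String language = new String(tag, base, 2, StandardCharsets.US_ASCII);
            byte[] country = {tag[base + 2], tag[base + 3]};
            int length = buf.getInt(base + 4);
            int offset = buf.getInt(base + 8);
            String text = new String(tag, offset, length, StandardCharsets.UTF_16BE);
            records.add(new MlucRecord(language, country, text));
        }
        return records;
    }
}
```

Records for which hasLetterCountryCode() returns false correspond to the ones missing from the exiftool output below.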

By contrast, exiftool appears to validate these records and skip any that don't conform to the standard, so they are not displayed:

Profile Description ML (hr-HR)  : LCD u boji
Profile Description ML (ko-KR)  : 컬러 LCD
Profile Description ML (nb-NO)  : Farge-LCD
Profile Description ML (hu-HU)  : Színes LCD
Profile Description ML (cs-CZ)  : Barevný LCD
Profile Description ML (da-DK)  : LCD-farveskærm
Profile Description ML (uk-UA)  : Кольоровий LCD
Profile Description ML (it-IT)  : LCD colori
Profile Description ML (ro-RO)  : LCD color
Profile Description ML (nl-NL)  : Kleuren-LCD
Profile Description ML (he-IL)  : ‏LCD צבעוני
Profile Description ML (es-ES)  : LCD color
Profile Description ML (fi-FI)  : Väri-LCD
Profile Description ML (zh-TW)  : 彩色 LCD
Profile Description ML (vi-VN)  : LCD Màu
Profile Description ML (sk-SK)  : Farebný LCD
Profile Description ML (zh-CN)  : 彩色 LCD
Profile Description ML (ru-RU)  : Цветной ЖК-дисплей
Profile Description ML (fr-FR)  : LCD couleur
Profile Description ML (ca-ES)  : LCD en color
Profile Description ML (th-TH)  : LCD สี
Profile Description ML (es-XL)  : LCD color
Profile Description ML (de-DE)  : Farb-LCD
Profile Description ML          : Color LCD
Profile Description ML (pt-BR)  : LCD Colorido
Profile Description ML (pl-PL)  : Kolor LCD
Profile Description ML (el-GR)  : Έγχρωμη οθόνη LCD
Profile Description ML (sv-SE)  : Färg-LCD
Profile Description ML (tr-TR)  : Renkli LCD
Profile Description ML (ja-JP)  : カラーLCD
Profile Description ML (pt-PT)  : LCD a Cores
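The two possible behaviours can be contrasted with a small hypothetical helper (MlucLabels is an illustrative name, not code from exiftool or metadata-extractor): a strict label that drops any record whose country code isn't two uppercase ASCII letters, which is what the exiftool output above suggests, and a lenient label that keeps the record and falls back to the bare language code.

```java
import java.util.Optional;

// Hypothetical helper contrasting a strict and a lenient way to label mluc records.
final class MlucLabels {

    private static boolean isUpperAsciiLetter(byte b) {
        return b >= 'A' && b <= 'Z';
    }

    // Strict: returns "lang-CC", or empty when the country code is malformed
    // (for example the two NUL bytes seen in the OS X screenshots).
    static Optional<String> strictLabel(String language, byte[] country) {
        if (country.length == 2 && isUpperAsciiLetter(country[0]) && isUpperAsciiLetter(country[1])) {
            return Optional.of(language + "-" + (char) country[0] + (char) country[1]);
        }
        return Optional.empty();
    }

    // Lenient: any country code that fails the strict check (including all-NUL)
    // is dropped, but the record is kept under its language code alone, e.g. "id".
    static String lenientLabel(String language, byte[] country) {
        return strictLabel(language, country).orElse(language);
    }
}
```

The lenient variant keeps the Indonesian, Malay and Arabic names visible instead of silently discarding them; which policy is preferable is the open question raised below.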

I honestly don't know what the right thing to do here is, or even whether this is really a bug in the library. I suspect OS X is generating a non-conforming profile, but these images are out there in the wild, so I thought the issue was worth raising. I'm happy to try to provide a fix if you have a suggestion for the approach.

If you want to inspect the bytes yourself, here's a small snippet I used:

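import com.drew.imaging.ImageMetadataReader;
import com.drew.metadata.Metadata;
import com.drew.metadata.icc.IccDirectory;
final Metadata metadata = ImageMetadataReader.readMetadata(yourInputStreamHere); // throws ImageProcessingException, IOException
final IccDirectory id = metadata.getDirectoriesOfType(IccDirectory.class).iterator().next(); // first ICC directory; throws if the image has none
final byte[] appleMlucBytes = (byte[]) id.getObject(IccDirectory.TAG_APPLE_MULTI_LANGUAGE_PROFILE_NAME); // raw 'mluc' tag bytes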
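To see the NUL bytes directly, it helps to print appleMlucBytes as hex. The helper below is a hypothetical sketch (HexDump.hexDump is an illustrative name, not part of metadata-extractor). Assuming the standard layout of a 16-byte header followed by 12-byte records, a record with a missing country shows its two ASCII language letters followed by 00 00.

```java
// Hypothetical helper for eyeballing raw tag bytes.
final class HexDump {

    // Formats bytes as rows of up to 16 lowercase two-digit hex values,
    // each row prefixed with its 4-digit hex offset, e.g. "0000: 6d 6c 75 63".
    static String hexDump(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i += 16) {
            if (i > 0) {
                sb.append('\n');
            }
            sb.append(String.format("%04x:", i));
            for (int j = i; j < Math.min(i + 16, bytes.length); j++) {
                sb.append(String.format(" %02x", bytes[j] & 0xff)); // mask to print unsigned
            }
        }
        return sb.toString();
    }
}
```

Usage with the snippet above: System.out.println(HexDump.hexDump(appleMlucBytes));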
