
Name Table String Encodings

Posted: Sat Mar 09, 2019 10:28 pm
by ClintGoss
This is pretty arcane and may be the result of my lack of understanding … but …

Case A. When Tools => Options => Font => Export Font => Exclude Legacy Data is checked.

As expected, each entry in the Name table (e.g. Copyright) is output as a single string with platform = Windows (3), encoding = Unicode BMP (1) and language = English (United States) (1033).

Case B. Exclude Legacy Data is UNchecked.

As expected, another string is output for each Name table entry (it actually precedes the Windows string) with platform = Macintosh (1), encoding = Roman (0), and language = English (0).

*** Issue: Both strings are output as UTF-8.

I am thinking that the Macintosh string needs to be converted to Roman encoding (https://en.wikipedia.org/wiki/Mac_OS_Roman) with, for example, a copyright character output as a single byte 0xA9. However, a copyright character is output as the UTF-8 string 0xC2 0xA9.
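
To make the byte-level difference concrete, here is a minimal Python sketch. It relies only on Python's built-in mac_roman and utf_8 codecs and says nothing about how the font editor itself writes strings; the variable names are just for illustration.

```python
# Encode the copyright sign (U+00A9) as Mac OS Roman and as UTF-8
# to compare the resulting bytes.
copyright_sign = "\u00a9"

mac_roman_bytes = copyright_sign.encode("mac_roman")  # single byte expected
utf8_bytes = copyright_sign.encode("utf_8")           # two bytes expected

for label, data in (("Mac Roman", mac_roman_bytes), ("UTF-8", utf8_bytes)):
    # Print each byte as two hex digits, separated by spaces.
    print(label, " ".join(f"{b:02X}" for b in data))
```

Mac Roman yields the single byte A9, while UTF-8 yields the pair C2 A9, which is the sequence I am seeing.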

For the Windows string, the OpenType 1.8.3 spec seems to call for UTF-16BE:
  "When building a Unicode font for Windows, the platform ID should be 3 and the encoding ID should be 1, and the referenced string data must be encoded in UTF-16BE."
However, the string is also encoded as UTF-8.
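
As a rough sketch of what a reader of the Name table would have to do under that reading of the spec, the Python below picks a codec from the (platform ID, encoding ID) pair. The helper name decode_name_string and the NAME_CODECS table are hypothetical and cover only the two combinations mentioned in this post; a real parser would need many more.

```python
# Hypothetical lookup: (platform ID, encoding ID) -> Python codec name.
# Only the two combinations discussed in this thread are listed.
NAME_CODECS = {
    (1, 0): "mac_roman",  # Macintosh, Roman
    (3, 1): "utf_16_be",  # Windows, Unicode BMP
}


def decode_name_string(platform_id: int, encoding_id: int, raw: bytes) -> str:
    """Decode the raw bytes of one name record into a Python string."""
    codec = NAME_CODECS.get((platform_id, encoding_id))
    if codec is None:
        raise ValueError(
            f"no codec known for platform {platform_id}, encoding {encoding_id}"
        )
    return raw.decode(codec)


# The same text stored two ways, as the two records would hold it.
mac_text = decode_name_string(1, 0, b"\xa9 2019")
win_text = decode_name_string(3, 1, "\u00a9 2019".encode("utf_16_be"))
print(mac_text == win_text)
```

If the raw bytes really were UTF-8, decoding them with either of these codecs would produce mojibake or extra characters rather than the intended text.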

The output fonts do seem to "work", at least in Windows, and fonts I have examined seem to share this style of using UTF-8 … so …
  • Am I completely confused about encodings?
  • Is the Spec not correct?
  • Is UTF-8 always "acceptable" in place of the expected / specified encodings (Roman, UTF-16BE, and possibly others)?
-Clint

Re: Name Table String Encodings

Posted: Sun Mar 10, 2019 12:50 am
by ClintGoss
Never mind.
Oops, sorry, mea culpa, and who let this guy out of the Home for Wayward Programmers?

I'm using an abstraction layer, which handles the encoding / decoding and presents most strings as UTF-8. It overloads read() and (ain't OOP wonderful) fooled me into thinking I was getting bytes rather than UTF-8.

However, the exercise does make me realize that FCP needs to do what Perl lingo calls a "downgrade" on the string to output the Mac / Roman version. That is fine for characters that exist in Mac Roman (luckily, copyright is in there), but certainly not for all characters. I wonder what FCP does with un-mappable characters …
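
For illustration only, here is a minimal Python sketch of such a downgrade. I do not know what FCP actually does; the helper names are hypothetical, and substituting "?" for un-mappable characters is just one possible policy (it is what Python's "replace" error handler does when encoding).

```python
def downgrade_to_mac_roman(text: str, errors: str = "replace") -> bytes:
    """Encode text as Mac OS Roman.

    With errors="replace", characters outside Mac Roman become b"?".
    With errors="strict", they raise UnicodeEncodeError instead.
    """
    return text.encode("mac_roman", errors=errors)


def unmappable_chars(text: str) -> list:
    """Return the characters of text that have no Mac Roman equivalent."""
    missing = []
    for ch in text:
        try:
            ch.encode("mac_roman")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


ok = downgrade_to_mac_roman("\u00a9 2019")     # copyright sign is in Mac Roman
lossy = downgrade_to_mac_roman("\u0416")       # Cyrillic Zhe is not
missing = unmappable_chars("\u00a9 \u0416")    # only the Zhe is reported
print(ok, lossy, missing)
```

Checking with something like unmappable_chars before export would at least let a tool warn the user instead of silently writing question marks.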

-Clint