[CLOSED] Name Table String Encodings
Posted: Sat Mar 09, 2019 10:28 pm
This is pretty arcane and may be the result of my lack of understanding … but …
Case A. When Tools => Options => Font => Export Font => Exclude Legacy Data is checked.
As expected, each entry in the Name table (e.g. Copyright) is output as a single string with platform = Windows (3), encoding = Unicode BMP (1) and language = English (United States) (1033).
Case B. Exclude Legacy Data is UNchecked.
As expected, another string is output for each Name table entry (it actually precedes the Windows string) with platform = Macintosh (1), encoding = Roman (0), and language = English (0).
*** Issue: Both strings are output as UTF-8.
I am thinking that the Macintosh string needs to be converted to Roman encoding (https://en.wikipedia.org/wiki/Mac_OS_Roman) with, for example, a copyright character output as a single byte 0xA9. However, a copyright character is output as the UTF-8 string 0xC2 0xA9.
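To make the byte-level difference concrete, here is a minimal Python sketch using only the standard `mac_roman` and `utf-8` codecs. The variable names are mine and purely illustrative; nothing here reads an actual font file.

```python
# Illustrative only: how one copyright sign serializes under each encoding.
COPYRIGHT = "\u00a9"  # the copyright sign

# What a Macintosh (1) / Roman (0) name record would be expected to hold.
mac_roman_bytes = COPYRIGHT.encode("mac_roman")

# What the exported font appears to hold instead.
utf8_bytes = COPYRIGHT.encode("utf-8")

print(mac_roman_bytes.hex())  # a9
print(utf8_bytes.hex())       # c2a9

# A reader that trusts the record header and decodes the UTF-8 bytes as
# Mac OS Roman gets two characters back instead of one.
misread = utf8_bytes.decode("mac_roman")
print(len(misread))
```

So if the stored bytes really are 0xC2 0xA9, a strict Mac OS Roman reader would see an extra character in front of the copyright sign.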
For the Windows string, the 1.8.3 OpenType spec seems to call for …
"When building a Unicode font for Windows, the platform ID should be 3 and the encoding ID should be 1, and the referenced string data must be encoded in UTF-16BE."
… however, the string is also encoded UTF-8.
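For comparison, a similar sketch for the Windows record, again using only Python's built-in codecs and my own illustrative variable names. It shows what UTF-16BE would look like for the same character, and what a UTF-16BE reader would make of the UTF-8 bytes.

```python
# Illustrative only: the same copyright sign as a Windows (3) / Unicode BMP (1)
# name record would be expected to store it, versus the UTF-8 form.
COPYRIGHT = "\u00a9"

utf16be_bytes = COPYRIGHT.encode("utf-16-be")
utf8_bytes = COPYRIGHT.encode("utf-8")

print(utf16be_bytes.hex())  # 00a9
print(utf8_bytes.hex())     # c2a9

# Both forms happen to be two bytes long for this character, but a
# UTF-16BE reader would treat 0xC2 0xA9 as the single code unit U+C2A9.
misread = utf8_bytes.decode("utf-16-be")
print(hex(ord(misread)))    # 0xc2a9
```

If the bytes in the file were truly UTF-8, a conforming reader would therefore show a different character altogether, not a copyright sign.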
The output fonts do seem to "work", at least in Windows, and fonts I have examined seem to share this style of using UTF-8 … so …
- Am I completely confused about encodings?
- Is the Spec not correct?
- Is UTF-8 always "acceptable" in place of the expected / specified encodings (Roman, UTF-16BE, and any others)?