I'm currently working with a Chinese font that has a lot of Private Use Area (PUA) characters. The first block seems to be ordered neatly by radical plus additional strokes, but beyond the first block it's a bit of a mess. Imagine someone wrote an English > German dictionary sorted alphabetically and then, after the main part, added a section of "Additional Words" but, rather than sorting it alphabetically, added entries ad hoc: zygomorphic might be followed by nilotic and then arabesque, because they dealt with zygomorphic first, then nilotic, and so on.
That makes searching for characters very hard work, and I was wondering whether there is a way to search by component. Let's say I'm searching for 睉 and cannot find it in the "ordered" block: is there a way of searching for a component (for example 坐)? A bit like a partial-word search in Word, where "alg" finds both "alginate" and "algorithm"?
Searching CJK font for character component
Re: Searching CJK font for character component
Does Finding A Glyph work for CJK fonts?
Re: Searching CJK font for character component
Unfortunately not. Searching by mapping only works if you know the mapping, which is clearly not an option for a character whose mapping is unknown.
Searching by glyph name doesn't work either, because CJK characters are named after their code point: the name of 語, for example, is U+8A9E CJK UNIFIED IDEOGRAPH-8A9E (whereas for A you'd find "Latin Capital Letter A"). There are "attributes" fields, but even for canonical Unicode these do not seem to include character constituents. For 語, you can in theory search by reading (= pronunciation) if the Unicode database(s) include it, but not by parts: I cannot get to 語 by searching for either 訁 or 吾. That is with canonical Unicode, never mind PUA characters, which seem to come with even less metadata, and certainly no readings as far as I've seen.
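To illustrate the naming point, here is a minimal Python sketch using only the standard `unicodedata` module. The helper name `describe` is my own, not part of any font tool. It shows that the name of 語 is just its code point again, and that a PUA code point has no name at all, so a name search has nothing to match on.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for one character, or a note if it has no name."""
    name = unicodedata.name(ch, None)
    if name is None:
        # Private Use characters (general category "Co") carry no name.
        kind = "private use" if unicodedata.category(ch) == "Co" else "unnamed"
        name = f"<{kind}>"
    return f"U+{ord(ch):04X} {name}"

for ch in ("A", "語", "\ue000"):
    print(describe(ch))
```

U+E000 is simply the first code point of the Private Use Area, picked as an example. What a given font actually draws there is not something the Unicode database can tell you.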
Re: Searching CJK font for character component
Some font makers (such as BabelStone) provide reference sheets for the composition of canonical CJK characters and of their own PUA characters. Obviously, though, every font uses the Private Use Area differently, so unfortunately I cannot rely on the sheets provided by BabelStone to figure out what DFSongStd (the font I'm looking at) maps there.
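If a composition table did exist for the font, the component search itself would be simple. The following is only a sketch under that assumption: `IDS_TABLE` is a hypothetical, hand-made table that maps each character to an Ideographic Description Sequence, the decompositions are illustrative, and the U+E000 entry is invented to stand in for a PUA glyph. Nothing here reads DFSongStd or any real font file.

```python
# Hypothetical composition table: character -> Ideographic Description Sequence.
# Entries for PUA code points would have to be compiled by inspecting the font.
IDS_TABLE = {
    "語": "⿰訁吾",
    "吾": "⿱五口",
    "睉": "⿰目坐",
    "\ue000": "⿰目坐",  # invented PUA entry, for illustration only
}

def is_idc(ch: str) -> bool:
    """True for layout operators in the Ideographic Description Characters block."""
    return "\u2ff0" <= ch <= "\u2fff"

def components(ch: str, table: dict[str, str]) -> set[str]:
    """Collect every component of ch, following nested decompositions."""
    found: set[str] = set()
    stack = [ch]
    while stack:
        current = stack.pop()
        for part in table.get(current, ""):
            if is_idc(part) or part == ch or part in found:
                continue
            found.add(part)
            stack.append(part)
    return found

def find_by_component(component: str, table: dict[str, str]) -> list[str]:
    """Return all characters in the table that contain the given component."""
    return sorted(c for c in table if component in components(c, table))

for hit in find_by_component("坐", IDS_TABLE):
    print(f"U+{ord(hit):04X}")
```

Searching for 坐 returns both 睉 and the invented PUA entry, and searching for 口 also finds 語 through 吾 because nested parts are followed. The hard part remains what is described above: someone has to produce the table for each font's PUA.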