Diacritical Marks, respectively Unicode

Erwin Denissen · Post by **Erwin Denissen** » Mon Aug 04, 2008 11:56 am

Bhikkhu Pesala wrote:Don't forget the currency symbols: I presume that French users will need the Fr, and Spanish users will need Peseta for example.

I can't find any proof of the use of the French currency sign, so I doubt if it's useful to include it. On the Internet all I can find is "francs". No wait, I found one reference:
http://fr.wikipedia.org/wiki/Franc_fran%C3%A7ais

Even the EKI Letter Database fails to find a language that uses the French franc sign.
http://www.eki.ee/letter/chardata.cgi?search=franc

Besides, euro notes and coins replaced both the franc and the peseta in 2002.

Erwin Denissen · Post by **Erwin Denissen** » Mon Aug 04, 2008 12:11 pm

Bhikkhu Pesala wrote:Polish, for example, doesn't seem to be fully supported (no ogonek), but has anyone from Poland bought it?

Maybe the following characters (the last ones on the additional template) should be removed from the template, to make room for more accented characters instead: ¯ ¨ ´ · ¸

Erwin Denissen · Post by **Erwin Denissen** » Mon Aug 04, 2008 1:04 pm

Bhikkhu Pesala wrote:The following accents are needed to cover most languages. (Source: Letter Database).

It seems I need to add 14 more characters (A, E, a, e with ogonek, and C, N, S, Z, c, n, s, z with acute and Z, z with dot above) to include all required characters for Polish. I might just ask the two Polish customers for their opinion about what characters they consider important.
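The count of 14 can be double-checked by composing each base letter with its mark. This is only a minimal sketch, assuming Python's standard `unicodedata` module; the table layout and the name `compose_letters` are made up for illustration.

```python
import unicodedata

# Combining marks for the three diacritics named in the post.
OGONEK = "\u0328"
ACUTE = "\u0301"
DOT_ABOVE = "\u0307"

# Base letters per mark, exactly as listed in the post.
POLISH_ADDITIONS = [
    ("AEae", OGONEK),
    ("CNSZcnsz", ACUTE),
    ("Zz", DOT_ABOVE),
]

def compose_letters(groups):
    """Return the precomposed (NFC) character for every base + mark pair."""
    return [
        unicodedata.normalize("NFC", base + mark)
        for bases, mark in groups
        for base in bases
    ]

polish_additions = compose_letters(POLISH_ADDITIONS)
print(len(polish_additions), "".join(polish_additions))
```

Each pair collapses to a single precomposed character, so the length of the list is the number of template slots these letters would take.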

Ludwik · Post by **Ludwik** » Mon Aug 04, 2008 1:58 pm

Erwin Denissen wrote:2 paid customers from Poland so far.

I am one of them.
I need: ą Ą ć Ć ę Ę ł Ł ń Ń ó Ó ś Ś ź Ź ż Ż

Erwin Denissen · Post by **Erwin Denissen** » Mon Aug 04, 2008 3:17 pm

Hi Ludwik,

Thanks for your fast response!

Now with the required Polish characters I have 123 characters, but there is only room for 110 characters. So some won't make it into the final template.

I think these should stay:
¿¡ are used in the Spanish language
² for example square meter: m²
³ for example cubic meter: m³
«» are guillemets; the right one is also used on web pages as an arrow

Characters that could be removed:
¦ the broken bar is rarely used, so could be excluded
¬ not sign, might be excluded
× mathematical symbol (multiplication)
÷ mathematical symbol (division)
¤ currency symbol, used rarely, so could be excluded (but not likely as it is on the first page!)
º Masculine ordinal indicator, used in Spanish language
ª Feminine ordinal indicator, used in Spanish language
¹ superscript 1
¥ Yen sign (the only non Latin character?)
¼½¾ if necessary these will be excluded as well
- soft hyphen
¯ Macron
¨ Diaeresis
´ Acute accent
· Middle dot
¸ Cedilla

Erwin Denissen · Post by **Erwin Denissen** » Tue Aug 05, 2008 9:15 am

The latest version of the second template includes the following characters:
ÀÁÂÃÄÅĄÇĆĈ
ÈÉÊËĔĜĤÌÍÎ
ÏĴŁŃÑÒÓÔÕÖ
ØŚŜŠÙÚÛÜŬÝ
ŸŹŻŽàáâãäå
ąçćĉèéêëęƒ
ĝĥıìíîïĵłń
ñòóôõöøśŝš
ùúûüŭýÿźżž
ÆŒÐÞßæœðþµ
¿¡¼½¾«»¬²³

It covers:
English
German
Dutch
French
Italian
Swedish
Czech
Norwegian
Danish
Polish
Spanish
Portuguese
Basque
Estonian
Faeroese
Frisian
Irish
Galician
Hungarian
Icelandic
Albanian
Esperanto

I'm happy with this additional character set, so unless I receive additional suggestions, this will be it.

Yehuda · Post by **Yehuda** » Wed Aug 06, 2008 8:59 am

It covers:
<snip>
Hungarian
<snip>

I don't see the Hungarian Ő, ő, Ű, and ű
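This kind of check is easy to automate. The following is a rough sketch, not anything Scanahand does: it takes the template rows exactly as posted on Aug 05 and compares them with the accented letters of the Hungarian alphabet. The helper names (`nfc`, `missing_characters`) are hypothetical.

```python
import unicodedata

# The second template exactly as posted on Aug 05, 2008 (11 rows of 10).
TEMPLATE_ROWS = [
    "ÀÁÂÃÄÅĄÇĆĈ",
    "ÈÉÊËĔĜĤÌÍÎ",
    "ÏĴŁŃÑÒÓÔÕÖ",
    "ØŚŜŠÙÚÛÜŬÝ",
    "ŸŹŻŽàáâãäå",
    "ąçćĉèéêëęƒ",
    "ĝĥıìíîïĵłń",
    "ñòóôõöøśŝš",
    "ùúûüŭýÿźżž",
    "ÆŒÐÞßæœðþµ",
    "¿¡¼½¾«»¬²³",
]

# Accented letters of the Hungarian alphabet, upper and lower case.
HUNGARIAN_LETTERS = "ÁÉÍÓÖŐÚÜŰáéíóöőúüű"

def nfc(text):
    """Normalize to precomposed form so one letter is one character."""
    return unicodedata.normalize("NFC", text)

def missing_characters(template_rows, required):
    """Return the required characters that the template does not contain."""
    available = set(nfc("".join(template_rows)))
    return sorted(set(nfc(required)) - available)

template_size = len(nfc("".join(TEMPLATE_ROWS)))
missing = missing_characters(TEMPLATE_ROWS, HUNGARIAN_LETTERS)
print(template_size, "".join(missing))
```

The same function could be run with the letter list of any other language on the "It covers" list.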

Erwin Denissen · Post by **Erwin Denissen** » Wed Aug 06, 2008 9:23 am

Oops, good point

So now four out of these have to be removed:

¼½¾«»¬²³

Erwin Denissen · Post by **Erwin Denissen** » Mon Aug 11, 2008 9:13 am

I've added the missing Hungarian characters:

ÀÁÂÃÄÅĄÇĆĈ
ÈÉÊËĔĜĤÌÍÎ
ÏĴŁŃÑÒÓÔÕÖ
ŐØŚŜŠÙÚÛÜŬ
ŰÝŸŹŻŽàáâã
äåąçćĉèéêë
ęƒĝĥıìíîïĵ
łńñòóôõöőø
śŝšùúûüŭűý
ÿźżžÆŒÐÞßæ
œðþµ¿¡«»¬²

Erwin Denissen · Post by **Erwin Denissen** » Mon Aug 11, 2008 11:12 am

Not all Czech characters are included, so Czech is now removed from the list of covered languages.

The two-page template still covers a lot of languages:
English, German, Dutch, French, Italian, Swedish, Norwegian, Danish, Polish, Spanish, Portuguese, Basque, Estonian, Faeroese, Frisian, Irish, Galician, Hungarian, Icelandic, Albanian and Esperanto.

Timo Kähkönen · Post by **Timo Kähkönen** » Mon Aug 11, 2008 12:10 pm

Bhikkhu Pesala wrote:...Anyone making multi-lingual fonts should be using FontCreator, not Scanahand...

I think Scanahand could be a wonderful tool for hobbyists and professionals alike. There seems to be no Scanahand-like tool on the market, which "is full of" tools for manually scanning and tracing glyph by glyph. These include professional software such as FontLab, TypeTool, ScanFont and, as far as I can see, Fontographer. FontCreator is also this kind of tool: it lacks automatic font generation from glyph templates. So with FontCreator it is hard work to make multi-lingual fonts.

Now we have a one-page template in Scanahand (Basic). Why should we restrict its possibilities to a two-page template and waste our time thinking about which glyphs to include on the second page?

It would be reasonable to allow all characters in Unicode, or at least the Basic Multilingual Plane (BMP), 000000..00FFFF (65536 code points). In practice only a few font creators need all of these in one font. Fonts nearly always cover only a subset of Unicode, and nearly always only a subset of each Unicode block. For example, Verdana covers the following 18 Unicode blocks, but only one of them (Latin Extended-A) is covered completely. As we can see, Verdana has only 95 of the 128 Basic Latin characters.

Basic Latin (95 of 128)
Latin-1 Supplement (96 of 128)
Latin Extended-A (128 of 128)
Latin Extended-B (11 of 208)
Spacing Modifier Letters (9 of 80)
Combining Diacritical Marks (5 of 112)
Greek and Coptic (73 of 127)
Cyrillic (94 of 255)
Latin Extended Additional (96 of 246)
General Punctuation (23 of 106)
Superscripts and Subscripts (1 of 34)
Currency Symbols (5 of 22)
Letterlike Symbols (6 of 79)
Number Forms (4 of 50)
Mathematical Operators (14 of 256)
Geometric Shapes (6 of 96)
Private Use Area (12 of 6400)
Alphabetic Presentation Forms (2 of 58)

The questions are:
1) by which criteria to select Unicode blocks for the new font
2) by which criteria to select which characters from these blocks to include in the new font

There must be some reason why the creator of Verdana included only 21.7 % of the General Punctuation marks. Maybe he/she thought that some characters are more essential or more widely used than others.
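The percentages are simple to compute from the list above. A small sketch, using only the Verdana counts quoted in this post; the function name `coverage_percent` is made up for illustration.

```python
# (block name, characters present in Verdana, block size) as listed above.
VERDANA_BLOCKS = [
    ("Basic Latin", 95, 128),
    ("Latin-1 Supplement", 96, 128),
    ("Latin Extended-A", 128, 128),
    ("Latin Extended-B", 11, 208),
    ("Spacing Modifier Letters", 9, 80),
    ("Combining Diacritical Marks", 5, 112),
    ("Greek and Coptic", 73, 127),
    ("Cyrillic", 94, 255),
    ("Latin Extended Additional", 96, 246),
    ("General Punctuation", 23, 106),
    ("Superscripts and Subscripts", 1, 34),
    ("Currency Symbols", 5, 22),
    ("Letterlike Symbols", 6, 79),
    ("Number Forms", 4, 50),
    ("Mathematical Operators", 14, 256),
    ("Geometric Shapes", 6, 96),
    ("Private Use Area", 12, 6400),
    ("Alphabetic Presentation Forms", 2, 58),
]

def coverage_percent(present, total):
    """Share of a block that the font covers, in percent (one decimal)."""
    return round(100 * present / total, 1)

# Blocks in which every character is present.
fully_covered = [name for name, present, total in VERDANA_BLOCKS
                 if present == total]

for name, present, total in VERDANA_BLOCKS:
    print(f"{name}: {coverage_percent(present, total)} %")
```

Running the same calculation over several widely used fonts would give the raw numbers for the statistical approach.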

One way to answer this is statistical: collect widely used fonts, calculate which blocks and which portions of blocks are most common, and use the results as the basis for one's own font templates.

The technique that allows free selection of Unicode ranges is simple. Scanahand must have one embedded font that covers the whole Unicode plane. To begin with, the Basic Multilingual Plane (BMP), 000000..00FFFF (65536 code points), is enough. Scanahand uses this font to print the sample characters on the template pages.

If the program used fixed Unicode ranges for each page, it would always know how to map the glyphs. But if we move to dynamic, user-created glyph templates (which is really preferable), the solution is a little more complicated.

When there are 1-10 dynamically created template pages, there must be some way to include each page's Unicode ranges on the printable template page itself. Without this page-related information it is in practice not possible to map the scanned glyphs to Unicode mapping slots.

One of the best solutions is a Data Matrix barcode, which can hold up to roughly 1,500 bytes of information in a small image. When a Data Matrix image encoding the page's character range is printed at the top of the page, Scanahand can decode it back into the character range. Data Matrix has built-in error correction, so it is very tolerant of image noise (scan dust and scratches, misaligned lines in the print, etc.).
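To make the idea concrete, here is a sketch of what the barcode's text payload could look like. The payload format (`P<page>:<start>-<end>,...`) and all function names are my own assumptions, not anything Scanahand uses, and the actual Data Matrix encoding and decoding would be left to a barcode library that is not shown here.

```python
def encode_page_payload(page, ranges):
    """Pack a page number and its Unicode ranges into a short text payload."""
    parts = [f"{start:04X}-{end:04X}" for start, end in ranges]
    return f"P{page}:" + ",".join(parts)

def decode_page_payload(payload):
    """Recover the page number and the Unicode ranges from a payload."""
    header, _, body = payload.partition(":")
    page = int(header[1:])  # skip the leading "P"
    ranges = []
    for part in body.split(","):
        start_text, end_text = part.split("-")
        ranges.append((int(start_text, 16), int(end_text, 16)))
    return page, ranges

def slot_codepoints(ranges):
    """List the code point for each template slot, in printing order."""
    return [cp for start, end in ranges for cp in range(start, end + 1)]

# Hypothetical page 2 carrying only the four Hungarian letters.
payload = encode_page_payload(2, [(0x0150, 0x0151), (0x0170, 0x0171)])
page, ranges = decode_page_payload(payload)
print(payload, page, [chr(cp) for cp in slot_codepoints(ranges)])
```

A payload like this is only a few dozen bytes per page, well within what a small Data Matrix symbol can carry.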

EDIT: Of course, I don't mean that the fixed two-page template should be removed. I mean that it's not enough. The user should be able to choose between fixed templates and a dynamic template. The average user could select one of the fixed templates (or the default template), and the advanced user could select one of the fixed templates, modify its Unicode ranges, and use the modified template as a dynamic template.

William · Post by **William** » Mon Aug 11, 2008 3:08 pm

As the accented characters needed for Czech were mentioned, some readers might like to know of the following document.

http://www.evertype.com/alphabets/czech.pdf

This is one of many documents about the Unicode characters needed for setting text in the languages of Europe.

There is a huge list of links to those documents in section 1.2.2 Alphabetic index of languages, which is about half-way down the following page.

http://www.evertype.com/alphabets/

William Overington

11 August 2008

Bhikkhu Pesala · Post by **Bhikkhu Pesala** » Mon Aug 11, 2008 5:15 pm

Timo Kähkönen wrote:I think Scanahand could be a wonderful tool for hobbyists and professionals alike. There seems to be no Scanahand-like tool on the market, which "is full of" tools for manually scanning and tracing glyph by glyph.

FontLab does offer a product like Scanahand — ScanFont — but it is $99 and still requires a Font Editor.

Professionals can use Scanahand to get their artwork into a font quickly, and then adjust the metrics and mappings in FontCreator. That is too much to expect of a budget-priced hobbyist's program.

So with FontCreator it is hard work to make multi-lingual fonts.

Making multi-lingual fonts is hard work with any program.

Why should we restrict its possibilities to a two-page template and waste our time thinking about which glyphs to include on the second page?

Scanahand must be easy to use. It is designed for the beginner who knows nothing about Unicode or mappings. Professionals can use a program with all of the required features.

It would be reasonable to allow all characters in Unicode, or at least the Basic Multilingual Plane (BMP), 000000..00FFFF (65536 code points).

I have suggested before that mapping could be a separate process to scanning, but this would make Scanahand difficult to use for amateurs who don't know about mapping. Though they could type "A" when they see "A" scanned as a glyph, they won't know how to type extended Unicode glyphs. Typing accented glyphs like ä á å might not be too much to ask for European users who use a language-specific keyboard, but for others it would I think lead to more errors and frustration than filling in a template with just the extra glyphs that they want.

The method of filling in a template is almost idiot-proof and fast. Mapping individual glyphs would add considerably to the complexity and time required to make a font.

Generating custom templates on the fly by selecting glyphs from a character map or typing them from the keyboard is an interesting idea that might be viable. I don't see it happening just yet though.

Jowaco · Post by **Jowaco** » Mon Aug 11, 2008 6:05 pm

So masterly analysed and put by Bhikkhu Pesala.

I can only humbly agree.

Joe.

Timo Kähkönen · Post by **Timo Kähkönen** » Mon Aug 11, 2008 6:42 pm

Bhikkhu Pesala wrote: FontLab does offer a product like Scanahand — ScanFont — but it is $99 and still requires a Font Editor.

But ScanFont has no printable template page. It has a Separate Shapes function, but that doesn't do what Scanahand does.

Bhikkhu Pesala wrote: Professionals can use Scanahand to get their artwork into a font quickly, and then adjust the metrics and mappings in FontCreator. That is too much to expect of a budget-priced hobbyist's program.

Scanahand handles the mappings fully.

Making multi-lingual fonts is hard work with any program.

With Scanahand it would be simple, if there were a simple way to select the needed glyphs. (See the bottom of this message.)

Scanahand must be easy to use. It is designed for the beginner who knows nothing about Unicode or mappings. Professionals can use a program with all of the required features.

It would be really easy to choose between a few ready-made templates or to select languages, if the program is made with simplicity in mind.

The user does this: select language(s) from a dropdown
The program does this: languages (English, Swedish, Russian) -> scripts (Latin, Cyrillic) -> Unicode ranges (x0000...xFFFF) -> print as template pages

So the user does NOT need to know about mappings etc.; he/she only selects the language or languages.
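The lookup chain can be sketched in a few lines. The two tables below are tiny illustrative assumptions (three languages, and whole Unicode blocks standing in for scripts); a real program would need much more complete data, for example from the CLDR list linked below.

```python
# Hypothetical lookup tables, deliberately incomplete.
LANGUAGE_SCRIPTS = {
    "English": ["Latin"],
    "Swedish": ["Latin"],
    "Russian": ["Cyrillic"],
}

# Script -> Unicode ranges. Only the Basic Latin, Latin-1 Supplement and
# Cyrillic blocks are listed here, as a stand-in for full script coverage.
SCRIPT_RANGES = {
    "Latin": [(0x0000, 0x007F), (0x0080, 0x00FF)],
    "Cyrillic": [(0x0400, 0x04FF)],
}

def ranges_for_languages(languages):
    """Resolve languages to scripts to Unicode ranges, without duplicates."""
    ranges = []
    for language in languages:
        for script in LANGUAGE_SCRIPTS[language]:
            for code_range in SCRIPT_RANGES[script]:
                if code_range not in ranges:
                    ranges.append(code_range)
    return ranges

selected = ranges_for_languages(["English", "Swedish", "Russian"])
for start, end in selected:
    print(f"U+{start:04X}..U+{end:04X}")
```

English and Swedish share the Latin ranges, so they are only listed once. The user only ever touches the language names.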

Scanahand can take care of glyph selection and mappings. For example, there is this list of the scripts used by each language:
http://www.unicode.org/cldr/data/charts ... uages.html

Professionals also need a simple way to scan a font and do the mappings. As far as I know there is no program like this for either beginners or professionals.

I have suggested before that mapping could be a separate process to scanning, but this would make Scanahand difficult to use for amateurs who don't know about mapping.

Neither the amateur nor the pro needs to know about mappings. The program can do this.

Typing accented glyphs like ä á å might not be too much to ask for European users who use a language-specific keyboard, but for others it would I think lead to more errors and frustration than filling in a template with just the extra glyphs that they want.

The user has no need to type glyphs. He/she only writes the glyphs onto the template pages according to the sample glyphs in the template.

Mapping individual glyphs would add considerably to the complexity and time required to make a font.

The program can do the mapping in a few milliseconds, without frustration.

Generating custom templates on the fly by selecting glyphs from a character map or typing them from the keyboard is an interesting idea that might be viable. I don't see it happening just yet though.

Glyph selection could be done in several ways:
a) one or few ready made templates for usual purposes (beginner)
b) language based selection
c) character map (visual table of glyphs) selection (beginner and professional)
d) typing characters by keyboard (beginner and professional)
e) unicode ranges by string (professional)
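Options d) and e) could feed the same internal list of code points. A minimal sketch, assuming a made-up range syntax of comma-separated hexadecimal values with an optional `U+` prefix and `-` for ranges; both function names are hypothetical.

```python
def codepoints_from_typed(text):
    """Option d): the user types the characters; whitespace is ignored."""
    return sorted({ord(ch) for ch in text if not ch.isspace()})

def codepoints_from_ranges(spec):
    """Option e): parse a string such as 'U+0104-0105, 0141'."""
    result = set()
    for token in spec.split(","):
        token = token.strip().upper().replace("U+", "")
        if not token:
            continue
        start_text, _, end_text = token.partition("-")
        start = int(start_text, 16)
        end = int(end_text, 16) if end_text else start
        if start > end:
            raise ValueError(f"range runs backwards: {token}")
        result.update(range(start, end + 1))
    return sorted(result)

# Both inputs describe the same two Polish letters, U+0104 and U+0105.
typed = codepoints_from_typed("ą Ą")
ranged = codepoints_from_ranges("U+0104-0105")
print(typed == ranged, [f"U+{cp:04X}" for cp in ranged])
```

Whichever way the user selects the glyphs, the rest of the program only sees a sorted list of code points.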

If there were a demo of these alternatives, it would be simple to say which ones are best...