How to (auto)generate a font with many special characters

Hello,

I’m new to creating fonts, and I need to create a rather complex one for my project. I am looking for a tool that would help me with the task and was hoping I could get some help here.
I am working on an abstract machine editor/simulator (Finite State Automata, Turing machines), and I need a font to display the special characters these machines work with. Basically they are standard alphabet letters with a number of lines below or above them. The problem is that each character can have 0 to 3 lines above it and 0 to 3 below (see attached picture). I need all possible combinations of these for all the letters of the alphabet, both lowercase and uppercase. This means I need to create some 780 new characters, which is far too many to create manually in the editor.
I need a tool that could generate these characters from some set of parameters. I also need the new characters to have logically arranged Unicode code points, so I can easily handle them programmatically. Can this be done with FontCreator? Do you know of anything else that could help me?

Thank you in advance for your answers.

The Professional edition of FontCreator is ideally suited to a task like this. You can use complete composites to create the several versions of each letter, as one normally does for à á â ã ä ā and so forth.

You could perhaps replace each accent with one, two, or three horizontal lines and use the built-in CompositeData.xml file to generate your special composite characters. However, Unicode code points are not assigned for all possible combinations of a-z and accents, so you would need to add your own assignments.

CompositeData.xml is a plain text file that can be edited to add your own assignments for custom composite characters in the Private Use Area (PUA). FontCreator already does this to create Petite Capitals with accents in the PUA. To see it in action, run the Petite Capitals transform script on a standard font (Tools, Glyph Transformer, Open, Petite Capitals).

Alternatively, you can just use copy and paste to create composites, which may prove to be much less work, unless you’re planning to edit several typefaces in the same way, e.g. regular, bold, italic, and bold italic, for both Serif and Sans Serif fonts. See my tutorial on Complete Composites for how to do this.

Producing the composites below by modifying existing accents took only a few minutes, without any special modifications being required. As you can see, there may be problems with the length of the lines over and under characters of different widths.
[Image: Composite Vowels.png]

I wondered how the number 780 was reached.

Having 0 to 3 lines above is a choice of 4.
Having 0 to 3 lines below is a choice of 4.
So there are 16 possible choices for each letter.
There are 52 letters.
So 16 choices for each of 52 letters is a total of 832 special characters.
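The counting can be checked with a few lines of Python. This is a sketch assuming the letter set is the 26 basic Latin letters in each case; removing the 52 combinations with no lines at all gives the 780 from the original question.

```python
from itertools import product
import string

# A-Z then a-z: the 52 letters assumed in the count above.
LETTERS = string.ascii_uppercase + string.ascii_lowercase
LINE_COUNTS = range(4)  # 0, 1, 2 or 3 lines

# Every (letter, lines above, lines below) combination.
combos = list(product(LETTERS, LINE_COUNTS, LINE_COUNTS))
print(len(combos))  # 832

# Drop the plain letters (0 above, 0 below), which already exist in Unicode.
special = [c for c in combos if c[1:] != (0, 0)]
print(len(special))  # 780
```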

As 780 is 52 less than 832, it seems that you intend to use the ordinary letters A-Z and a-z to represent the special characters that have no line above and no line below. It is your project and your decision, yet it may be worth considering whether it would be better to have all 832 special characters in the Unicode Private Use Area rather than using the ordinary letters of the alphabet for some of them. It is all right for two characters to look the same: Unicode already has distinct characters that display identically, for example LATIN CAPITAL LETTER A (U+0041) and GREEK CAPITAL LETTER ALPHA (U+0391), where the meaning differs even though the glyphs look the same.

The following thread might be of interest.

http://forum.high-logic.com:9080/t/newby-question/2457/1

In that thread the question is about adding some special symbols for engineering to a font.

A quick glance at the following thread might be of interest: it features a font for representing instructions in a portable interpretable object code using character code points in the Unicode Private Use Area.

http://forum.high-logic.com:9080/t/a-font-for-some-experiments-in-computing/2197/1

I would suggest a logical layout for the code points that you choose to use. There is no need to use 780 or 832 code points in sequence.

For example, the following is one possible scheme.

U+E001 A
U+E002 B
.
.
U+E019 Y
U+E01A Z

U+E021 a
U+E022 b
.
.
U+E039 y
U+E03A z

Then 1 line above could be regarded as adding hexadecimal 40 to the value, 2 lines above could be regarded as adding hexadecimal 80 to the value and 3 lines above could be regarded as adding hexadecimal C0 to the value.

Then for the whole collection of 208 code points thus far, 1 line below could be regarded as adding hexadecimal 100 to the value, 2 lines below could be regarded as adding hexadecimal 200 to the value and 3 lines below could be regarded as adding hexadecimal 300 to the value.

That would produce 832 code point assignments for your special characters.
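The scheme above can be expressed as a pair of functions, which is also how a program could handle the characters. This is a minimal sketch; the function names are hypothetical and the offsets are exactly those described above (base U+E001 for A and U+E021 for a, plus hexadecimal 40 per line above and hexadecimal 100 per line below).

```python
import string

BASE = 0xE000  # start of the Private Use Area block used by the scheme

def special_codepoint(letter: str, above: int, below: int) -> int:
    """Code point for `letter` with `above` lines over it and `below` lines under it."""
    if not (0 <= above <= 3 and 0 <= below <= 3):
        raise ValueError("line counts must be 0 to 3")
    if len(letter) != 1:
        raise ValueError("expected a single letter")
    if letter in string.ascii_uppercase:
        offset = 0x01 + string.ascii_uppercase.index(letter)  # A -> 0x01 ... Z -> 0x1A
    elif letter in string.ascii_lowercase:
        offset = 0x21 + string.ascii_lowercase.index(letter)  # a -> 0x21 ... z -> 0x3A
    else:
        raise ValueError("letter must be A-Z or a-z")
    return BASE + offset + 0x40 * above + 0x100 * below

def decode(cp: int) -> tuple[str, int, int]:
    """Inverse of special_codepoint: recover (letter, above, below)."""
    below, rest = divmod(cp - BASE, 0x100)  # hundreds digit: lines below
    above, low = divmod(rest, 0x40)         # multiples of 0x40: lines above
    if not 0 <= below <= 3:
        raise ValueError(f"U+{cp:04X} is not in the scheme")
    if 0x01 <= low <= 0x1A:
        return string.ascii_uppercase[low - 0x01], above, below
    if 0x21 <= low <= 0x3A:
        return string.ascii_lowercase[low - 0x21], above, below
    raise ValueError(f"U+{cp:04X} is not in the scheme")

print(f"U+{special_codepoint('a', 2, 1):04X}")  # U+E1A1
print(decode(0xE3FA))                            # ('z', 3, 3)
```

Because the letter offsets stay below hexadecimal 40 and the "above" offsets stay below hexadecimal 100, the three parts never overlap, so every combination gets a unique code point that can be decoded with simple integer division.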

Well, the glyphs still need to be generated, and however that is achieved it will be a considerable task.

I would suggest considering a monowidth (monospaced) design for the glyphs. That way the lines will be the same width for every character, which will significantly reduce the work of producing the font.
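To illustrate why a monowidth design helps: if every glyph has the same advance width, the bars depend only on the number of lines, not on the letter, so one set of rectangles serves all 52 letters. The sketch below computes those rectangles in font units; all the metric values are hypothetical placeholders, not taken from any real font, and would need adjusting to the actual design (for example, to decide how lines relate to ascenders and descenders).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    x_min: int
    y_min: int
    x_max: int
    y_max: int

# Hypothetical metrics in font units (assuming 1000 units per em).
ADVANCE = 600      # every glyph is equally wide in a monowidth design
SIDE_MARGIN = 60   # gap between the bar ends and the glyph's sides
THICKNESS = 40     # thickness of each bar
GAP = 40           # space between the letter and the first bar, and between bars
TOP = 760          # assumed highest point reached by any letter
BOTTOM = -240      # assumed lowest point reached by any letter

def bars(above: int, below: int) -> list[Rect]:
    """Rectangles for `above` bars over and `below` bars under any glyph."""
    if not (0 <= above <= 3 and 0 <= below <= 3):
        raise ValueError("line counts must be 0 to 3")
    x0, x1 = SIDE_MARGIN, ADVANCE - SIDE_MARGIN
    rects = []
    for i in range(above):  # stack upwards from TOP
        y = TOP + GAP + i * (THICKNESS + GAP)
        rects.append(Rect(x0, y, x1, y + THICKNESS))
    for i in range(below):  # stack downwards from BOTTOM
        y = BOTTOM - GAP - THICKNESS - i * (THICKNESS + GAP)
        rects.append(Rect(x0, y, x1, y + THICKNESS))
    return rects

for r in bars(2, 1):
    print(r)
```

Each rectangle could then be drawn once with a rectangle tool and reused in every composite with the same line counts.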

I hope that this helps.

William Overington

4 May 2009

Your illustration involving the letter a and the illustration by Bhikkhu Pesala for a e i o u are interesting.

Thinking further about this, I wonder how this would work for letters such as h, g, d, p and q. Do you want the lines above the tops of the ascenders and below the bottoms of the descenders, or tucked in beside them? Also, as regards a and A, your illustration seems to show the lines directly above the a. Yet the lines above A would not be at the same level as those above a unless the a were made the same height as the A, and that would pose problems if c were then the same height as C. If all the letters were made the same height, pairs such as a and A, e and E, and q and Q would still be clearly distinguished, but c and C, s and S, and o and O would not.

Whilst the design of the characters is entirely your choice, may I ask whether it is essential that the lines are above and below the letter itself? For example, would it be acceptable to express each special character as a sequence of two characters: the letter followed by one of 16 special symbols, such as a diamond (drawn as a square rotated by 45 degrees so that it is the same height as a letter e) with 0 to 3 lines above and 0 to 3 lines below? That way, only 16 special characters would be needed in the Private Use Area.

Internal processing and storage could, if desired, still use single code points for the 780 (or 832) special characters, while display and editing in the user interface, and printing in any publication, could use the two-character sequence.

I recognize that if this notation is already in use in your field of study, devising an alternative way of displaying it may not be appropriate. If you are devising a new system, however, it might be worth considering, given that some of the lowercase letters have an ascender or a descender.

However, I respect that your research is your research so please feel free to say that it must be as you first mentioned if you so wish.

William Overington

4 May 2009

FontCreator offers MANY tools to produce the results you want.

In addition to those already mentioned (composites, cut and paste), there are the “Add Rectangle” and “Add Ellipse” tools at the top of the page. They may be even faster than cut and paste.

Another is Samples, which changes the characters shown on the left side of the screen; these can then be dragged into any editing window (IF they are not composites in the original font):
http://forum.high-logic.com:9080/t/sample-great-tool-for-spies-secret-codes/1342/1

Good luck with your project.

I thought of some code point assignments that might be of use should you wish to try the idea of the 16 special characters.

U+E420 DIAMOND
U+E421 DIAMOND WITH ONE HORIZONTAL LINE ABOVE
U+E422 DIAMOND WITH TWO HORIZONTAL LINES ABOVE
U+E423 DIAMOND WITH THREE HORIZONTAL LINES ABOVE

U+E424 DIAMOND WITH ONE HORIZONTAL LINE BELOW
U+E425 DIAMOND WITH ONE HORIZONTAL LINE ABOVE AND ONE HORIZONTAL LINE BELOW
U+E426 DIAMOND WITH TWO HORIZONTAL LINES ABOVE AND ONE HORIZONTAL LINE BELOW
U+E427 DIAMOND WITH THREE HORIZONTAL LINES ABOVE AND ONE HORIZONTAL LINE BELOW

U+E428 DIAMOND WITH TWO HORIZONTAL LINES BELOW
U+E429 DIAMOND WITH ONE HORIZONTAL LINE ABOVE AND TWO HORIZONTAL LINES BELOW
U+E42A DIAMOND WITH TWO HORIZONTAL LINES ABOVE AND TWO HORIZONTAL LINES BELOW
U+E42B DIAMOND WITH THREE HORIZONTAL LINES ABOVE AND TWO HORIZONTAL LINES BELOW

U+E42C DIAMOND WITH THREE HORIZONTAL LINES BELOW
U+E42D DIAMOND WITH ONE HORIZONTAL LINE ABOVE AND THREE HORIZONTAL LINES BELOW
U+E42E DIAMOND WITH TWO HORIZONTAL LINES ABOVE AND THREE HORIZONTAL LINES BELOW
U+E42F DIAMOND WITH THREE HORIZONTAL LINES ABOVE AND THREE HORIZONTAL LINES BELOW

These have been chosen so that the decimal equivalents of the hexadecimal values end in 00 to 15, which makes the codes easier to remember when entering them into WordPad using the Alt method. The decimal values are 58400 to 58415.
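The table above follows a simple rule: the code point is U+E420 plus the number of lines above plus four times the number of lines below. The sketch below encodes that rule and builds the two-character display form (plain letter followed by the marked diamond) suggested earlier in the thread; the function names are hypothetical, and it also confirms the decimal range.

```python
import string

DIAMOND_BASE = 0xE420  # U+E420 DIAMOND, as proposed above

def diamond_codepoint(above: int, below: int) -> int:
    """U+E420..U+E42F: offset is lines above + 4 * lines below."""
    if not (0 <= above <= 3 and 0 <= below <= 3):
        raise ValueError("line counts must be 0 to 3")
    return DIAMOND_BASE + above + 4 * below

def display_sequence(letter: str, above: int, below: int) -> str:
    """Two-character display form: the plain letter followed by the marked diamond."""
    if len(letter) != 1 or letter not in string.ascii_letters:
        raise ValueError("letter must be A-Z or a-z")
    return letter + chr(diamond_codepoint(above, below))

for above, below in [(0, 0), (1, 1), (3, 3)]:
    cp = diamond_codepoint(above, below)
    print(f"U+{cp:04X} = decimal {cp}")
# U+E420 = decimal 58400
# U+E425 = decimal 58405
# U+E42F = decimal 58415
```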

These codes may not be of any use whatsoever to you if you choose not to use the diamond method, yet I have included them just in case they might be useful.

William Overington

4 May 2009