How to do autokerning of all glyphs of a font

I am an IT professional and I study typeface design as a hobby. I use FontCreator (version 15) to dissect professional fonts and experiment with them. Lately I have been dedicated to studying the art and technique of kerning. I have been looking for ways to automate the process as much as possible. With this goal, I analyzed how FontCreator’s AutoKern tool works. I ended up developing a method to automatically generate kerning pairs for all glyphs in a font. Since I have not found anything similar in online forums or YouTube tutorials on the subject, I decided to share it here.

FontCreator’s Glyph Transformer tool has a script called “Latin Scripts Characters Glyphs to Autokerning” that “inserts the Latin scripts character sets for which FontCreator makes autokerning.” There are exactly 1,435 characters from 18 Unicode blocks. These characters are created blank, and the user must replace them, or copy into them, the glyphs corresponding to each character. But no check is made that the glyphs actually correspond to the specified Latin letters for each character. So, in principle, the user can place any glyphs from any languages in those characters, and the AutoKern tool will operate on them in the same way.

This is the essence of the method: trick FontCreator’s AutoKern tool into operating on any glyphs the user wants. The complication is dealing with the idiosyncrasies and limitations of the tool’s own algorithm. Careful planning and attentive execution are necessary to achieve a good result. And, above all, a lot of patience. (The whole process can take several days!)

HOW IT WORKS

To begin, it is necessary to understand how the AutoKern algorithm behaves. Just run it once on a reasonable subset of Latin characters (about 200 is enough) and examine the generated code in the OpenType Designer’s Code Editor. Some peculiarities stand out. AutoKern creates kerning classes for all glyphs, for both the first and second components of all possible pairs. The exact criteria for distributing glyphs into classes are not evident (they are parameters embedded in the algorithm itself), but they do not matter. The important thing is to note that relatively few multi‑glyph classes are created. Most glyphs do not fit into any other class and end up alone in their own class.

Watching the process while it runs, you can see that the progress bar moves quickly at first, then slows down until it apparently stops. But it keeps running, as the message “Analyzing character X” assures us, with the analyzed character continuing to change. (It is not really the character being analyzed, but the glyph assigned to the character. For FontCreator, as for all font design software, a character is just a position occupied by a glyph in a table.)

What happens is that calculations must be made for each glyph to see whether it can be placed in an already created kerning class; if it cannot, a new kerning class is created to accommodate it. Since most glyphs will not fit into any other class, the number of classes grows quickly, and comparing each new glyph with the preexisting classes takes more and more time. That is why, at the beginning, the process runs fast (few classes exist) and then becomes slower and slower. This one‑by‑one analysis of glyphs can take several hours of uninterrupted execution for all 1,435 glyphs that AutoKern can process. After that, the effective distance calculations for each pair of kerning classes are relatively quick, a matter of minutes.

The important point for the method to be presented is how the AutoKern algorithm names kerning classes. For the 52 glyphs allocated to the characters corresponding to the uppercase and lowercase letters of the Latin alphabet (all in the Unicode Basic Latin block), the corresponding kerning classes receive the names of those characters (First_A, First_b, Second_C, Second_d, etc.), whether the glyphs are alone or grouped with others in those classes. But for all the other 1,383 glyphs, the naming pattern is different: if the class contains more than one glyph, it is numbered (First1, First2, Second10, Second20, etc.); but if the class contains only one glyph, it gets the name of that glyph. So, if the user places, for example, Greek letter glyphs properly named in the positions of the Latin characters, the isolated‑glyph classes that will be generated will receive names like First_Alpha-grek and Second_alphatonos-grek. These kerning classes can later be merged with others for actual Latin glyphs with no chance of confusion.

The problem is the numbered classes that contain several glyphs. When you combine, in the same FEA code (Adobe's OpenType feature file syntax), kerning classes generated for Latin glyphs and classes generated for Greek glyphs, for example, there will be classes with the same numbered names but different glyphs. It is not very hard, however, to undo the confusion. After the Greek kerning classes are generated, you just need to add the infix Greek to their names. In the Code Editor this takes twenty find‑replace operations applied to all the code generated by AutoKern: one for each of the ten digits, for each of the two class types (First and Second). Each operation inserts Greek between First or Second and the digit that follows, so that FirstN becomes FirstGreekN and SecondN becomes SecondGreekN.
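For those who prefer to do this outside the Code Editor, the twenty find‑replace operations can be collapsed into a single regular expression. This is only a minimal sketch: the function name `add_infix` is my own, and it assumes the numbered classes appear in the exported code exactly as First or Second followed by digits (for example `@First12`), so check it against your own AutoKern output first.

```python
import re

# Assumption: numbered classes are written as First<digits> / Second<digits>,
# e.g. @First12. Letter and glyph-named classes (First_A, First_Alpha-grek)
# have an underscore after First/Second and are therefore not matched.
NUMBERED_CLASS = re.compile(r"(?<![A-Za-z0-9_.])(First|Second)(\d+)")


def add_infix(fea_code: str, infix: str) -> str:
    """Insert `infix` between First/Second and the class number."""
    return NUMBERED_CLASS.sub(
        lambda m: f"{m.group(1)}{infix}{m.group(2)}", fea_code
    )


sample = (
    "@First1 = [Alpha-grek Lambda-grek];\n"
    "pos @First1 @Second20 -40;\n"
    "pos @First_Delta-grek @Second3 -15;"
)
print(add_infix(sample, "Greek"))
```

Running the function a second time changes nothing, because FirstGreek1 no longer has a digit directly after First.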

The same happens with the classes that receive the names of the letters of the Latin alphabet. If you place other glyphs in those characters, the generated classes will still have Latin letters in their names and will be confused with other classes of the same name generated for the actual Latin characters. Here the solution is more laborious: you must replace, in each of those classes, the Latin letter with the name of the first glyph in the class. For example, change First_D to First_Delta-grek (if you placed the glyph Delta-grek in the character D) and Second_q to Second_alphatonos-grek (if you placed the glyph alphatonos-grek in the character q). Since there are 52 letters and two types of classes, there are up to 104 find‑replace operations to be done across all code generated by AutoKern.
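The letter‑named classes can be renamed in one pass as well, given a table that says which glyph was placed in which letter position. The sketch below is illustrative only: `LETTER_TO_GLYPH` and `rename_letter_classes` are hypothetical names, the table has to be filled in by hand for your own font, and the pattern assumes class names of the form First_X or Second_X with a single letter X.

```python
import re

# Hypothetical table: Latin letter position -> name of the glyph placed there.
LETTER_TO_GLYPH = {"D": "Delta-grek", "q": "alphatonos-grek"}

# Match First_X / Second_X only when X is one letter and the name ends there,
# so names such as First_Delta-grek or Second_a-cy are left untouched.
LETTER_CLASS = re.compile(
    r"(?<![A-Za-z0-9_.])(First|Second)_([A-Za-z])(?![A-Za-z0-9_.\-])"
)


def rename_letter_classes(fea_code: str, mapping: dict) -> str:
    """Replace the letter in First_X / Second_X with the mapped glyph name."""
    def repl(match):
        glyph = mapping.get(match.group(2))
        # Letters without an entry in the table are kept as they are.
        return match.group(0) if glyph is None else f"{match.group(1)}_{glyph}"

    return LETTER_CLASS.sub(repl, fea_code)


print(rename_letter_classes("pos @First_D @Second_q -30;", LETTER_TO_GLYPH))
```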

If you do not want the tedious work of making those 124 class‑name substitutions (20 for the numbered classes plus up to 104 for the letter classes), there is a way to avoid it, but at a cost. The AutoKern algorithm has a peculiarity in its program logic. If there are no glyphs with outlines in the characters of the letters of the Latin alphabet (that is, if those letter positions in the Basic Latin block hold blank glyphs, or do not even exist in the FCP file), the algorithm cannot group the remaining glyphs into kerning classes. In other words, every other glyph will be placed alone in its own kerning class, even though the algorithm still spends a lot of time analyzing each of them. Somehow, the glyphs placed in those 52 Basic Latin characters serve as a reference for the calculations needed to group the other glyphs into classes. It seems to be a flaw in the program’s logic, because the algorithm should be able to group the glyphs from the remaining characters into appropriate kerning classes even in the absence of the 52 alphabet letters. (But I understand that the programmer could not have imagined that someone would try to use AutoKern to generate kerning classes that did not include the letters of the alphabet.)

The fact is that if you do not assign glyphs with outlines to the Basic Latin block characters, AutoKern will generate exactly as many kerning classes as there are other glyphs (up to 1,383), one class per glyph, each glyph in its own class. And, as already explained, each single‑glyph class will receive the name of that glyph. Thus there will be absolutely no risk that, at the end, when the kerning classes for different languages are combined into a single FEA file, there will be homonymous classes with different glyphs. This way, the user saves the work of modifying the names of the generated classes. But, as I said, there is a cost: with each glyph in its own kerning class, there will be more pairs of classes, and the whole process will take much longer. And the FEA code generated by AutoKern will also have many more lines. And the size of the TTF file embedding the kerning information will also be larger.

The explanation above covers the essence of the process. From it, one can outline a step‑by‑step process to do auto‑kerning with any sets of glyphs, from any Unicode blocks, up to a maximum of 1,435 glyphs. But what if you need more? Many professional fonts have hundreds of extra glyphs used in stylistic variations. What if you want to obtain kerning pairs for all those variations? The strategy here is divide and conquer.

To understand the process, imagine that you decide to work only with the 1,383 characters that remain after excluding the 52 from the Basic Latin block. Since each glyph will be in its own kerning class, it does not even make sense to speak of classes, and we will speak only of glyphs. However you distribute the glyphs across the characters, AutoKern will compute the pairing of each glyph with all the others, in both possible positions (one on the left, the other on the right). It will even compute the pairing of each glyph with itself. In mathematical terms, it will compute the Cartesian product of the set of all glyphs with itself. If you are working only with Latin glyphs, you can call this set of glyphs L, and the set of kerning pairs for all these glyphs will be the Cartesian product L×L.

But suppose you decide, to better organize things, to divide all those glyphs into two large groups, placing, for example, all uppercase letters together, separate from the lowercase ones, or all letters that include diacritics separate from those without diacritics. Call the first subset of glyphs A and the second subset B. It should be clear to you that AutoKern will pair all glyphs in subset A among themselves, and also all glyphs in subset B among themselves, and also each glyph in A with each glyph in B (in that order) and each glyph in B with each glyph in A (in that order). In mathematical terms, you will have divided the set L into two disjoint subsets A and B, and induced AutoKern to compute the products A×A, B×B, A×B, and B×A, which is equivalent to computing the Cartesian product L×L, only in parts.

Now suppose you increase your set L of Latin glyphs to include a third subset C, containing stylistic variations of several letters. Now you have more glyphs than AutoKern can process all at once. How can you use AutoKern to obtain the new Cartesian product L×L? I will give you a moment to think…


By making AutoKern compute the Cartesian products of subsets A, B, and C of set L taken two by two! First, the products involving A and B — A×A, B×B, A×B, and B×A, as before; then the products involving A and C — A×A, C×C, A×C, and C×A; finally, the products involving B and C — B×B, C×C, B×C, and C×B. That is, you first run AutoKern with the glyphs from A and from B and save the generated code to an FEA file; next, run it again, this time with the glyphs from A and from C, and save the code to another FEA file; and run a third time with the glyphs from B and C and save the result to a third FEA file. After that, just merge the three generated files into one, which will contain all pairs of all glyphs.
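If you want to convince yourself that running the subsets two by two really covers every pair, a few lines of Python are enough. This is a toy check, not part of the method itself: the glyph names and the function `pairs_covered_by_runs` are made up for the illustration, and each "run" is modelled simply as the full set of ordered pairs of the glyphs loaded for that run.

```python
from itertools import combinations, product


def pairs_covered_by_runs(subsets):
    """Union of (X | Y) x (X | Y) over every pair of subsets X, Y."""
    covered = set()
    for x, y in combinations(subsets, 2):
        run_glyphs = x | y  # glyphs loaded for this AutoKern run
        covered |= set(product(run_glyphs, repeat=2))
    return covered


# Toy subsets with made-up glyph names.
A = {"A", "B"}
B = {"a", "b"}
C = {"A.ss01", "a.ss01"}
L = A | B | C

covered = pairs_covered_by_runs([A, B, C])
print(covered == set(product(L, repeat=2)))  # True: the three runs give L x L
print(len(covered))  # 36 ordered pairs for 6 glyphs
```

Because the result is built as a set, pairs computed in more than one run (those within A, within B and within C) are counted only once here; in the real FEA files they show up as repeated lines.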

Note that the products of each subset with itself will be calculated twice, and the pairings of its glyphs will appear duplicated in the unified file. These repetitions must be eliminated. Strictly speaking, that is not mandatory, because the OpenType Designer’s Code Editor lets these errors pass when compiling the code, merely warning about them. But since there will be tens of thousands of repeated lines, removing them will substantially reduce the code’s compilation time and make the final FEA file less gigantic.

A practical problem is that the Code Editor does not have a feature for automatically removing repeated lines. It only warns about them when it checks the code’s syntax, but leaves the removal to the user. Other applications, like Word and Excel, do have this feature, but the number of lines generated by AutoKern is often much larger than these applications support. I solved this problem by asking ChatGPT to write a small Python function for me to remove the duplicate lines in the code generated by AutoKern.
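For reference, a function of this kind can be very short. The sketch below is not the exact function I used, only an illustration of the idea. It assumes that the lines worth deduplicating are class definitions (starting with `@`) and positioning rules (starting with `pos` or `position`), and it deliberately leaves every other line alone, so that repeated blank lines or closing braces are not deleted by accident. The name `remove_duplicate_statements` is hypothetical.

```python
def remove_duplicate_statements(lines):
    """Drop repeated class definitions and pos rules, keeping first occurrences.

    Assumption: only lines starting with '@', 'pos ' or 'position ' are
    candidates for removal; everything else is passed through unchanged.
    """
    seen = set()
    result = []
    for line in lines:
        key = " ".join(line.split())  # ignore differences in spacing
        is_statement = key.startswith(("pos ", "position ", "@"))
        if is_statement and key in seen:
            continue  # exact repeat of an earlier statement
        if is_statement:
            seen.add(key)
        result.append(line)
    return result


merged = [
    "@First_A = [A];",
    "pos @First_A @Second_V -60;",
    "",
    "pos  @First_A @Second_V -60;",
    "",
    "@First_A = [A];",
]
for line in remove_duplicate_statements(merged):
    print(line)
```

To apply it to a real file, read the merged code into a list of lines, pass it through the function, and write the result back out. Two definitions of the same class name with different glyphs are not treated as duplicates and both survive.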

There is, however, a more serious problem to consider. In the reasoning described above, we considered only unitary kerning classes, with a single glyph each. That is because when you run AutoKern in parts, multi‑glyph kerning classes cause intractable problems. To understand how, suppose that, in the distribution of glyphs into three subsets, the glyph E ends up in subset A, the glyph Eacute ends up in subset B, and the glyph Edieresis in subset C. Also suppose that the AutoKern algorithm determines that the three, having identical left‑side outlines, can be placed in the same right‑side class in the pairings (a Second class).

What will happen is that when you run AutoKern with subsets A and B, the algorithm will create the class Second_E and include in it the glyphs E and Eacute, but not the glyph Edieresis, which is neither in A nor in B. Next, when you run AutoKern with subsets A and C, the algorithm will create another class Second_E and include in it the glyphs E and Edieresis, but not the glyph Eacute, which is neither in A nor in C. And later, when you run AutoKern with subsets B and C, the algorithm will either create a numbered class SecondN and include in it the glyphs Eacute and Edieresis (if you placed some other glyph in the character E), or it will create two separate classes, one for each glyph, Second_Eacute and Second_Edieresis (if you left the character E blank).

Later, when you merge all the classes into the same FEA file, you will have two classes Second_E with different glyphs, and the glyphs Eacute and Edieresis repeated in different classes. And each of these classes will have been paired by the algorithm with all the others. Since all these classes will have identical left‑side outlines, you would expect them to have the same kerning values in the pairings, which in theory would not cause problems. But we do not know the parameters or the logic of the AutoKern algorithm to be sure that the computed values will be the same for all these classes. Even if those values are identical, we do not know how the Code Editor’s compiler will treat these redundancies when checking syntax. Even if the compiler accepts these inconsistencies or redundancies, we do not know how the commercial software that will render the font will handle them.
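If you do go down this road anyway, you can at least find out how bad the damage is by scanning the merged code for class names that are defined more than once with different glyphs. The sketch below assumes class definitions are written one per line in the usual FEA form `@Name = [glyph1 glyph2];`; the function name `conflicting_classes` is my own invention.

```python
import re
from collections import defaultdict

# Assumption: one class definition per line, in the form @Name = [g1 g2 ...];
CLASS_DEF = re.compile(r"^\s*(@[^\s=]+)\s*=\s*\[([^\]]*)\]\s*;")


def conflicting_classes(fea_code: str) -> dict:
    """Return the class names that are defined with more than one glyph set."""
    definitions = defaultdict(set)
    for line in fea_code.splitlines():
        match = CLASS_DEF.match(line)
        if match:
            members = frozenset(match.group(2).split())
            definitions[match.group(1)].add(members)
    return {name: sets for name, sets in definitions.items() if len(sets) > 1}


merged = (
    "@Second_E = [E Eacute];\n"     # from the run with A and B
    "@Second_E = [E Edieresis];\n"  # from the run with A and C
    "@First_A = [A];\n"
    "@First_A = [A];\n"             # harmless exact repeat
)
for name in conflicting_classes(merged):
    print(name)
```

This only reports same‑name conflicts. It does not detect a glyph that ends up in two differently named classes, such as Eacute in both Second_E and a numbered class.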

These inconsistencies and redundancies can occur with any multi‑glyph classes, not only the 104 named with letters, but also the hundreds of others named with numbers. In probabilistic terms, it will happen to many of them, potentially involving thousands of glyphs and tens of thousands of pairings. Good luck debugging all that…

It is therefore much safer, when running AutoKern in parts, to exclude the 52 characters of the Basic Latin block, inducing AutoKern to create only single‑glyph classes, which are guaranteed to have different names and different glyphs. In short, this applies the KISS principle to the problem: Keep It Simple, Stupid!

This approach, however, as explained earlier, considerably increases the number of classes and pairings between them. It also limits the number of glyphs to be processed in each AutoKern run to 1,383. And since two subsets of the total glyphs will have to be processed in each run, each one will have to have at most half that amount: one with 691, the other with 692 glyphs. To make the math easier, let us normalize to 690 glyphs per subset, 1,380 in each AutoKern run. In practice, the calculations to be performed are as follows:

  1. Divide the total number of glyphs to be processed by 690. If it yields a fractional number (it most likely will), round up to the next integer. This is the number N of subsets into which the glyphs will have to be distributed.

  2. Divide the total number of glyphs to be processed by this number N of subsets. The result is the number of glyphs that should be placed in each subset. If the division is not exact (it probably will not be), some subsets will have one more or one fewer glyph than the others.

  3. The number of AutoKern runs needed to compute all pairs of all glyphs will be the number of combinations of the N subsets taken 2 at a time, that is, N × (N − 1) / 2.

Three subsets (more than 1,380 and up to 2,070 glyphs) will require 3 AutoKern runs; four subsets (up to 2,760 glyphs), 6 runs; five subsets (up to 3,450 glyphs), 10 runs; six subsets (up to 4,140 glyphs), 15 runs; seven subsets (up to 4,830 glyphs), 21 runs; eight subsets (up to 5,520 glyphs), 28 runs; and so on.
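The three calculation steps above are easy to script if you want to plan the work for a specific font. Here is a small sketch; the function name `plan_runs` is my own, and the limit of 690 glyphs per subset is simply the rounded figure used in this post.

```python
from math import comb

MAX_PER_SUBSET = 690  # rounded figure used in this post


def plan_runs(total_glyphs: int, max_per_subset: int = MAX_PER_SUBSET):
    """Return (number of subsets, subset sizes, number of AutoKern runs)."""
    if total_glyphs <= 0:
        raise ValueError("total_glyphs must be positive")
    # Step 1: number of subsets, rounded up.
    n = (total_glyphs + max_per_subset - 1) // max_per_subset
    # Step 2: spread the glyphs as evenly as possible over the n subsets.
    base, extra = divmod(total_glyphs, n)
    sizes = [base + 1] * extra + [base] * (n - extra)
    # Step 3: subsets taken 2 at a time (a single subset still needs one run).
    runs = comb(n, 2) if n >= 2 else 1
    return n, sizes, runs


print(plan_runs(2000))  # (3, [667, 667, 666], 3)
```

For a font with 2,000 glyphs to kern, this gives three subsets of 667, 667 and 666 glyphs and three AutoKern runs.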

The AutoKern runs do not need to be sequential. To save time, they can be performed simultaneously, in parallel. My notebook — with a 12th Gen Intel Core i7‑1255U 1.70 GHz processor, 16 GB RAM, Windows 11 Pro 64‑bit — can run up to nine instances of FontCreator simultaneously, each one executing an AutoKern on 1,380 glyphs, without a drop in the performance of each instance. You can measure your machine’s performance (if it is a PC) with the Task Manager app, native to Windows: when several instances of FontCreator running AutoKern in parallel occupy 90% of the CPU processing time, you will have reached the maximum performance of your system for this type of processing. Running more instances of FontCreator will force Windows to dedicate less CPU time to each task, nullifying the benefits of parallelism.

The method explained above makes it feasible to automate kerning for all possible pairs of glyphs in a font. The number of glyphs is limited only by the available computational power. The values obtained for each pair, however, should be taken only as default values. The designer will still need to analyze the most frequent pairs in the most widely used languages to check whether the values assigned by AutoKern are indeed the most appropriate, or whether they need to be modified. No algorithm (for now) replaces the expertise of a good type designer.

And here are my suggestions for the FontCreator developers to improve AutoKern:

  1. DO NOT FIX whatever it is that makes the AutoKern algorithm unable to group glyphs into kerning classes when the 52 Basic Latin characters are missing. It is this providential deficiency that makes it possible to use the tool on partitions of large sets of glyphs and then merge these partitions without inconsistencies or redundancies in the generated classes.

  2. Create an option to run AutoKern without trying to group glyphs into kerning classes, putting each glyph in its own class from the beginning. When the 52 Basic Latin characters are missing, the many hours the program spends trying unsuccessfully to group the other glyphs into classes are just wasted processing time and electricity (and therefore money).

Even for those who run AutoKern “the right way”, as it was designed to be used, grouping glyphs into kerning classes may not always be desirable. Kerning classes were invented to extend somewhat the number of glyph pairings a type designer can manage by hand. Still, that is a modest increase compared to what a computer can do. Today’s machines are powerful enough to calculate the proper spacing of hundreds of thousands of glyph pairs in minutes. But they still take hours to analyze a few thousand glyphs and decide whether they can be grouped into classes. A designer who is going to accept the default values generated by AutoKern, making few later changes, will be more interested in obtaining those values quickly, to have more time afterwards for the necessary adjustments.

PUTTING IT INTO PRACTICE

If you have understood everything (or almost) up to this point, you are probably eager to see how this can be done in practice. As you might suspect, it is not very simple. Many different operations are performed in several stages, and it is easy to get tangled up. If there is interest, I can write a step‑by‑step tutorial based on the experiments I did to develop the method presented here. I would use a commercial font as an example to give a real sense of how this solution can work in practice.

Nice find! Too bad FontCreator’s AutoKern is still mostly geared toward Latin, but your workaround is clever.

Also note: if you already have a kerning lookup with non-Latin characters, you can right-click that lookup in the treeview and select Auto Kern.