Today the Unicode Consortium announces Unicode Emoji, Version 4.0, which adds 113 new emoji and further expands the representation of genders and professions. These new emoji are already appearing on smartphones and other devices and platforms that support emoji. See the full list in Emoji Recently Added.
The new emoji will soon be available for adoption, helping fund projects to improve language support.
Unlike the 72 emoji characters added to Unicode 9.0 in June, these are not new Unicode characters. Most of these new emoji are sequences of existing emoji, “glued together” with a special invisible character so that they appear and behave like a single character. This glue character is called a ZWJ, pronounced “zwidge” or /zwɪdʒ/. Three existing Unicode 9.0 characters (gender and medical symbols) were changed to qualify as emoji, for use in those ZWJ sequences.
Two of the new sequences are flags, 10 are family groupings (such as mother with daughter), 32 are new professions/roles (such as man or woman astronaut), and 66 are explicit-gendered variants (such as man or woman running). 99 of these sequences, plus 5 other characters (such as snowboarder), can also now have the 5 skin tone modifiers.
U+1F469 U+200D U+1F467
U+1F468 U+200D U+1F680
U+1F469 U+200D U+1F680
U+1F3C2 U+1F3FB
U+1F3C2 U+1F3FF
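As a rough illustration of how such sequences are put together, the following Python sketch builds the examples listed above from their code points and shows that each one is still several code points internally, even when a platform draws it as a single emoji. The helper names (`from_code_points`, `describe`) are made up for this example and are not part of any Unicode data file or library.

```python
# Minimal sketch: build emoji sequences from code points.
# Helper names here are hypothetical, chosen only for illustration.

ZWJ = "\u200D"  # ZERO WIDTH JOINER, the invisible "glue" character


def from_code_points(*code_points: int) -> str:
    """Concatenate code points into a single string."""
    return "".join(chr(cp) for cp in code_points)


def describe(text: str) -> str:
    """Render a string in U+XXXX notation, one entry per code point."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# ZWJ sequences: two existing emoji joined by U+200D.
mother_daughter = from_code_points(0x1F469, 0x200D, 0x1F467)
man_astronaut = from_code_points(0x1F468, 0x200D, 0x1F680)
woman_astronaut = from_code_points(0x1F469, 0x200D, 0x1F680)

# Modifier sequences: a base emoji followed directly by a skin tone
# modifier, with no ZWJ in between.
snowboarder_light = from_code_points(0x1F3C2, 0x1F3FB)
snowboarder_dark = from_code_points(0x1F3C2, 0x1F3FF)

for seq in (mother_daughter, man_astronaut, woman_astronaut,
            snowboarder_light, snowboarder_dark):
    # Print only the code point notation, so the output does not depend
    # on whether the console can display emoji.
    print(describe(seq), "| code points:", len(seq), "| has ZWJ:", ZWJ in seq)
```

Whether a given sequence actually appears as one glyph depends on the font and platform; where a sequence is not supported, the component emoji are typically shown side by side.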
The technical documentation has also been updated, with additional guidelines for implementers and the new versions of the emoji data files for use in programs.
Interesting! I’m now trying to get my head around the difference between a zero-width joiner (ZWJ) and a zero-width no-break space (ZWNBSP). Likewise for the zero-width non-joiner and the zero-width space.
If you look at the bytes inside a file saved from Microsoft WordPad as a Unicode Text Document, there is a BOM (byte order mark) at the start of the file, which is the ZWNBSP character, U+FEFF. It shows the order in which the two 8-bit bytes of each 16-bit UTF-16 code unit are stored, that is, whether they are stored as HI LO HI LO HI LO … or LO HI LO HI LO HI … . In fact, WordPad stores them as LO HI LO HI LO HI … .
The bytes inside the file can be inspected using the ViewHex program that Erwin kindly made available in this forum.
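For anyone without a hex viewer to hand, the same check can be sketched in a few lines of Python. This is only an illustration: it assumes the data really is UTF-16 (a UTF-32 little-endian file begins with the same two bytes), and the function names `utf16_byte_order` and `hex_dump` are invented for the example. The sample bytes are generated in memory rather than read from an actual WordPad file.

```python
import codecs


def utf16_byte_order(data: bytes) -> str:
    """Report the byte order signalled by a leading UTF-16 BOM, if any."""
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "LO HI (little-endian)"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "HI LO (big-endian)"
    return "no BOM found"


def hex_dump(data: bytes) -> str:
    """Show each byte as two uppercase hex digits."""
    return " ".join(f"{b:02X}" for b in data)


# Simulated file contents: U+FEFF followed by the text "Hi", stored
# low byte first, which is the order described above for WordPad.
sample = "\ufeffHi".encode("utf-16-le")

print(hex_dump(sample))
print(utf16_byte_order(sample))
```

To try it on a real file, read the file in binary mode, for example with `open(path, "rb").read()`, and pass the resulting bytes to `utf16_byte_order`.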