Unicode Brings Skin Tones to Snowboarders

ErwinDenissen · November 28, 2016, 11:15pm

Today the Unicode Consortium announces Unicode Emoji, Version 4.0, which adds 113 new emoji and further expands the representation of gender and professions. These new emoji are already appearing on smartphones and other devices and platforms that support emoji. See the full list in Emoji Recently Added.

The new emoji will soon be available for adoption, helping fund projects to improve language support.

Unlike the 72 emoji characters added to Unicode 9.0 in June, these are not new Unicode characters. Most of these new emoji are sequences of existing emoji, “glued together” with a special invisible character so that they appear and behave like a single character. This glue character is the zero width joiner (U+200D), abbreviated ZWJ and pronounced “zwidge” or /zwɪdʒ/. Three existing Unicode 9.0 characters (gender and medical symbols) were changed to qualify as emoji, for use in those ZWJ sequences.

Two of the new sequences are flags, 10 are family groupings (such as mother with daughter), 32 are new professions/roles (such as man or woman astronaut), and 66 are explicitly gendered variants (such as man or woman running). 99 of these sequences, plus 5 other characters (such as snowboarder), can now also take the 5 skin tone modifiers.

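U+1F469 U+200D U+1F467 (woman, ZWJ, girl: family of mother with daughter)

U+1F468 U+200D U+1F680 (man, ZWJ, rocket: man astronaut)

U+1F469 U+200D U+1F680 (woman, ZWJ, rocket: woman astronaut)

U+1F3C2 U+1F3FB (snowboarder with the lightest skin tone modifier)

U+1F3C2 U+1F3FF (snowboarder with the darkest skin tone modifier)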
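As a rough illustration of how the sequences listed above are put together, here is a minimal Python sketch. The code points are the ones from the list; the constant names and the `code_points` helper are made up for this example and are not part of any Unicode data file or library.

```python
# Code points taken from the sequences listed in the announcement.
WOMAN = "\U0001F469"
ROCKET = "\U0001F680"
SNOWBOARDER = "\U0001F3C2"
ZWJ = "\u200D"             # ZERO WIDTH JOINER, the invisible "glue"
SKIN_DARK = "\U0001F3FF"   # EMOJI MODIFIER FITZPATRICK TYPE-6


def code_points(text: str) -> str:
    """Format a string as space-separated U+XXXX code points (hypothetical helper)."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# A ZWJ sequence: two existing emoji joined by U+200D.
woman_astronaut = WOMAN + ZWJ + ROCKET

# A modifier sequence: the skin tone modifier directly follows the base emoji,
# with no ZWJ in between.
snowboarder_dark = SNOWBOARDER + SKIN_DARK

print(code_points(woman_astronaut))   # U+1F469 U+200D U+1F680
print(code_points(snowboarder_dark))  # U+1F3C2 U+1F3FF
print(len(woman_astronaut))           # 3 code points, even if shown as one glyph
```

Whether the result is drawn as a single glyph depends on the font and platform; on a system without support for the sequence, the individual emoji are typically shown side by side.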

The technical documentation has also been updated, with additional guidelines for implementers and the new versions of the emoji data files for use in programs.

http://blog.unicode.org/2016/11/113-new-unicode-emoji-plus-skin-tones.html

Alfred · November 29, 2016, 8:42am

Interesting! I’m now trying to get my head around the difference between a zero-width joiner (ZWJ) and a zero-width no-break space (ZWNBS). Likewise for the zero-width non-joiner and the zero-width space.

William · December 1, 2016, 11:11am

Alfred wrote: “I’m now trying to get my head around the difference between a zero-width joiner (ZWJ) and a zero-width no-break space (ZWNBS). Likewise for the zero-width non-joiner and the zero-width space.”

ZWJ is U+200D ZERO WIDTH JOINER

ZWNBSP is U+FEFF ZERO WIDTH NO-BREAK SPACE, also known as BYTE ORDER MARK

ZWNJ is U+200C ZERO WIDTH NON-JOINER

ZWSP is U+200B ZERO WIDTH SPACE

WJ is U+2060 WORD JOINER
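For anyone who wants to check these code points locally, the following is a small sketch using Python's standard `unicodedata` module to print the formal name and general category of each character. The `CHARS` table and the `describe` helper are names chosen for this example; the zero-width space is labelled ZWSP here.

```python
import unicodedata

# Abbreviation -> character, using the code points listed in the post.
CHARS = {
    "ZWJ": "\u200D",
    "ZWNBSP": "\uFEFF",
    "ZWNJ": "\u200C",
    "ZWSP": "\u200B",
    "WJ": "\u2060",
}


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME (category)' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


for abbr, ch in CHARS.items():
    print(f"{abbr:7}{describe(ch)}")
```

All five are format characters (general category Cf), which is why none of them has a visible glyph. The names alone do not explain the behavioural differences; those are described in the Unicode Standard itself.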

I found the following.

ZWJ

http://www.fileformat.info/info/unicode/char/200D/index.htm

That page references a post by the late Michael Kaplan.

The original link no longer works, but I have found the post archived elsewhere.

http://archives.miloush.net/michkap/archive/2006/02/15/532394.html

ZWNBSP

http://www.fileformat.info/info/unicode/char/FEFF/index.htm

There are some notes about the characters in the code charts available from the following web page.

http://www.unicode.org/charts/

There may well be further notes in the Unicode Standard.

http://www.unicode.org/versions/Unicode9.0.0/

If there are such notes, cross-references to them from the code charts would be helpful.

William

William · December 1, 2016, 11:22am

If you look at the bytes inside a file saved from Microsoft WordPad as a Unicode Text Document, there is a BOM character (that is, a ZWNBSP character, U+FEFF) at the start of the file. It shows the order in which the two 8-bit bytes of each 16-bit UTF-16 code unit are stored, that is, whether they are stored as HI LO HI LO HI LO … or LO HI LO HI LO HI … . WordPad in fact stores them as LO HI LO HI LO HI … (little-endian), so the file begins with the bytes FF FE.
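To make the byte-order point concrete, here is a minimal Python sketch that inspects the first two bytes of some data and reports which order they indicate. The function name `detect_utf16_byte_order` and the sample text are assumptions for this example; the sample bytes are built in memory rather than read from an actual WordPad file.

```python
import codecs


def detect_utf16_byte_order(data: bytes) -> str:
    """Report the byte order signalled by a leading UTF-16 BOM, if any.

    Caveat: a UTF-32 little-endian BOM (FF FE 00 00) also starts with FF FE,
    so this simple check assumes the data is known to be UTF-16.
    """
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "little-endian (LO HI)"
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "big-endian (HI LO)"
    return "no UTF-16 BOM"


# Build the kind of byte stream described in the post: BOM first, then the
# text with the low byte of each 16-bit code unit stored first.
sample = codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")

print(sample.hex(" "))                  # ff fe 48 00 69 00
print(detect_utf16_byte_order(sample))  # little-endian (LO HI)
```

The same U+FEFF character encoded the other way round gives FE FF, which is how a reader of the file can tell the two orders apart.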

The bytes inside the file can be inspected using the ViewHex program that Erwin kindly made available in this forum.

http://forum.high-logic.com:9080/t/obtaining-a-hexadecimal-dump-of-a-unicode-text-document/2342/2
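For readers without ViewHex to hand, a similar view of the bytes can be sketched in a few lines of Python. This is only a stand-in and does not reproduce ViewHex's output format; the `hex_dump` function and the sample text are made up for this example.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return a simple hex dump: an 8-digit offset followed by the bytes in hex."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        lines.append(f"{offset:08X}  {hex_part}")
    return "\n".join(lines)


# A BOM followed by "A", a ZWJ (U+200D) and "B", stored low byte first.
data = b"\xff\xfe" + "A\u200dB".encode("utf-16-le")

print(hex_dump(data))  # 00000000  FF FE 41 00 0D 20 42 00
```

In the output, the ZWJ appears as the byte pair 0D 20, that is, U+200D with its low byte first, matching the LO HI order signalled by the leading FF FE.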

William