Emoji

Regarding the following document.

http://www.unicode.org/L2/L2014/14093-utr51-draft-emoji.pdf

Please consider in particular the last two paragraphs of section 3 of the document.

Having seen the earlier HTML version, I produced the following document.
The_format_of_the_readouts.dat_file_suggested_for_possible_use_in_the_application_of_localized_read-out_labels.pdf
The format is almost an exact copy of the format in the document available in the following post.

http://forum.high-logic.com:9080/t/localizable-sentences-experiment-font-support/2475/89

Since publishing the readouts.dat format document I have received helpful advice about the XLIFF format.

I knew nothing at all about XLIFF.

I found the following.

http://en.wikipedia.org/wiki/OASIS_(organization)

http://en.wikipedia.org/wiki/XLIFF

http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html

I have not studied it all yet.

Suppose that the Unicode Technical Committee accepts section 3 of the 14093-utr51-draft-emoji.pdf document. (I hope they do.)

Suppose that a manufacturer of a text-to-speech system then sees that section and asks the following.

“In adding that facility into our text-to-speech system, is there a portable file format that we can use so that our user community can crowd-source localizations of the emoji into the many languages with which our text-to-speech system is used?” asks the manufacturer.

I feel that, because XLIFF exists, most businesses may well never use the readouts.dat format. However, the readouts.dat format might be useful for student projects and some research and development projects.

I am thinking about adding some features to help a software tool convert from readouts.dat format to XLIFF format, while keeping the same lightweight processing demand for anyone who wishes to program a routine that reads a readouts.dat file into, say, a text-to-speech program.

So, as well as serving its original intended use, a readouts.dat format augmented to help a software tool convert it to XLIFF might be a convenient way for people to enter localization data into a computer system before an XLIFF file is produced from the readouts.dat file.

William Overington

3 May 2014

Here are some notes that might be of interest to some readers.

The XLIFF 2.0 candidate standard is due to be published today.

http://docs.oasis-open.org/xliff/xliff-core/v2.0/cs01/xliff-core-v2.0-cs01.html

http://docs.oasis-open.org/xliff/xliff-core/v2.0/cos01/xliff-core-v2.0-cos01.html

References to XLIFF in my draft text below refer to the XLIFF 1.2 format.

http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html

http://en.wikipedia.org/wiki/XLIFF

MY DRAFT TEXT SO FAR ONLY

If the first character of a line of a readouts.dat file is an ASTERISK, the line is a comment as far as the primary purpose of a readouts.dat file is concerned, namely providing a text string, in a particular language, that describes in words a particular pictograph character.
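As an illustration of the comment rule, here is a minimal Python sketch of how a program using a readouts.dat file for its primary purpose might skip comment lines. It assumes the file has already been read in as text, and it treats every line that does not start with an ASTERISK as a localization line without interpreting its content, since the layout of a localization line is defined in the format document rather than here. The function name `localization_lines` is hypothetical.

```python
def localization_lines(text: str) -> list[str]:
    """Return the lines of a readouts.dat file that are not comments.

    A line whose first character is an ASTERISK is a comment for the
    primary (read-out) purpose of the file, so it is skipped. All other
    lines are returned unchanged, in file order.
    """
    return [line for line in text.splitlines() if not line.startswith("*")]


sample = "*]en-GB\n*[en-GB\nfirst localization line\n* a plain comment\nsecond localization line"
print(localization_lines(sample))  # ['first localization line', 'second localization line']
```

This keeps the processing demand of the primary use as light as before: one test of the first character per line.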

It would be possible, in principle, to use a specially written software tool to convert the contents of a readouts.dat file to an XLIFF file that has the same information content, though presented in an XLIFF structure.
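For readers who, like me, are new to XLIFF, the following minimal Python sketch shows the general shape of an XLIFF 1.2 file with one trans-unit element per string, built with the standard library's `xml.etree.ElementTree`. The function name `build_xliff`, the choice of `datatype="plaintext"` and the way the strings are passed in are assumptions for illustration only; a real conversion tool would need to follow the XLIFF 1.2 specification more fully.

```python
import xml.etree.ElementTree as ET

# Namespace URI of XLIFF 1.2 documents.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize it as the default namespace


def _q(tag: str) -> str:
    """Return a tag name qualified with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def build_xliff(units, source_language, target_language, original="readouts.dat"):
    """Build an XLIFF 1.2 document with one trans-unit per (source, target) pair."""
    root = ET.Element(_q("xliff"), {"version": "1.2"})
    # original, source-language and datatype are required on <file> in XLIFF 1.2.
    file_el = ET.SubElement(root, _q("file"), {
        "original": original,
        "source-language": source_language,
        "target-language": target_language,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, _q("body"))
    for number, (source, target) in enumerate(units, start=1):
        unit = ET.SubElement(body, _q("trans-unit"), {"id": str(number)})
        ET.SubElement(unit, _q("source")).text = source
        ET.SubElement(unit, _q("target")).text = target
    return ET.tostring(root, encoding="unicode")
```

For example, `build_xliff([("GRINNING FACE", "grinning face")], "en-GB", "en-GB")` returns a string containing a single trans-unit whose id is 1; the two strings here are placeholders.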

For this purpose, several additional features are now introduced into the format of a readouts.dat file, though in such a way that the same original format may be used when using the readouts.dat file for its primary purpose.

These features are as follows. All of them are defined only for lines whose first character is an ASTERISK, and each feature is identified by the second character of such a line.
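A minimal Python sketch of dispatching on the second character of a comment line, covering the two markers *[ and *] defined in the draft text that follows. The function name `classify` and the kind labels are hypothetical; any other comment line, including lines starting with markers that might be defined later, is treated here as an ordinary comment.

```python
def classify(line: str) -> tuple[str, str]:
    """Classify one readouts.dat line.

    Returns a (kind, text) pair where kind is one of
    "localization", "source" (a *[ line), "target" (a *] line) or "comment".
    """
    if not line.startswith("*"):
        return ("localization", line)
    marker, rest = line[1:2], line[2:]  # second character, then the remainder
    if marker == "[":
        return ("source", rest)
    if marker == "]":
        return ("target", rest)
    # Any other comment, including markers that might be defined later,
    # is treated as an ordinary comment in this sketch.
    return ("comment", line[1:])
```

Because only the first two characters are examined, a program using the file for its primary purpose can still ignore every ASTERISK line without calling anything like this.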

*[

On a line starting *[, the text after the second character, if there is any, can be used in the source string of an XLIFF trans-unit element.

As conversion of a localization line of a readouts.dat file to XLIFF coding takes place, the latest use of a *[ line indicates which string should be used in the source string of the XLIFF trans-unit element related to that localization line.

For example

*[en-GB

Please note that no quotation marks are used in the *[ line.

*]

On a line starting *], the text after the second character, if there is any, can be used in the target string of an XLIFF trans-unit element.

As conversion of a localization line of a readouts.dat file to XLIFF coding takes place, the latest use of a *] line indicates which string should be used in the target string of the XLIFF trans-unit element related to that localization line.

For example

*]en-GB

Please note that in the original use of a readouts.dat file there would be only one use of *] in the file, at the start of the file before any localization lines.

There might not be any use of a *[ in the file if the source is pictograph symbols such as emoji.
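The "latest use" rule for *[ and *] lines can be sketched in Python as follows: the tool walks through the file once, remembers the text of the most recent *[ and *] lines, and attaches those values to each localization line it meets. The names `PendingUnit` and `collect_units` are hypothetical, and the sketch only gathers the data; writing it out as XLIFF trans-unit elements would be a separate step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingUnit:
    """Data gathered for one localization line before XLIFF output (hypothetical)."""
    line: str
    source: Optional[str]  # text of the most recent *[ line, or None if none yet
    target: Optional[str]  # text of the most recent *] line, or None if none yet


def collect_units(text: str) -> list[PendingUnit]:
    source: Optional[str] = None
    target: Optional[str] = None
    units: list[PendingUnit] = []
    for line in text.splitlines():
        if line.startswith("*["):
            source = line[2:]  # the latest *[ line replaces any earlier one
        elif line.startswith("*]"):
            target = line[2:]  # the latest *] line replaces any earlier one
        elif line.startswith("*"):
            continue  # other comments carry nothing for this step
        else:
            units.append(PendingUnit(line, source, target))
    return units
```

For a file containing *]en-GB, then a localization line, then *[en-GB, then a second localization line, the first unit has no source value and a target value of en-GB, while the second has en-GB for both. This matches the note above that a file whose source is emoji might contain no *[ line at all.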


As I learn which information in the comments of a readouts.dat file could usefully be formatted so as to facilitate automated transfer into an XLIFF file, I may define other features, for example lines in a readouts.dat file that start with *{ and *} and maybe *_, and, if needed, some others as well.

William Overington

5 May 2014

Regarding the above-referenced document.

Some readers might like to try the following.

On the following web page,

http://www.unicode.org/L2/L-curdoc.htm

please find the link about document L2/14-093 and note that there is a link

Snapshot; HTML version

and clicking that link leads to working draft 4 dated 2013-06-09, though maybe 2014-06-09 is intended, as it is now version 4.

That draft no longer mentions read-out labels, but it does refer to a TTS name, TTS meaning text-to-speech.

I noticed that in particular; there appear to be other changes in the document as well.

William Overington

20 June 2014

There are lots of new encoding proposals linked from the following page.

http://www.unicode.org/L2/L-curdoc.htm

Some readers might like to view my contribution in the Other Reports section of the following document that is linked from that page.

http://www.unicode.org/L2/L2015/15019-pubrev.html

William Overington

3 February 2015

The following document is interesting.

http://www.unicode.org/L2/L2016/16008r3-custom-emoji.pdf

Can these customized items be implemented using FontCreator?

Please note in particular the Private Use section, which seems to open up vast amounts of coding space for private use, with graceful display of a fallback character if the private encoding is not recognized.

I am wondering whether it is possible to use such a coding on web pages using a webfont.

William Overington

30 January 2016

I found the following web page yesterday.

http://grouplens.org/blog/investigating-the-potential-for-miscommunication-using-emoji/

There is a pdf document linked from that page.

http://grouplens.org/site-content/uploads/Emoji_Interpretation.pdf

That PDF document appears to be the main paper describing the research.

The research is very interesting in relation to fonts, as it includes research on miscommunication caused when the emoji fonts on different devices use different glyph designs for the same character code.

So the person sending a message may see one glyph design for a particular emoji character while the recipient sees a different design, and in some cases the different designs are interpreted as having different intended meanings.

I notice that they mention Oxford Dictionaries in relation to emoji. I had not known about that.

http://blog.oxforddictionaries.com/2015/11/word-of-the-year-2015-emoji/

There is a video on the above page as well.

William Overington

14 April 2016

I have just noticed the following document, Jurassic Emoji Proposal.

http://www.unicode.org/L2/L2016/16072-jurassic-emoji.pdf

It is available from the following web page.

http://www.unicode.org/L2/L-curdoc.htm

Here is an interesting quote from the document.

quote

On a figurative level, the brontosaurus may be used to indicate that someone is sweet, or that someone is particularly clueless.

end quote

That seems very ambiguous! 🙂

William Overington

Friday 15 April 2016

This looks fascinating. It is emoji-related, though not entirely about emoji.

http://www.unicode.org/L2/L2016/16105-unicode-image-hash.pdf

It is linked from the following web page.

http://www.unicode.org/L2/L-curdoc.htm

William Overington

Tuesday 3 May 2016

I have noticed the following document.

http://www.unicode.org/L2/L2016/16128-additional-emoji-selection-factor.pdf

It is linked from the following web page.

http://www.unicode.org/L2/L-curdoc.htm

William Overington

Saturday 7 May 2016

I have submitted some feedback about emoji-related proposals to the Unicode Technical Committee.

It is in the following document.

http://www.unicode.org/L2/L2016/16123-pubrev.html

It is linked from the following page.

http://www.unicode.org/L2/L-curdoc.htm

There is feedback on two topics, namely the following.

Feedback on Jurassic Emoji proposal

Feedback on Coded Hashes of Arbitrary Images proposal

William Overington

7 May 2016