dequis.org

notes

emoji

A few random notes regarding emoji support.

fonts

The "Symbola" font provides black and white emoji as normal font vectors

"Apple color emoji" uses apple's method, is included in recent iOS versions

"Segoe UI emoji" uses microsoft's method, is included in windows 8

"Noto Color Emoji" uses google's method, is included in recent android versions.

see also: my fork of Noto Color Emoji with 12x12 glyphs (ttf download) to work around implementations without bitmap downscaling

ways to embed emoji in fonts

google http://google-opensource.blogspot.com/2013/05/open-standard-color-font-fun-for.html (implemented in freetype)
microsoft http://typography.guru/journal/windows-color-fonts/
apple https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6sbix.html
mozilla https://wiki.mozilla.org/SVGOpenTypeFonts

some technical details on these, by fontlab: http://blog.fontlab.com/font-tech/color-fonts/color-font-format-proposals/ (2013)

colorful summary, by symbolset: http://blog.symbolset.com/multicolor-fonts (2014)

another summary with nice browser support tests, by pixel ambacht: http://pixelambacht.nl/2014/multicolor-fonts/ (2014)

graphics and copyright

"github's" https://github.com/github/gemoji (possibly ignoring apple's copyright)

twitter's https://github.com/twitter/twemoji (CC-BY)

phantom https://github.com/Genshin/PhantomOpenEmoji ("open source", TODO read license further)

google https://github.com/googlei18n/noto-emoji (apache 2.0 license)

pidgin smiley pack https://github.com/stv0g/unicode-emoji with apple's and android's

emocodes

"emoji" that are entered as ascii names between colons, in the format :smile: (as listed in emoji-cheat-sheet.com), aren't really the same thing as the standardized unicode emoji

irccloud calls them emocodes and i like how that sounds.
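to make the distinction concrete, here is a minimal sketch of turning emocodes into real unicode emoji. the `EMOCODES` table and the `replace_emocodes` name are made up for illustration, and only three entries are filled in; a real table would cover the whole emoji-cheat-sheet.com list.

```python
import re

# hypothetical, tiny lookup table (names as used on emoji-cheat-sheet.com)
EMOCODES = {
    "smile": "\U0001F604",
    "pizza": "\U0001F355",
    "rainbow": "\U0001F308",
}

# an emocode is an ascii name between two colons
_EMOCODE_RE = re.compile(r":([a-z0-9_+-]+):")


def replace_emocodes(text):
    """replace known :name: emocodes with unicode emoji, leave the rest alone"""
    def lookup(match):
        # unknown names are returned unchanged, colons included
        return EMOCODES.get(match.group(1), match.group(0))
    return _EMOCODE_RE.sub(lookup, text)


print(replace_emocodes("hi :smile: :notanemoji:"))
```

unknown names pass through untouched, so plain text that happens to contain colons is not mangled.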

BMP-related issues

most emoji are beyond the unicode basic multilingual plane (BMP). obviously incomplete list of issues i'm aware of:

JSON unicode escapes encode them as utf-16 surrogate pairs, which look like two separate characters but are one: \uXXXX\uXXXX (rfc 7159 section 7).

this is rare enough that it might not be implemented correctly in some json libraries
mysql's utf8 charset isn't actually utf-8, but a variant that only supports BMP characters.

utf8mb4 should be used instead.
GNU screen before this commit (included in 4.2.0)
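a small sketch of the first two items, using python's standard json module. the helper names (`to_surrogates`, `beyond_bmp`) are made up for illustration. the second helper is only a rough check for "this string would not fit in mysql's utf8 charset".

```python
import json

PIZZA = "\U0001F355"  # SLICE OF PIZZA, a code point outside the BMP


def to_surrogates(ch):
    """split a code point above U+FFFF into its utf-16 surrogate pair"""
    cp = ord(ch) - 0x10000
    return 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)


def beyond_bmp(text):
    """true if any character is outside the BMP (4 bytes in utf-8)"""
    return any(ord(ch) > 0xFFFF for ch in text)


high, low = to_surrogates(PIZZA)
escaped = json.dumps(PIZZA)  # ensure_ascii=True by default, so it gets escaped
print(escaped, hex(high), hex(low))

# a correct json library turns the two escapes back into one character
print(json.loads(escaped) == PIZZA)

# 4 bytes in utf-8, which is what mysql's utf8 can't store
print(len(PIZZA.encode("utf-8")), beyond_bmp(PIZZA))
```

a quick way to test some other json library is the same round trip: feed it the escaped form and check that exactly one character comes back.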

`wcwidth()`-related issues

(linux/glibc specific)

the wcwidth() function of glibc 2.21 and older doesn't support unicode >5.1, so it doesn't cover emoji, returning -1 for them. Bugs with patches: #14094, #17588

this is fixed in glibc 2.22 (released 2015-08-05), but most servers will probably stay on glibc 2.21 for a long time

what this means in practice:

apps that turn them into �: xterm, urxvt, st
apps that strip them: mosh, weechat (unless you use it in "bare" mode)
apps that don't care: irssi, tmux, screen, ssh
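to see what your own libc says, here is a rough sketch that calls wcwidth() through ctypes. assumptions: linux (where `ctypes.CDLL(None)` exposes the libc symbols) and a 32-bit wchar_t. the `libc_wcwidth` name is made up. the values printed for CJK and emoji depend on the glibc version and the active locale, so no expected output is given.

```python
import ctypes
import locale

# wcwidth() depends on LC_CTYPE, so pick up the locale from the environment
try:
    locale.setlocale(locale.LC_CTYPE, "")
except locale.Error:
    pass  # the environment names a locale that isn't installed

_libc = ctypes.CDLL(None)  # assumption: linux, this handle exposes libc symbols
_libc.wcwidth.argtypes = [ctypes.c_int]  # assumption: wchar_t is 32 bits wide
_libc.wcwidth.restype = ctypes.c_int


def libc_wcwidth(ch):
    """columns the C library assigns to one character, -1 if it doesn't know it"""
    return _libc.wcwidth(ord(ch))


for ch in ("a", "\u4e2d", "\U0001F355"):
    print("U+%04X" % ord(ch), libc_wcwidth(ch))
```

a -1 for the last line would be the old-glibc behavior described above.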

updating locale data for glibc 2.21 and older

use this curl2sudo™ installation script

curl http://dump.dequis.org/ip_Nf.gz | sudo tee /usr/share/i18n/charmaps/UTF-8.gz > /dev/null
curl http://dump.dequis.org/-L7Dk.i18n | sudo tee /usr/share/i18n/locales/i18n > /dev/null
sudo locale-gen

that's all. now restart the relevant processes.

or, if you hate curl2sudo™ (that is, you hate fun), you could just download them manually and run locale-gen

/usr/share/i18n/charmaps/UTF-8.gz
/usr/share/i18n/locales/i18n
or extract them from the glibc 2.22 release tarball

(locale-gen is actually a wrapper for localedef, which updates /usr/lib/locale/locale-archive. glibc mmap()s that file and uses it for wcwidth() and other functions)

`wcwidth()` and EastAsianWidth.txt

the EastAsianWidth file in unicode data specifies whether a character is considered to be half-width (normal) or full-width (taking two columns, like most CJK characters)

emoji are all half-width. IMO, given their east asian origin, and the fact that their graphical depictions have square shape, they should be full-width instead.

this is a non-issue when rendering proportional fonts, but EastAsianWidth is used as the source for the results given by wcwidth(), which is used by terminals, which means emoji will be tiny there.
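python ships its own copy of this data in the unicodedata module, which makes it easy to poke at. a minimal sketch, with a made-up `columns` helper that maps the width class to terminal columns. the class reported for emoji depends on the unicode version python was built with, so it may not match what this note describes.

```python
import unicodedata


def columns(ch):
    """columns a terminal would use if it followed EastAsianWidth strictly"""
    # "W" (wide) and "F" (fullwidth) take two columns, everything else one
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


print("unicode data version:", unicodedata.unidata_version)
for ch in ("a", "\u4e2d", "\U0001F355"):
    print("U+%04X" % ord(ch), unicodedata.east_asian_width(ch), columns(ch))
```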

i think we can just deal with it. see next section

iterm2 width tests

iterm2 is a mac os x terminal that displays emoji in a way that looks like double width, but wcwidth() still returns 1 for emoji on that platform (source, thanks! that charwidth is a thin wrapper around wcwidth)

so i made these txt files (pizza.txt, rainbow.txt) to show the alignment, width and possible overlap of emoji in iterm2. (screenshots provided by Xe, thanks Xe)

screenshots: pizza, rainbow, both

the emoji images themselves are double width, they just overlap with the next character if needed.

03:26 < dx> Xe: thanks! soo... these things work as if they were double width, but don't need to be.

03:26 < dx> and here i was thinking that the os x wcwidth() was doing weird stuff

03:27 < dx> turns out everyone who wants emoji to display correctly in iterm2 just adds spaces afterwards
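the "just add spaces afterwards" trick can be sketched like this. `pad_emoji` is a made-up name, and "code point above U+FFFF" is only a crude stand-in for "is an emoji" (it misses the few emoji inside the BMP and also pads non-emoji characters above it).

```python
def pad_emoji(text):
    """append a space after every character outside the BMP"""
    out = []
    for ch in text:
        out.append(ch)
        if ord(ch) > 0xFFFF:  # crude stand-in for "is an emoji"
            out.append(" ")   # room for the glyph to overlap into
    return "".join(out)


print(pad_emoji("\U0001F355\U0001F308 ok"))
```

the extra space gives the double-width image somewhere to overlap without covering the next real character.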

custom glibc locales

https://sourceware.org/glibc/wiki/Locales#Testing_Locales

03:29 < dx> fwiw, i found that you can change the return values of the glibc wcwidth() by editing /usr/share/i18n/charmaps/UTF-8.gz and re-running locale-gen (or "localedef -f UTF-8 -i en_US en_US.UTF-8" as root)

03:30 < dx> you can also set the LOCPATH environment variable to have glibc use a different path for the locale archive

03:31 < dx> but none of this matters since glibc 2.22 has the correct width values, and i can just imitate iterm2's hack in my terminal instead of trying to convince all the programs that they are double width

03:31 < dx> i actually got close, but tmux has those values hardcoded
