A few random notes regarding emoji support.
fonts
The "Symbola" font provides black and white emoji as normal font vectors
"Apple color emoji" uses apple's method, is included in recent iOS versions
"Segoe UI emoji" uses microsoft's method, is included in windows 8
"Noto Color Emoji" uses google's method, is included in recent android versions.
see also: my fork of Noto Color Emoji with 12x12 glyphs (ttf download) to workaround implementations without bitmap downscaling
ways to embed emoji in fonts
- google http://google-opensource.blogspot.com/2013/05/open-standard-color-font-fun-for.html (implemented in freetype)
- microsoft http://typography.guru/journal/windows-color-fonts/
- apple https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6sbix.html
- mozilla https://wiki.mozilla.org/SVGOpenTypeFonts
some technical details on these, by fontlab: http://blog.fontlab.com/font-tech/color-fonts/color-font-format-proposals/ (2013)
colorful summary, by symbolset: http://blog.symbolset.com/multicolor-fonts (2014)
another summary with nice browser support tests, by pixel ambacht: http://pixelambacht.nl/2014/multicolor-fonts/ (2014)
graphics and copyright
"github's" https://github.com/github/gemoji (possibly ignoring apple's copyright)
twitter's https://github.com/twitter/twemoji (CC-BY)
phantom https://github.com/Genshin/PhantomOpenEmoji ("open source", TODO read license further)
google https://github.com/googlei18n/noto-emoji (apache 2.0 license)
pidgin smiley pack https://github.com/stv0g/unicode-emoji with apple's and android's
emocodes
"emoji" that are entered as ascii names between colons, in the format :smile:
(as listed in emoji-cheat-sheet.com) aren't really the same thing as the standarized unicode emoji
irccloud calls them emocodes and i like how that sounds.
BMP-related issues
most emoji are beyond the unicode basic multilingual plane (BMP). obviously incomplete list of issues i'm aware with this:
JSON unicode escapes encode them as utf-16 surrogate pairs, which look like two separate characters but are one:
\uXXXX\uXXXX
(rfc 7159 section 7).this is rare enough that might not be implemented correctly in some json libraries
mysql's
utf8
charset isn't actually utf-8, but a variant that only supports BMP characters.utf8mb4
should be used instead.GNU screen before this commit (included in 4.2.0)
wcwidth()
-related issues
(linux/glibc specific)
the wcwidth()
function of glibc 2.21 and older doesn't support unicode >5.1, so it doesn't cover emoji, returning -1 for them. Bugs with patches: #14094, #17588
this is fixed in glibc 2.22 (released 2015-08-05), but most servers will probably remain with glibc 2.21 for a long time
what this means in practice:
- apps that turn them into �: xterm, urxvt, st
- apps that strip them: mosh, weechat (unless you use it in "bare" mode)
- apps that don't care: irssi, tmux, screen, ssh
updating locale data for glibc 2.21 and older
use this curl2sudo™ installation script
curl http://dump.dequis.org/ip_Nf.gz | sudo tee /usr/share/i18n/charmaps/UTF-8.gz > /dev/null
curl http://dump.dequis.org/-L7Dk.i18n | sudo tee /usr/share/i18n/locales/i18n > /dev/null
sudo locale-gen
that's all. now restart the relevant processes.
or, if you hate curl2sudo™ (that is, you hate fun), you could just download them manually and run locale-gen
- /usr/share/i18n/charmaps/UTF-8.gz
- /usr/share/i18n/locales/i18n
- or extract them from the glib 2.22 release tarball
(locale-gen
is actually a wrapper for localedef
which updates /usr/lib/locale/locale-archive
which is mmap()
ed by glibc to use with wcwidth()
and other functions)
wcwidth()
and EastAsianWidth.txt
the EastAsianWidth file in unicode data specifies whether a character is considered to be half-width (normal) or full-width (taking two columns, like most CJK characters)
emoji are all half-width. IMO, given their east asian origin, and the fact that their graphical depictions have square shape, they should be full-width instead.~~
this is a non-issue when rendering proportional fonts, but EastAsianWidth is used as the source for the results given by wcwidth()
, which is used by terminals, which means emoji will be tiny there.
i think we can just deal with it. see next section
iterm2 width tests
iterm2 is a mac os X terminal that displays emoji in a way that looks like double width, but wcwidth() still returns 1 for emoji in that platform (source - thanks!. that charwidth is a thin wrapper around wcwidth)
so i made these txt (pizza.txt, rainbow.txt) to show the alignment, width and possible overlap of emoji in iterm2. (screenshots provided by Xe: thanks Xe)
the emoji images themselves are double width, they just overlap with the next character if needed.
03:26 < dx> Xe: thanks! soo... these things work as if they were double width, but don't need to be.
03:26 < dx> and here i was thinking that the os x wcwidth() was doing weird stuff
03:27 < dx> turns out everyone who wants emoji to display correctly in iterm2 just adds spaces afterwards
custom glibc locales
https://sourceware.org/glibc/wiki/Locales#Testing_Locales
03:29 < dx> fwiw, i found that you can change the return values of the glibc wcwidth() by editing /usr/share/i18n/charmaps/UTF-8.gz and re-running locale-gen (or "localedef -f UTF-8 -i en_US en_US.UTF-8" as root)
03:30 < dx> you can also set the LOCPATH environment variable to have glibc use a different path for the locale archive
03:31 < dx> but none of this matters since glibc 2.22 has the correct width values, and i can just imitate iterm2's hack in my terminal instead of trying to convince all the programs that they are double width
03:31 < dx> i actually got close, but tmux has those values hardcoded