A few random notes regarding emoji support.
The "Symbola" font provides black and white emoji as normal font vectors
"Apple color emoji" uses apple's method, is included in recent iOS versions
"Segoe UI emoji" uses microsoft's method, is included in windows 8
"Noto Color Emoji" uses google's method, is included in recent android versions.
ways to embed emoji in fonts
- google http://google-opensource.blogspot.com/2013/05/open-standard-color-font-fun-for.html (implemented in freetype)
- microsoft http://typography.guru/journal/windows-color-fonts/
- apple https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6sbix.html
- mozilla https://wiki.mozilla.org/SVGOpenTypeFonts
some technical details on these, by fontlab: http://blog.fontlab.com/font-tech/color-fonts/color-font-format-proposals/ (2013)
colorful summary, by symbolset: http://blog.symbolset.com/multicolor-fonts (2014)
another summary with nice browser support tests, by pixel ambacht: http://pixelambacht.nl/2014/multicolor-fonts/ (2014)
graphics and copyright
"github's" https://github.com/github/gemoji (possibly ignoring apple's copyright)
twitter's https://github.com/twitter/twemoji (CC-BY)
phantom https://github.com/Genshin/PhantomOpenEmoji ("open source", TODO read license further)
google https://github.com/googlei18n/noto-emoji (apache 2.0 license)
pidgin smiley pack https://github.com/stv0g/unicode-emoji with apple's and android's
"emoji" that are entered as ascii names between colons, in the format
:smile: (as listed in emoji-cheat-sheet.com) aren't really the same thing as the standarized unicode emoji
irccloud calls them emocodes and i like how that sounds.
most emoji are beyond the unicode basic multilingual plane (BMP). obviously incomplete list of issues i'm aware with this:
JSON unicode escapes encode them as utf-16 surrogate pairs, which look like two separate characters but are one:
\uXXXX\uXXXX(rfc 7159 section 7).
this is rare enough that might not be implemented correctly in some json libraries
utf8charset isn't actually utf-8, but a variant that only supports BMP characters.
utf8mb4should be used instead.
GNU screen before this commit (included in 4.2.0)
this is fixed in glibc 2.22 (released 2015-08-05), but most servers will probably remain with glibc 2.21 for a long time
what this means in practice:
- apps that turn them into �: xterm, urxvt, st
- apps that strip them: mosh, weechat (unless you use it in "bare" mode)
- apps that don't care: irssi, tmux, screen, ssh
updating locale data for glibc 2.21 and older
use this curl2sudo™ installation script
curl http://dump.dequis.org/ip_Nf.gz | sudo tee /usr/share/i18n/charmaps/UTF-8.gz > /dev/null curl http://dump.dequis.org/-L7Dk.i18n | sudo tee /usr/share/i18n/locales/i18n > /dev/null sudo locale-gen
that's all. now restart the relevant processes.
or, if you hate curl2sudo™ (that is, you hate fun), you could just download them manually and run
- or extract them from the glib 2.22 release tarball
locale-gen is actually a wrapper for
localedef which updates
/usr/lib/locale/locale-archive which is
mmap()ed by glibc to use with
wcwidth() and other functions)
wcwidth() and EastAsianWidth.txt
the EastAsianWidth file in unicode data specifies whether a character is considered to be half-width (normal) or full-width (taking two columns, like most CJK characters)
emoji are all half-width. IMO, given their east asian origin, and the fact that their graphical depictions have square shape, they should be full-width instead.~~
this is a non-issue when rendering proportional fonts, but EastAsianWidth is used as the source for the results given by
wcwidth(), which is used by terminals, which means emoji will be tiny there.
i think we can just deal with it. see next section
iterm2 width tests
iterm2 is a mac os X terminal that displays emoji in a way that looks like double width, but wcwidth() still returns 1 for emoji in that platform (source - thanks!. that charwidth is a thin wrapper around wcwidth)
the emoji images themselves are double width, they just overlap with the next character if needed.
03:26 < dx> Xe: thanks! soo... these things work as if they were double width, but don't need to be.
03:26 < dx> and here i was thinking that the os x wcwidth() was doing weird stuff
03:27 < dx> turns out everyone who wants emoji to display correctly in iterm2 just adds spaces afterwards
custom glibc locales
03:29 < dx> fwiw, i found that you can change the return values of the glibc wcwidth() by editing /usr/share/i18n/charmaps/UTF-8.gz and re-running locale-gen (or "localedef -f UTF-8 -i en_US en_US.UTF-8" as root)
03:30 < dx> you can also set the LOCPATH environment variable to have glibc use a different path for the locale archive
03:31 < dx> but none of this matters since glibc 2.22 has the correct width values, and i can just imitate iterm2's hack in my terminal instead of trying to convince all the programs that they are double width
03:31 < dx> i actually got close, but tmux has those values hardcoded