The Sinhala language is spoken only by the Sinhalese people on the small
island of Sri Lanka, who make up about 60% of a total population of 20
million. So it's one of the less widely spoken languages in the world,
which makes seeing Sinhala characters on the web a delightful experience
for a Sinhalese. At least it used to be. Nowadays, with the breakthrough
of Unicode, it's become so commonplace that there's nothing so special
about it. There are many Sinhala websites, half of the posts on my
Facebook wall are in Sinhala and, though it's still somewhat dodgy,
Google Translate includes Sinhala.
Before Unicode, ASCII was popular. It is still around, but Unicode is the
norm. Standard ASCII defines 128 characters, which cover the letters,
digits and punctuation of English; the 8-bit encodings built on top of it
have 256 codes. As a standard, ASCII code 065 is used for capital A in
English fonts. In fonts that have glyphs for languages other than
English, it stands for some letter of that language's alphabet. And in
the Wijesekara layout for Sinhala fonts it stands for "Hal kireema", ්.
The problem with this approach is that the same sequence of ASCII codes
can display different glyphs depending on the font used. Text written in
Sinhala using a Sinhala ASCII font, if viewed using a different font,
could display nothing but gibberish. Or worse, if two languages contain
similar letters, it could produce text that is meaningful yet means
something different. So it's obvious that ASCII texts are very difficult
to use universally. Hence Unicode.
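To make the problem concrete, here is a minimal sketch in Python. The two "fonts" are just made-up lookup tables from byte values to glyphs. Only the 65 to hal kireema entry comes from the description above; the entry for 66 is an arbitrary Sinhala letter picked for illustration and does not claim to match any real legacy font.

```python
# Two made-up "legacy fonts": each maps a byte value to the glyph it draws.
# Only 65 -> hal kireema (U+0DCA) is taken from the post; the mapping for
# 66 is an invented placeholder, not a real font's layout.
ENGLISH_FONT = {65: "A", 66: "B"}
SINHALA_LEGACY_FONT = {65: "\u0dca", 66: "\u0d89"}


def render(data: bytes, font: dict) -> str:
    """Draw each byte with whatever glyph the chosen font assigns to it."""
    return "".join(font.get(b, "?") for b in data)


stored = bytes([65, 66])  # the stored text is identical in both cases

print(render(stored, ENGLISH_FONT))         # AB
print(render(stored, SINHALA_LEGACY_FONT))  # Sinhala glyphs, same bytes
```

The stored bytes never change; only the font decides what they mean, which is exactly why copying such text elsewhere produces gibberish.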
There were about 65,000 Unicode code points in the beginning and now
there is room for 17 times as many, which allows every letter in every
language in the world to have its own Unicode code. There's still an
excess of codes, some of which are taken up by glyphs like ♥, ♫, ☯, ☺.
With Unicode, the font used should not matter in deciding which letter
of which language is displayed. It should matter only for the visual
properties of the glyphs. Font makers are guided on which Unicode code
should display which character. The 128 codes from U+0D80 through U+0DFF
are reserved for Sinhala characters.
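The numbers above are easy to check for yourself. The following is a small Python sketch; the helper name is_sinhala is my own and not part of any library.

```python
import sys

SINHALA_START = 0x0D80
SINHALA_END = 0x0DFF


def is_sinhala(ch: str) -> bool:
    """True if the character's code point lies in the Sinhala block."""
    return SINHALA_START <= ord(ch) <= SINHALA_END


block_size = SINHALA_END - SINHALA_START + 1
code_space = sys.maxunicode + 1  # number of code points Python can represent

print(block_size)                # 128
print(code_space == 17 * 65536)  # True

word = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"  # the word "Sinhala" written in Sinhala
print(all(is_sinhala(ch) for ch in word))
print(is_sinhala("A"))
```

Unlike the ASCII font trick, the code point alone tells you the character is Sinhala, whatever font ends up drawing it.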
Obviously a font cannot contain glyphs for every Unicode code. If the
selected font does not contain glyphs for certain Unicode characters,
those characters are displayed in a font that does. Applications,
including web browsers, select the fallback font depending on the way
the system is configured. On my Ubuntu 14.04 machine, Sinhala characters
are displayed by the LKLUG font. That can be changed by changing the
fontconfig configuration.
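If you are curious which font your own system would pick for Sinhala, fontconfig can be asked directly. This is a rough Python sketch that shells out to the fc-match tool, assuming fontconfig is installed and that "si" is the language tag your configuration uses for Sinhala. The function name fallback_font_for is my own.

```python
import shutil
import subprocess
from typing import Optional


def fallback_font_for(lang: str = "si") -> Optional[str]:
    """Ask fontconfig which font family it would choose for a language.

    Returns None when fc-match is not available or the query fails.
    """
    if shutil.which("fc-match") is None:
        return None
    try:
        result = subprocess.run(
            ["fc-match", f":lang={lang}", "family"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


print(fallback_font_for("si"))
```

What it prints depends entirely on the machine: on a setup like mine it would be expected to name LKLUG, and on a system without fontconfig it prints None.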
Now to displaying characters on the web. Earlier, the content of web
sites was displayed entirely in fonts installed on the viewer's system.
Websites could optionally specify a font family, a font, or a chain of
fonts to fall back on. Even so, a webpage could end up being displayed
entirely differently from the way the developer expected, because a
certain font was not installed. This is still the case with many
websites, but the introduction of webfonts could change all that.
Developers can specify a font to use and the place to get that font
using the @font-face notation, so that clients (web browsers) do
everything they can to display text using that font. Usually they only
fall back if they cannot download the webfont from the specified
location.
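For those who have not seen the notation, here is a small Python sketch that prints what such a stylesheet could look like. The helper names, the family name "Hodipotha" and the path /fonts/hodipotha.woff are placeholders of mine, not taken from any real site.

```python
def font_face_rule(family: str, url: str, fmt: str = "woff") -> str:
    """Build an @font-face rule telling the browser where to fetch a font."""
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url("{url}") format("{fmt}");\n'
        "}\n"
    )


def body_rule(family: str, fallbacks=("sans-serif",)) -> str:
    """Build a rule that uses the webfont first, then the fallback chain."""
    chain = ", ".join([f'"{family}"', *fallbacks])
    return f"body {{\n  font-family: {chain};\n}}\n"


# Hypothetical family name and URL, for illustration only.
css = font_face_rule("Hodipotha", "/fonts/hodipotha.woff") + body_rule("Hodipotha")
print(css)
```

The first rule names the font and says where to download it; the second asks for it by that name and lists what to fall back on if the download fails.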
Early Sinhala websites contained ASCII text. And as none of the Sinhala
fonts they could use could be considered web safe, they asked users to
download and install whatever font they were using. Notices appeared
saying "Do you see Sinhala characters? If not, download and install this
font" while gibberish appeared in whatever English ASCII font the web
browser decided to fall back on. Only once the font was installed would
the text look meaningful.
When Unicode came through, many Sinhala websites changed from ASCII to
Unicode. The upside is that most systems include a Unicode font that
covers the Sinhala Unicode characters. This got rid of the step of
downloading and installing fonts. Unfortunately this is also the
downside. Most systems... Some systems do not include a Sinhala Unicode
font. For example, Android devices running KitKat and earlier. And
without rooting it's very difficult to install a new font there. The
Lollipop standard font includes glyphs for Sinhala characters. But some
manufacturers like Sony removed them for reasons known only to them.
Maybe they thought a few extra kilobytes were not worth an entire nation
reading and writing in their native tongue. Sinhala websites like
bbc.lk and lankadeepa.lk contain Unicode text. So they are readable from
most PCs but not from most Android handhelds.
Then webfonts came up, which allow developers to include a Sinhala
Unicode font with the rest of the content of the website. So it's
readable from most browsers, including the ones on Android devices.
gossip.hirufm.lk does this. Many other Sinhala websites do not seem to
do it. gossiplankanews.lk, another gossip site!, uses webfonts but they
are sticking to ASCII. If text from their site is copied and pasted
somewhere, you can see the gibberish it truly is. But at least, since
they use webfonts, the content should be readable on systems without
Sinhala Unicode fonts, such as Android devices.
If the developers of Sinhala websites use webfonts with Unicode content
they can increase their audience.
fontsquirrel is a good place, among others, to generate a webfonts kit.
The hodipotha font from icta.lk is released under a Creative Commons
license, so it can be used to generate the webfonts kit.
In fontsquirrel it is important to choose the expert option and pick no
subsetting. Otherwise it will generate webfonts containing only the
characters in the Western range, omitting the Sinhala characters.
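Why subsetting hurts can be shown with a few lines of Python. This is only a sketch: the range below is my own stand-in for "Western characters" and is not the actual subset any generator uses, and the function name is made up.

```python
# Assumed stand-in for a "Western" subset: Basic Latin through Latin Extended-B.
# This is an illustration, not the real range used by any webfont generator.
WESTERN_RANGE = range(0x0020, 0x0250)


def missing_after_subsetting(text: str, kept=WESTERN_RANGE) -> list:
    """List the characters of text that a font cut down to `kept` would lack."""
    return sorted({ch for ch in text if not ch.isspace() and ord(ch) not in kept})


sample = "Sinhala: \u0dc3\u0dd2\u0d82\u0dc4\u0dbd"
lost = missing_after_subsetting(sample)
print([f"U+{ord(ch):04X}" for ch in lost])
```

Every code point it lists falls in the Sinhala block, so a font subset this way would leave the browser falling back to the system font, or to nothing at all.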
The following text uses webfonts (hodipotha) and hence should be visible
in many browsers, including the ones on Android mobiles, in the (not very
beautiful) glyphs of the hodipotha font.
The following text does not, and hence shows up in whatever font your
system is configured to use.