Wednesday, October 23, 2013

Spell Checker for Iloko

Iloko has established a presence on the Internet in the form of blogs (mannurat.com), web sites (iluko.com), online dictionaries and Facebook (Ilocano.org). But the one corner of this digital world where I have seen very little of it is software.

I’ve posted about Ultradefrag in the past, and it is the only application that has a localized UI (User Interface) available in Iloko. So, there are some inroads and an example for others to follow. In addition to Iloko, the application also has a Waray-Waray option. I’ve written about helping to localize Mozilla Firefox into Iloko, but for the time being that is on hold.

Recently, I was using Facebook in Google Chrome, typing my comment in Iloko. If you take a look at the screen capture, every word in my comment is underlined in red! Why? Chrome does not recognize the language, and Iloko cannot be selected in the Settings. The only word recognized is “Iloko”, which I added to the custom internal dictionary. Lo and behold! Filipino (A.K.A. “Tagalog”) is available, but not for spell checking. I checked to see if there was an Iloko extension, but none is available. I investigated further and found out that the spell checker used in Chrome is Hunspell. Sadly, a dictionary for Iloko is not available. So, the “red squigglies” will continue under words as I type. This irritation (and I imagine that it irks other Iloko speakers) prompted me to investigate how to create a spelling dictionary for Iloko and Hunspell.

Why Hunspell?

Hunspell is free and open-source software. It is not a stand-alone program but a set of libraries that can be incorporated into applications. Among them are:

  • OpenOffice.org – Free and open-source “office” suite. Similar to Microsoft Office. A Tagalog spell-checker extension can be found on their site.
  • LibreOffice – An offshoot of OpenOffice.org, but more “cutting edge”. The two are able to share dictionaries.
  • Google Chrome – A popular web browser.
  • Mozilla Firefox – Another popular web browser.
  • Mac OS X – The Mac Operating System.
  • SDL Trados – Software to help localize and translate software.

Creating a dictionary is relatively “easy”. The “dictionary” is actually composed of two files: an affix file (*.aff) that contains the affixes, and a dictionary file (*.dic) that contains a list of roots and stems. Each entry in the dictionary file references affix classes or “rules” in the *.aff file. In other words, Hunspell will “figure out” all the possible “words”. Words that it cannot derive are flagged as “incorrect”, so it is crucial to get the rules right. Each file is named according to the target language code, e.g. “ilo.aff” and “ilo.dic” for Iloko.
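To make the relationship between the two files concrete, here is a minimal sketch in Python. It is not Hunspell itself, only a toy that mimics how a stem flagged in the *.dic file picks up a prefix rule from the *.aff file. The file contents are my own illustrative assumptions: a single prefix class “A” that adds “ag-”, and two sample stems. The Python function names are hypothetical, and the sketch ignores rule conditions other than “.” (which means “applies to any stem”).

```python
# Toy illustration only: a tiny subset of the Hunspell file layout,
# expanded by hand-written Python rather than by Hunspell itself.

# Assumed contents of a minimal "ilo.aff": one prefix class named A.
AFF_TEXT = """\
SET UTF-8
PFX A Y 1
PFX A 0 ag .
"""

# Assumed contents of a minimal "ilo.dic": an entry count, then stem/flags.
DIC_TEXT = """\
2
surat/A
basa/A
"""


def parse_prefixes(aff_text):
    """Return {flag: [(strip, add), ...]} for the PFX rule lines."""
    rules = {}
    for line in aff_text.splitlines():
        parts = line.split()
        # Rule lines have five fields: PFX flag strip add condition.
        # The four-field class header (PFX flag cross-product count) is skipped.
        if len(parts) == 5 and parts[0] == "PFX":
            _, flag, strip, add, _condition = parts
            strip = "" if strip == "0" else strip  # "0" means strip nothing
            rules.setdefault(flag, []).append((strip, add))
    return rules


def expand(dic_text, rules):
    """Return every form the toy rules can derive from the stems."""
    words = set()
    for entry in dic_text.splitlines()[1:]:  # first line is the entry count
        stem, _, flags = entry.partition("/")
        words.add(stem)
        for flag in flags:
            for strip, add in rules.get(flag, []):
                if stem.startswith(strip):
                    words.add(add + stem[len(strip):])
    return words


ACCEPTED = expand(DIC_TEXT, parse_prefixes(AFF_TEXT))
print(sorted(ACCEPTED))  # ['agbasa', 'agsurat', 'basa', 'surat']
```

Two dictionary entries and one rule already yield four accepted words, which is why getting the rules right matters more than listing every form by hand.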

With the ease of creating the necessary files and the widespread use of Hunspell, it’s quite possible that, once an Iloko dictionary exists, it can be used with many popular applications.

One thing to note is that Hunspell just checks spelling. It does not check syntax. So, I can type a string of words that have no association with one another and only their spellings will be checked.
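A rough sketch of what “spelling only” means in practice: each word is looked up on its own, with no regard for its neighbors. The word list and the function name below are hypothetical stand-ins for a real dictionary, not anything Hunspell provides.

```python
# Hypothetical word list standing in for a real spelling dictionary.
KNOWN_WORDS = {"agsurat", "agbasa", "surat", "basa"}


def misspelled(text, known=KNOWN_WORDS):
    """Return the words in text that are not in the known-word set."""
    return [word for word in text.lower().split() if word not in known]


# A meaningless string of correctly spelled words passes untouched.
print(misspelled("basa surat agbasa surat"))  # []

# A single typo is caught, whatever the sentence around it says.
print(misspelled("agsurat basaa"))  # ['basaa']
```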

Next, I’ll talk about each of the files, the *.aff file and the *.dic file, and some of the issues I’ve encountered while attempting to create a spelling dictionary.
