Tuesday, June 16, 2015

How to Simulate Reduplication in Hunspell

Reduplication is a process where part or all of a root or stem is repeated. It is used for both inflection and derivation in Iloko.

In my approach, reduplicated whole roots or stems are listed as different entries in the *.dic file. First of all, creating a rule to predict the reduplicated form of a whole root or stem is not practical. Second, words derived from this type of reduplication can change lexical categories and meaning, i.e., derivation. For example, saka-saka (adjective) “barefoot” from saka (noun) “foot” is listed as an entry in the *.dic file and it is tagged with the affix classes that can be applied to it.

saka/xy
saka-saka/ab

Hunspell does not have a mechanism for reduplication, whole or partial. It is capable of dealing only with what it considers prefixes and suffixes. There had to be a way to reproduce reduplication using its framework of rules! Unfortunately, this limitation has deterred at least one person’s attempt at creating a spell-check dictionary for Tagalog, as shown in the following thread: helping to implement a grammar checker...[sic].

Nevertheless, reduplication is possible if you “think outside the box”. To simulate the process, we must use the same approach used for infixation: create “compound prefixes”!

Partial word reduplication centers around the first syllable of the root (at least in Iloko and Tagalog), and it is that syllable that we have to write the rules for. The first syllable can be any one of the following types: V, VC, CV and CVC (V = vowel; C = consonant).

5 vowels
14 final consonants
25 initial consonants and consonant clusters
? inflectional forms (depends on affix)

All in all, if you do the math (25 initials × 5 vowels × 14 finals), there are 1,750 “possible” syllables, or 1,750 rules that can be created to simulate reduplication! Multiply that figure by the various inflectional verb forms and it can double to 3,500. And, that would be one class! It’s because of this number that I’ve devised automation to assist in creating affix classes.
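To give an idea of what such automation might look like, here is a minimal Python sketch that writes one Hunspell PFX rule per CVC syllable. It is not the script I actually use. The function name, the flag "R", the tiny sound inventories and the optional "ag" prefix are all placeholders chosen for illustration. Each generated rule strips nothing, adds a copy of the syllable (optionally behind a real prefix, i.e., a “compound prefix”), and applies only to words that begin with that same syllable.

```python
from itertools import product


def cvc_reduplication_rules(flag, onsets, vowels, codas, prefix=""):
    """Build Hunspell PFX lines: a header plus one rule per CVC syllable.

    All arguments are illustrative; plug in the real inventories
    (25 initials, 5 vowels, 14 finals) to get the full 1,750 rules.
    """
    syllables = [o + v + c for o, v, c in product(onsets, vowels, codas)]
    # Header: affix type, flag, cross-product (Y), number of rules.
    lines = [f"PFX {flag} Y {len(syllables)}"]
    for syl in syllables:
        # Strip nothing ("0"), add prefix + copied syllable,
        # on the condition that the word starts with that syllable.
        lines.append(f"PFX {flag} 0 {prefix}{syl} {syl}")
    return lines


if __name__ == "__main__":
    # Toy inventories, NOT the actual Iloko sound inventory.
    for line in cvc_reduplication_rules("R", ["s", "k"], ["a", "i"], ["k", "t"]):
        print(line)
```

With the toy inventories above, the rule for "sak" comes out as "PFX R 0 sak sak", which would turn saka into saksaka. Passing prefix="ag" gives "PFX R 0 agsak sak" instead. The V, VC and CV syllable types would need their own rule sets along the same lines.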

Another approach is to create another entry in the dictionary with the reduplicated part of the word.

Example:
1) saka
2) saksaka
3) sasaka
4) saka-saka
5) saksaka-saka
6) sasaka-saka


Number one would be the basic form, so any affixes that attach to the plain root are flagged on it. Number two can be used in certain verb forms or it can be used as the distributive plural. Number three can be used for certain verb forms that require only CV reduplication, for example. And, number four is a lexicalized entry that means “to go barefoot”. Numbers five and six can then be used for forms based on “saka-saka”.

What is nice about this approach is that the affix file no longer has to “guess” reduplicated syllables or mutations that may occur because of phonological processes. The con is that the necessary stems need to be created and the appropriate affix flags have to be associated with each of the other forms, which can tax someone who might be adding entries to the dictionary file.
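Part of that burden could be scripted. The following Python sketch produces the six stems from the saka example above. It is only a rough illustration: the function name is made up, and it assumes the root contains a vowel and that the letter right after the first vowel is a consonant, so that the CV and CVC copies can be cut straight from the spelling. Roots that do not fit that pattern, and any phonological mutations, would still need to be handled by hand, as would the affix flags for each stem.

```python
VOWELS = "aeiou"


def reduplicated_stems(root):
    """Return the root plus its CVC-, CV- and fully reduplicated stems.

    Assumes the root has a vowel and that a consonant follows the
    first vowel (e.g. "saka"); other root shapes are not handled.
    """
    # Index of the first vowel in the root.
    i = next(k for k, ch in enumerate(root) if ch in VOWELS)
    cv = root[: i + 1]    # e.g. "sa"
    cvc = root[: i + 2]   # e.g. "sak"
    full = f"{root}-{root}"
    return [root, cvc + root, cv + root, full, cvc + full, cv + full]


if __name__ == "__main__":
    for number, stem in enumerate(reduplicated_stems("saka"), start=1):
        print(f"{number}) {stem}")
```

The stems come back in the same order as the numbered example, so each one can be written to the *.dic file and tagged with its own affix classes.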
