ICEMorph

A Morphological Analysis Tool for Old Icelandic

Disambiguation

The rich morphology of Old Icelandic / Old Norse leads to a large number of identical forms, derived either from different lemmata or from the same lemma within its paradigm. For example, the form hafi could be either a dative singular neuter noun with the meaning ‘lifting’, or an instance of the verb hafa ‘to have’. In the latter case, hafi could be either optative present 3rd person singular or optative present 3rd person plural.
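
To make the ambiguity concrete, the three readings of hafi described above could be held as a small list of records. This is only an illustrative sketch: the Analysis class, its field names, and the lemma string "haf" for the noun are assumptions made for the example and do not reflect ICEMorph's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """One candidate reading of a surface form (hypothetical structure)."""
    form: str
    lemma: str
    pos: str
    features: tuple


# The three readings of "hafi" mentioned in the text.
# The noun lemma "haf" is an assumption; the text gives only the gloss 'lifting'.
HAFI_ANALYSES = [
    Analysis("hafi", "haf", "noun", ("dative", "singular", "neuter")),
    Analysis("hafi", "hafa", "verb", ("optative", "present", "3rd", "singular")),
    Analysis("hafi", "hafa", "verb", ("optative", "present", "3rd", "plural")),
]


def lemmas_for(analyses):
    """Return the distinct lemmata among the candidate analyses, sorted."""
    return sorted({a.lemma for a in analyses})


def is_ambiguous(analyses):
    """A form is grammatically ambiguous if it has more than one analysis."""
    return len(analyses) > 1
```

Here the form is ambiguous both across lemmata (noun versus verb) and within the paradigm of a single lemma (singular versus plural of hafa).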

The task of determining the correct lemma and grammatical information of a form is called grammatical disambiguation. Sense disambiguation, on the other hand, determines the correct meaning of a given form. For example, the Zoega dictionary entry for hafa lists 15 different meanings, such as ‘to have’, ‘to hold’, and ‘to dwell’.

Two Disambiguation Strategies

There are a variety of disambiguation strategies (see Manning and Schütze (2000) for an overview). For unsupervised disambiguation, only a text corpus is available. Supervised disambiguation, on the other hand, usually draws on a text corpus together with a dictionary or other linguistic information about the target language, such as grammatical tags. As a result, supervised disambiguation usually yields better results.

For ICEMorph, we have at our disposal a variety of lexica as well as the grammatical forms for each lemma from the morphological analyzer. Our efforts will initially focus on grammatical disambiguation in a semi-supervised manner. For each ambiguous form, the disambiguator will return the candidate disambiguations, each with a statistically weighted score. In this system, the end-user has the option to select the form they feel is most likely. Borrowing from Perseus, we will also institute a "voting" system, allowing expert users the opportunity to vote on the correct solution and possibly add their own annotations for that disambiguation. Since ambiguous forms are often a matter of interpretation, we do not intend to proffer an automated solution. Rather, we believe that this assisted form of disambiguation will stimulate academic debate concerning certain discovered instances of ambiguity.
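
As a rough illustration of how weighted scores and expert votes might be presented side by side, consider the following minimal sketch. It assumes, purely for the example, that each candidate reading comes with a raw count of supporting evidence and that scores are simple relative frequencies. The function names, the reading labels, and the numbers are all hypothetical; the text does not specify how ICEMorph weights its scores.

```python
from collections import Counter


def weighted_scores(candidate_counts):
    """Normalize raw evidence counts into scores that sum to 1.

    Falls back to a uniform distribution when there is no evidence at all.
    """
    if not candidate_counts:
        return {}
    total = sum(candidate_counts.values())
    if total == 0:
        uniform = 1.0 / len(candidate_counts)
        return {reading: uniform for reading in candidate_counts}
    return {reading: count / total for reading, count in candidate_counts.items()}


def rank(scores):
    """Return (reading, score) pairs, highest score first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def tally_votes(votes):
    """Count expert votes per reading, most popular first."""
    return Counter(votes).most_common()


# Invented counts for the three readings of "hafi" (illustration only).
counts = {
    "hafa, opt. pres. 3rd sg.": 6,
    "hafa, opt. pres. 3rd pl.": 3,
    "haf, dat. sg. n.": 1,
}
ranked = rank(weighted_scores(counts))

# Invented expert votes (illustration only).
votes = ["hafa, opt. pres. 3rd sg.", "hafa, opt. pres. 3rd pl.", "hafa, opt. pres. 3rd sg."]
vote_tally = tally_votes(votes)
```

Note that the sketch only ranks and tallies. It never picks a winner on the user's behalf, in keeping with the assisted rather than automated approach described above.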