ICEMorph

A Morphological Analysis Tool for Old Icelandic

Normalization

Not all Old Icelandic / Old Norse texts are written with the same orthographic conventions. Indeed, orthographic and spelling conventions changed along with the languages themselves. As a result, one finds, over the course of several centuries, a wide range of spellings for the same underlying form. This variation is not simply an inconvenience; it is important evidence for historical developments in the languages. At the same time, the diverse spellings pose a significant challenge for morphological analysis across texts (particularly manuscript transcriptions) that come from different periods. Rather than elide this rich information, the goal of our project is to retain the original orthography in the display environment while including "normalized" forms in the tagset for each token, so that sophisticated searches and visualizations can be employed across the centuries.
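To make the idea of keeping both forms concrete, here is a minimal sketch in Python. It is not the ICEMorph data model: the Token class, its field names, and the find_by_normalized helper are hypothetical, and the variant spellings are invented for illustration rather than taken from any manuscript. The sketch only shows how a token might carry its original spelling for display alongside a normalized form for searching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token record: one word as transcribed, plus its tag."""
    original: str    # spelling as it appears in the transcription (kept for display)
    normalized: str  # normalized form stored with the token (used for searching)


def find_by_normalized(tokens, normalized_form):
    """Return every token whose normalized form matches, whatever its original spelling."""
    return [t for t in tokens if t.normalized == normalized_form]


# Illustrative tokens only; the variant spellings are made up for this example.
tokens = [
    Token(original="konungr", normalized="konungr"),
    Token(original="konvngr", normalized="konungr"),
    Token(original="kongr", normalized="konungr"),
    Token(original="maðr", normalized="maðr"),
]

matches = find_by_normalized(tokens, "konungr")
original_spellings = [t.original for t in matches]
print(original_spellings)
```

Under these assumptions, a search on one normalized form returns all three variant spellings, while each token still displays exactly as it was transcribed.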

A short white paper on "Issues in Orthographic Standardization in Old Icelandic" lays out some of the challenges associated with this problem. The goal of our system is to employ an advanced normalization module that allows us to tag word tokens automatically with the appropriate normalized form.