ICEMorph

A Morphological Analysis Tool for Old Icelandic

ICEMorph: Architecture

ICEMorph is a second-generation morphological analysis and look-up tool for the study of Old Icelandic / Old Norse, the most morphonologically complex of the Germanic languages. The analysis tool uses a second-generation formal language, FM/Haskell, to tackle this complexity.
The look-up tool is based on two of the most important dictionaries for Old Icelandic / Old Norse study: Cleasby-Vigfusson's An Icelandic-English Dictionary (1874) and Johan Fritzner's Ordbog over det gamle norske Sprog (1883-1896).
In its pilot phase, the look-up tool is based on Zoega's subset of Cleasby-Vigfusson, and this dictionary also supplies the lexical set for the analysis tool.


Redesigning the Analyzer: The Promise of FM/Haskell

We are developing the second-generation Old Icelandic morphological analyzer in the functional programming language Haskell, following the lead of Forsberg and Ranta’s development of Functional Morphology (FM) (Forsberg and Ranta 2004). FM/Haskell fulfills all of the conditions outlined above. While FM/Haskell has been used to develop morphological analyzers for relatively uninflected languages, such as Swedish, our project represents the first attempt to apply FM/Haskell in a comprehensive way to a very morphonologically complex language. Although FM/Haskell was originally designed to be a compiler for other morphological systems, our project intends to extend it to include word discovery when analysis fails (e.g. idiosyncratic forms from manuscripts), making FM/Haskell an excellent stand-alone platform for inflectional morphology and analysis. Finally, by developing a standard graphical user interface for adding prototypes to language-specific libraries and for editing dictionaries, we will greatly increase the extensibility of this system to other languages and considerably lower the barriers to the development of morphological analyzers.

Previously, our research group developed a first-generation morphological analyzer and English-language look-up tool for Old Icelandic that integrated well with several digital library systems under the auspices of the Cultural Heritage Language Technologies Project (CHLT; NSF #IIS-0122491). Together with the Perseus Project, we incorporated the early morphological analyzer/look-up tool with Standard Edition texts of the legendary sagas (Fornaldar sögur) on a pilot basis. In collaboration with a team at the Arnamagnaean Institute, University of Copenhagen (AMI-Copenhagen), we piloted linking the morphological analyzer to diplomatic and facsimile transcriptions of manuscripts; these transcriptions were in turn linked to images of the manuscript pages. Finally, we piloted our morphological analyzer and the legendary sagas with various visualization tools, including Sammon cluster views, Dendro maps and radial interactive visualizations available in Greenstone.

Our current project proposes to extend this work significantly. The most important development is the replacement of the earlier morphological analyzer with a newly designed and scripted analyzer. The initial analyzer had reached certain performance barriers as it became increasingly complex. Although we had initially explored two-level analysis as well as finite state technologies such as those developed at Xerox PARC, AT&T and by van Noord as solutions to the problem, we settled on FM/Haskell given its transparency in programming. For researchers interested in finite state technologies, an added bonus of FM/Haskell is its ability to generate regular expressions in XFST and LEXC format. By expanding FM/Haskell to include word discovery and compound analysis, our morphological analyzer will not only be a powerful tool for the study of Old Icelandic but will also pave a clear path for the development of easily implemented morphological analyzers for other highly inflected languages.

The Old Icelandic Morphological Analyzer in Haskell

The shift in our architecture from a stand-alone machine written in Perl to one written in Haskell, a functional programming language, is a considerable one. Functional languages treat programming as the evaluation and application of mathematical functions. In contrast, imperative languages focus on the execution of sequential commands. While it is generally true that imperative languages (like Perl or C) can adopt a “functional” style and imitate the results of functional code, there are significant differences. Due to its purist approach to computation, functional code is easier to maintain and debug. In addition, code written in a functional language tends to be considerably shorter and easier to read.

Functional Morphology is a systematic way of developing natural language morphologies in a functional language. It is written in the purely functional programming language Haskell 98 with the intention of providing non-programmers with an intuitive way to implement natural language morphologies. Rule sets currently exist for Swedish, Latin, Italian, Spanish, and Russian.

Functional Morphology is based on the idea of declension prototypes. Within a given word class, most words are declined the same or similarly, while exceptions occur less often. Functional Morphology makes use of this fact and defines prototype declension tables for the regular word types. For example, Latin “rosa” follows this declension paradigm [1]:

rosaParadigm :: String -> Noun
rosaParadigm rosa (NounForm n c) =
  let rosae = rosa ++ "e"          -- stem plus -e
      rosis = init rosa ++ "is"    -- final -a replaced by -is
  in case n of
       Singular -> case c of
         Accusative -> rosa ++ "m"
         Genitive   -> rosae
         Dative     -> rosae
         _          -> rosa
       Plural -> case c of
         Nominative -> rosae
         Vocative   -> rosae
         Accusative -> rosa ++ "s"
         Genitive   -> rosa ++ "rum"
         _          -> rosis
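
The paradigm above depends on types supplied by the FM library. To try it in isolation, the following minimal sketch supplies stand-in definitions for Number, Case, NounForm and Noun. These are assumptions made for illustration: FM's own types are richer, and here a Noun simply returns a plain String. The helper table is likewise hypothetical and only enumerates every form of a word. rosaParadigm is repeated so that the block compiles on its own.

```haskell
-- Stand-in types (assumed for this sketch; not the FM library's definitions).
data Number = Singular | Plural
  deriving (Show, Eq, Enum, Bounded)

data Case = Nominative | Vocative | Accusative | Genitive | Dative | Ablative
  deriving (Show, Eq, Enum, Bounded)

data NounForm = NounForm Number Case
  deriving (Show, Eq)

-- A noun is modelled as a function from a grammatical form to a surface string.
type Noun = NounForm -> String

-- The first-declension prototype from the text.
rosaParadigm :: String -> Noun
rosaParadigm rosa (NounForm n c) =
  let rosae = rosa ++ "e"
      rosis = init rosa ++ "is"
  in case n of
       Singular -> case c of
         Accusative -> rosa ++ "m"
         Genitive   -> rosae
         Dative     -> rosae
         _          -> rosa
       Plural -> case c of
         Nominative -> rosae
         Vocative   -> rosae
         Accusative -> rosa ++ "s"
         Genitive   -> rosa ++ "rum"
         _          -> rosis

-- Hypothetical helper: list every form of a word, singular first,
-- in the order the cases are declared above.
table :: String -> [(NounForm, String)]
table w =
  [ (form, rosaParadigm w form)
  | n <- [minBound .. maxBound]
  , c <- [minBound .. maxBound]
  , let form = NounForm n c
  ]
```

With these definitions, map snd (table "rosa") lists rosa, rosa, rosam, rosae, rosae, rosa for the singular and rosae, rosae, rosas, rosarum, rosis, rosis for the plural. Any other regular noun of the same class, such as "porta", can be passed to the same prototype, which is the point of the approach.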

In recent months we have begun revising our Old Icelandic morphological analyzer to utilize the strengths of the Functional Morphology framework. Here is a sample declension table for masculine a-stem nouns:

decl1heimr :: DictForm -> Noun
decl1heimr heimr (NounForm n c) =
  mkStr $
    case n of
      Singular -> case c of
        Nominative -> prefix ++ lexeme
        Accusative -> heim
        Genitive   -> heims
        Dative     -> heimi
      Plural -> case c of
        Nominative -> heimar
        Accusative -> heima
        Genitive   -> heima
        Dative     -> heimum
  where
    -- separate a compound's first element from the head that is declined
    (prefix, lexeme) = splitCompound heimr
    -- dictionary form without its final character (the nominative -r)
    root   = tk 1 lexeme
    heim   = prefix ++ root
    heims  = prefix ++ root ++ "s"
    -- endings that begin with a vowel may trigger syncope in the root
    heimi  = prefix ++ (syncope root ++ "i")
    heimar = prefix ++ (syncope root ++ "ar")
    heima  = prefix ++ (syncope root ++ "a")
    -- the -um ending additionally triggers u-mutation
    heimum = prefix ++ (syncope (uMutation root) ++ "um")

The structure of this table is identical to the Latin table given above. However, as Old Icelandic has a very rich morphology, the declension prototype contains a number of morphophonemic rules, such as syncope and u-mutation. To illustrate the functional nature of the programming language Haskell, here is the function that performs u-mutation:

uMutation :: String -> String
uMutation man = m ++ mkUm a ++ n
  where
    (m, a, n) = findStemVowel man
    mkUm v = case v of
      "a" -> "ö"
      _   -> v

Given a string (which in Haskell is a list of characters), the function first finds the stem vowel. If the stem vowel is “a”, it is changed to “ö”. In all other cases (indicated by the “_”), no change is made.
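
The u-mutation function relies on findStemVowel, which is not shown here. The following is a minimal sketch of how such a helper might look, assuming that the stem vowel is simply the last vowel in the string. That assumption is a simplification that holds for monosyllabic roots such as "land" or "arm"; the analyzer's actual definition may well differ, and the vowel inventory used below is also only illustrative.

```haskell
-- Hypothetical helper (not the analyzer's own code): split a root into the
-- part before its last vowel, that vowel, and the part after it.
-- A root without any vowel is returned unchanged in the first component.
findStemVowel :: String -> (String, String, String)
findStemVowel w =
  case break isVowel (reverse w) of
    (revAfter, v : revBefore) -> (reverse revBefore, [v], reverse revAfter)
    (_, [])                   -> (w, "", "")
  where
    -- illustrative vowel inventory, assumed for this sketch
    isVowel ch = ch `elem` "aeiouyáéíóúýæöø"

-- u-mutation as given in the text: a stem vowel "a" becomes "ö",
-- every other stem vowel is left alone.
uMutation :: String -> String
uMutation man = m ++ mkUm a ++ n
  where
    (m, a, n) = findStemVowel man
    mkUm v = case v of
      "a" -> "ö"
      _   -> v
```

Under these assumptions, uMutation "land" evaluates to the string lönd, and uMutation "arm" ++ "um" to örmum, whereas uMutation "heim" returns heim unchanged because the last vowel is "i". Keeping the vowel search separate from the vowel change means that further mutation rules can reuse findStemVowel without modification.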

Perl vs Haskell

The migration from Perl to Haskell began in September 2006. So far, our experience with Haskell and Functional Morphology in particular has been very positive. Here are some of our current observations:

• The declension tables given above illustrate the more transparent nature of the Haskell language.
• Debugging and code maintenance are straightforward thanks to its clear definition of data types.

• The code base is much smaller: currently, the morphophonemic rules (i.e. not including the declension tables) span 117 lines, while the Perl version required more than 500 lines for the same functions.