ICEMorph

A Morphological Analysis Tool for Old Icelandic

ICEMorph: History

ICEMorph is a second-generation morphological analysis and look-up tool for the study of Old Icelandic / Old Norse, the most morphonologically complex of the Germanic languages. The analysis tool uses a second-generation formal language, FM/Haskell, to tackle the problem of this complexity.
The look-up tool is based on two of the most important dictionaries for Old Icelandic / Old Norse study: Cleasby-Vigfusson's An Icelandic-English Dictionary (1874) and Johan Fritzner's Ordbog over det gamle norske Sprog (1883-1896).
In its pilot phase, the look-up tool is based on Zoega's subset of Cleasby-Vigfusson, and this dictionary supplies the lexical set for the analysis tool.


From CREST to ICEMorph

The current project derives from an earlier project aimed at making Old Icelandic texts available in electronic format. That project, started in the 1970s and dubbed CREST, focused on a very limited number of texts, all drawn from the corpus of "legendary" sagas, the Fornaldar sögur. These short texts were keyed onto punch cards and, through the auspices of IBM Iceland, read into a mainframe computer. The main goal of the project was to produce a keyword-in-context (KWIC) concordance of these texts. Any such concordance would have had to be lemmatized by hand. The concordance was never produced and, while the cards were eventually read to tape and then mounted on a disk at UC Berkeley's computing services center, little more was done with these texts.

In the early 1980s, the project was revived, and a decision was made to deploy early scanning technology (Kurzweil) to scan the remaining texts from the legendary saga corpus. Besides the obvious problems of representing Old Icelandic in a limited ASCII set (see Berkeley conventions), problems immediately arose surrounding the accuracy of early OCR in languages other than English. Some of these problems were addressed by writing elaborate filters and by employing graduate students to correct the text, a tedious task. In several cases, texts were simply entered manually using the Berkeley conventions. In 1989, a full, yet still unlemmatized, concordance of the Legendary Sagas was produced, but it was so unwieldy that it did not fulfill its wider research potential.

By the mid-1990s, advances in OCR for languages other than English led to the emergence of more and more digital texts. Through concerted effort, Dr. Zoe Borovsky was able to produce a complete set of proofread texts of the legendary sagas, one of the first digital corpora of early Scandinavian texts. Searches on this corpus and attempts at data mining were still seriously hampered by the challenges posed by Old Icelandic morphology.

During a conversation in 2000 with Gregory Crane, the editor-in-chief of Perseus, Timothy Tangherlini agreed to put together a team to develop an early version of an Old Icelandic morphological analyzer as a proof of concept. The analyzer and the extant Old Icelandic digital texts would be brought into the Perseus environment to further illustrate the benefits of attaching accurate morphological detail to these texts. Students would be served by the look-up tool incorporated into the morphological analysis tool. This work was made possible by funding from the National Science Foundation and the European Union, under the Cultural Heritage Languages Technology program. Additional support was provided through the Center for Medieval and Renaissance Studies (CMRS).

In 2005, the new director of CMRS, Prof. Brian Copenhaver, lent his considerable support to the project and brought it under the umbrella of UCLA's CMRS. Through generous seed funding from the UCLA Office of Research and the CMRS, the current project builds on the early success of the NSF/EU-funded research. Moving away from cascading rewrite rules (and all of the problems of debugging those rules), the new analyzer promises to be more flexible, more efficient, and far more accurate. Similarly, because of excellent collaborations with a variety of international partners, our analyzer will be able to take advantage of important dictionaries that have already been digitized, such as J. Fritzner's, and of an ever-expanding corpus of digital texts and manuscript images.