Visualizations and Analytics

A key advance in this project is the application of advanced word study, text categorization, knowledge discovery and visualization tools to the study of disparate digital assets. In the first instance, these analytic techniques will focus on text. In later iterations, as more images and sound files become available, other techniques will be added.

Networks

The study of folklore is arguably the study of networks. Networks exist exterior to the collected texts in the networks of storytellers and collectors. Networks also exist within stories themselves but, because of the short length of many of the folklore texts, I will focus initially on developing tools for calculating and visualizing the "external" networks of storytellers and collectors.

Early models of folklore focused on the notion of regular, non-random networks (Krohn). When transmission took place, it was regular, localized and predictable. Knowledge diffused like a wave from a single point along well-described pathways. More recent folklore theory has discarded this notion of regular networks in favor of a recognition of more irregular and increasingly random networks. While many people organize into networks based on social class, geographic location and occupation, transmission of stories can--and often does--take place on the basis of less predictable contact (someone meets someone at an inn and they swap tales; or the local musician brings a replacement musician from another village to play at a Christmas party and he sings songs from his area as well). The resulting networks are less predictable, and reflect increasing randomness.

A network analysis tool will be developed as part of this project to discover aspects of networks of storytellers and collectors. Certain pre-existing information, from Minder og Oplevelser and other sources such as the census and church books, will be used as part of the network information set. This information will then be used in conjunction with measures of repertoire similarity derived from the ETK corpus. Different types of implicit or explicit connections will receive different weights: for example, a familial relationship will receive a high weight, while an occupational affiliation will receive a lower weight. High measures of repertoire similarity will likewise receive a high weight. Since the date and place of collection can be determined for nearly all the documents, it may also be possible to explore how implicit networks shift over time and space. An interesting question in this regard is whether the advent of ubiquitous rail service changed the geographic range of these implicit networks.
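The weighting scheme described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the informant names, the story identifiers, the numeric weights, and the use of Jaccard overlap as the measure of repertoire similarity are placeholders, not the project's actual data or chosen method.

```python
from itertools import combinations

# Hypothetical weights: the proposal only says that familial ties weigh
# more than occupational ones; these numbers are illustrative.
RELATION_WEIGHTS = {"family": 1.0, "occupation": 0.4}
SIMILARITY_WEIGHT = 1.0


def jaccard(a, b):
    """Share of story types two repertoires have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(repertoires, relations):
    """Return {(name_a, name_b): weight} with names sorted within each pair.

    repertoires: dict mapping informant name -> set of story identifiers
    relations:   list of (name_a, name_b, kind) explicit connections
    """
    edges = {}
    # Implicit connections: weight grows with repertoire overlap.
    for a, b in combinations(sorted(repertoires), 2):
        sim = jaccard(repertoires[a], repertoires[b])
        if sim > 0:
            edges[(a, b)] = SIMILARITY_WEIGHT * sim
    # Explicit connections: add a fixed weight per relationship type.
    for a, b, kind in relations:
        key = tuple(sorted((a, b)))
        edges[key] = edges.get(key, 0.0) + RELATION_WEIGHTS[kind]
    return edges


# Placeholder data, not drawn from the ETK corpus.
repertoires = {
    "informant_a": {"t1", "t2", "t3"},
    "informant_b": {"t2", "t3", "t4"},
    "informant_c": {"t9"},
}
relations = [
    ("informant_a", "informant_b", "family"),
    ("informant_b", "informant_c", "occupation"),
]
edges = build_network(repertoires, relations)
```

In this sketch the weights are simply summed, so a pair linked both by kinship and by a shared repertoire ends up with the strongest tie. Whether summation is the right way to combine the two kinds of evidence is an open design question.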

An important feature of the network discovery and research tool will be the degree of user choice in limiting the parameters of the informant groups--in other words, the user could limit the domain to "school teachers" and then explore the implicit networks of school teachers generated through a calculation of repertoire similarity and other information related to school teachers' relationships with each other. The output from the network modeler will not only be projected into 2D space, but also projected onto geographical maps.
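As a rough sketch of the user-driven filtering described above, the following restricts a weighted network to a single occupation and pairs each remaining connection with coordinates so that it could be drawn on a map. The record layout, the function names, the informant names and the coordinates are all hypothetical placeholders, and the plotting step itself is left out.

```python
def subnetwork(informants, edges, occupation):
    """Keep only the edges whose two endpoints share the given occupation."""
    keep = {
        name for name, info in informants.items()
        if info["occupation"] == occupation
    }
    return {
        pair: weight for pair, weight in edges.items()
        if pair[0] in keep and pair[1] in keep
    }


def to_map_segments(informants, edges):
    """Turn edges into (coords_a, coords_b, weight) tuples for map drawing."""
    return [
        (informants[a]["coords"], informants[b]["coords"], weight)
        for (a, b), weight in sorted(edges.items())
    ]


# Placeholder records; coordinates are (latitude, longitude) and do not
# refer to real collection sites.
informants = {
    "teacher_1": {"occupation": "school teacher", "coords": (56.0, 9.0)},
    "teacher_2": {"occupation": "school teacher", "coords": (56.5, 9.5)},
    "farmer_1": {"occupation": "farmer", "coords": (55.5, 8.5)},
}
all_edges = {
    ("teacher_1", "teacher_2"): 1.5,
    ("farmer_1", "teacher_1"): 0.4,
}

teacher_edges = subnetwork(informants, all_edges, "school teacher")
segments = to_map_segments(informants, teacher_edges)
```

The same filter could take other criteria (parish, date of collection, collector) without changing the structure; occupation is used here only because it is the example given in the text.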