The Austrian Baroque Corpus ABaC:us: What does the linguistic annotation add?

First Author: Claudia Resch
Other Author: Eva Wohlfarter
Austrian Centre for Digital Humanities, Austrian Academy of Sciences

In linguistics, the term "corpus" refers to a large, structured set of texts that is usually stored and processed electronically. This paper introduces the Austrian Baroque Corpus (ABaC:us), which an interdisciplinary team has been building since 2010.
ABaC:us consists of text data and images dating from the Baroque era, in particular the years from 1650 to 1750. It includes 17 texts with more than 210,000 running words; five of these texts, attributed to the Augustinian monk Abraham a Sancta Clara (1644-1709), constitute the core of the corpus. Most of the texts in ABaC:us belong to the so-called Memento Mori genre, that is, to texts concerned with death and dying.
The corpus aims to combine traditional philological expertise with up-to-date text technology in order to preserve the cultural and linguistic heritage embedded in the texts. To ensure reusability, well-established text-technological standards were adopted: XML annotation according to the guidelines of the Text Encoding Initiative (version P5). The focus of the paper, however, lies on the linguistic annotation: using TreeTagger, a tool for part-of-speech tagging, and the Stuttgart-Tübingen-Tagset, word class and lemma information were automatically added to every word in the five main texts of the corpus. But what does the linguistic annotation add to the value of the corpus? The question is legitimate, as the manual correction of the annotation, which was necessary to obtain high-quality data, was a rather time-consuming process.
The linguistic annotation allows for more complex linguistic research, such as the analysis of stylistic and rhetorical features, recurring patterns and grammatical elements. Can the linguistic analysis of the corpus help us gain a deeper understanding of the society of the past? With several examples from ABaC:us, this paper aims to open the debate.
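
To make the benefit of word class and lemma information more concrete, the following minimal sketch shows how such an annotation could be queried. It is an illustration under stated assumptions, not the actual ABaC:us encoding or tooling: the sample sentences are invented and not taken from the corpus, the attribute names `lemma` and `type` on the TEI `<w>` element are assumed, and the helper functions `read_tokens`, `forms_by_lemma` and `pos_counts` are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample in a TEI-like encoding (NOT taken from ABaC:us).
# Assumption: each token is a <w> element carrying a lemma and an
# STTS part-of-speech tag in the attributes "lemma" and "type".
SAMPLE = """
<body xmlns="http://www.tei-c.org/ns/1.0">
  <s>
    <w lemma="der" type="ART">Der</w>
    <w lemma="Tod" type="NN">Todt</w>
    <w lemma="sein" type="VAFIN">ist</w>
    <w lemma="gewiss" type="ADJD">gewiß</w>
  </s>
  <s>
    <w lemma="der" type="ART">Der</w>
    <w lemma="Tod" type="NN">Tod</w>
    <w lemma="kommen" type="VVFIN">kombt</w>
  </s>
</body>
"""


def read_tokens(xml_text):
    """Return a list of (word form, lemma, POS tag) triples from the XML."""
    root = ET.fromstring(xml_text.strip())
    return [
        (w.text, w.get("lemma"), w.get("type"))
        for w in root.iter(f"{{{TEI_NS}}}w")
    ]


def forms_by_lemma(tokens, lemma):
    """Count the spelling variants that were assigned to one lemma."""
    return Counter(form for form, lem, _ in tokens if lem == lemma)


def pos_counts(tokens):
    """Count how often each part-of-speech tag occurs."""
    return Counter(pos for _, _, pos in tokens)


if __name__ == "__main__":
    tokens = read_tokens(SAMPLE)
    print(forms_by_lemma(tokens, "Tod"))
    print(pos_counts(tokens))
```

Grouping by lemma brings together historical spelling variants that a plain string search would miss, while the tag counts give a first handle on grammatical patterns; both rely on the annotation having been corrected manually.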


