Predicting constructional choice in Estonian
Jane Klavan, Maarja-Liisa Pilvik, Kristel Uiboaed
University of Tartu
A common presumption in usage-based linguistics is that the alternation between rival linguistic forms (such as the English genitive constructions) is not free but conditioned by a multitude of factors. In our presentation, we take a closer look at two near-synonymous constructions that express the spatial locative function in Estonian, the synthetic adessive construction (e.g. laual 'on the table') and the analytic "peal" construction (e.g. laua peal 'on the table'), and identify a number of semantic and morphosyntactic factors that influence the choice between them.
In the first systematic study on the subject (Klavan 2012), a logistic regression model with four morphosyntactic and two semantic explanatory predictors was fitted to written Estonian data, yielding a classification accuracy of 70%. In our study, we use dialectal data from the Corpus of Estonian Dialects (CED 2015) to explore how the minimal adequate model for written data performs on non-standard, spontaneous spoken language. In addition, we include the geographical dimension and the Landmark lemma as random effects and demonstrate that these factors significantly improve the fit of the model. Furthermore, we show how complementing the results of the mixed-effects logistic regression model with those obtained from 'tree & forest' models (e.g. Breiman 2001) helps to explain the variation in more detail and to highlight significant interactions in the data.
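The model structure described above can be illustrated with a small sketch. It is not the authors' model or code: it assumes an already-fitted logistic mixed-effects model with crossed random intercepts for dialect area and Landmark lemma, treats the "peal" construction as the modelled outcome, and uses hypothetical predictor names (predictor_a, predictor_b), region labels, coefficients and toy observations. It shows how fixed effects and the two random intercepts add up on the log-odds scale, and how a classification accuracy (such as the 70% reported for the written-language model) is computed as the share of observations whose predicted construction matches the observed one.

```python
import math
from dataclasses import dataclass

# Hypothetical fitted model: every name and number below is illustrative only.
MODEL = {
    "intercept": -0.5,
    "betas": {"predictor_a": 1.2, "predictor_b": -0.8},  # fixed effects (log-odds)
    "dialect_re": {"region_1": 0.4, "region_2": -0.3},   # random intercepts per dialect area
    "lemma_re": {"laud": 0.6, "tool": -0.2},             # random intercepts per Landmark lemma
}


@dataclass
class Observation:
    features: dict  # predictor name -> value
    dialect: str    # geographical grouping factor
    lemma: str      # Landmark lemma grouping factor
    outcome: str    # observed construction: "peal" or "adessive"


def inv_logit(eta: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predict_peal(obs, intercept, betas, dialect_re, lemma_re):
    """Predicted probability of the analytic peal construction for one observation."""
    eta = intercept + sum(betas[name] * value for name, value in obs.features.items())
    # A group not seen during fitting gets a random intercept of 0,
    # i.e. the prediction falls back to the fixed effects alone.
    eta += dialect_re.get(obs.dialect, 0.0)
    eta += lemma_re.get(obs.lemma, 0.0)
    return inv_logit(eta)


def classification_accuracy(data, model, threshold=0.5):
    """Share of observations whose predicted construction matches the observed one."""
    correct = 0
    for obs in data:
        p = predict_peal(obs, **model)
        predicted = "peal" if p > threshold else "adessive"
        if predicted == obs.outcome:
            correct += 1
    return correct / len(data)


# Toy observations (invented for illustration, not corpus data).
TOY_DATA = [
    Observation({"predictor_a": 1, "predictor_b": 0}, "region_1", "laud", "peal"),
    Observation({"predictor_a": 0, "predictor_b": 1}, "region_2", "tool", "adessive"),
    Observation({"predictor_a": 0, "predictor_b": 0}, "region_1", "tool", "peal"),
    Observation({"predictor_a": 1, "predictor_b": 1}, "region_3", "laud", "peal"),
]

if __name__ == "__main__":
    print(classification_accuracy(TOY_DATA, MODEL))
```

Running it prints 0.75: three of the four toy observations are classified correctly, while the third is misclassified because its log-odds of -0.3 correspond to a probability below the 0.5 threshold. The fourth observation comes from an unseen region, so only the fixed effects and the lemma intercept contribute to its prediction.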
Breiman, Leo. 2001. Random Forests. Machine Learning 45(1): 5-32.
CED. 2015. Corpus of Estonian Dialects. http://www.murre.ut.ee/mkweb
Klavan, Jane. 2012. Evidence in Linguistics: Corpus-Linguistic and Experimental Methods for Studying Grammatical Synonymy (Dissertationes Linguisticae Universitatis Tartuensis). Tartu: University of Tartu Press.