Studies of Everyday Speech at the Intersection of Disciplines

Tatiana Sherstinova
St. Petersburg State University

The research project described in this paper was started several years ago with the aim of investigating Russian spontaneous speech. As many researchers have shown, natural speech differs considerably from speech recorded in a soundproof laboratory. We therefore changed the approach and collected recordings in natural real-life communicative situations: volunteer participants spent a whole day wearing switched-on voice recorders that captured all their audible communication. This methodology can be compared with the 24-hour cardiac monitoring that is widely practiced in medicine. Speech features are known to depend strongly on a speaker's individual characteristics. For this reason, we incorporated into our analysis techniques used in field linguistics, sociolinguistics, and psycholinguistics (e.g., the participants completed socio-demographic and psychological questionnaires). That was the origin of the linguistic resource later known as the ORD corpus of Russian everyday communication [1].

The ORD corpus makes it possible to examine speech on various linguistic levels: phonetic, lexical, grammatical, semantic, and pragmatic. Moreover, detailed examination of the corpus data led us to an unexpected conclusion: the ORD recordings provide valuable data for many other interdisciplinary fields, such as anthropological linguistics, behavioral and communication studies, pragmatics, discourse analysis, psycholinguistics, and forensic phonetics. The corpus can also be used for didactic purposes, for example in teaching colloquial Russian as a foreign language. This paper focuses on two important applications of the ORD corpus.

Applications for Speech Technologies. A statistical description of Russian spontaneous speech in everyday interaction is important for tuning and improving speech synthesis and recognition systems. For example, speech transcripts of the ORD corpus may be used to build n-gram language models for speech recognition systems; such models estimate the probability of a given word on the basis of the preceding n−1 words. The study of spontaneous speech reduction [2] may be used to build an authentic lexicon of word pronunciations. In addition, specialists in speech technologies have expressed interest in the lists of the most frequently used Russian utterances [3] and in the temporal patterns of speech obtained from the ORD data.
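To illustrate the n-gram idea for the simplest case (n = 2), the following minimal sketch estimates bigram probabilities by maximum likelihood, that is, as the count of a word pair divided by the count of its first word. It is an illustration only and not the procedure used with the ORD corpus: the function names are hypothetical, the three transliterated utterances are invented toy data rather than corpus material, and no smoothing is applied, although a practical model would need it.

```python
from collections import Counter

# Sentence-boundary markers (an assumption of this sketch).
BOS, EOS = "<s>", "</s>"


def train_bigram_counts(utterances):
    """Count contexts and bigrams over a list of tokenized utterances."""
    context_counts = Counter()
    bigram_counts = Counter()
    for tokens in utterances:
        padded = [BOS] + tokens + [EOS]
        # Pair every token with the token that follows it.
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts


def bigram_probability(prev, word, context_counts, bigram_counts):
    """Maximum-likelihood estimate of P(word | prev).

    Returns 0.0 when the context was never observed (no smoothing).
    """
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


# Invented, transliterated toy utterances; NOT taken from the ORD corpus.
toy_transcripts = [
    ["nu", "da"],
    ["nu", "da", "konechno"],
    ["da", "net"],
]

context_counts, bigram_counts = train_bigram_counts(toy_transcripts)
p_da_after_nu = bigram_probability("nu", "da", context_counts, bigram_counts)
print(f"P(da | nu) = {p_da_after_nu:.2f}")
```

In this toy data "nu" is always followed by "da", so the estimate for that pair is 1, while any pair that never occurs gets probability 0. That zero is the reason real systems add smoothing or back-off, and it is also why transcripts of genuinely spontaneous speech are useful as training material.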

Sociolinguistic Studies. Several pilot sociolinguistic investigations based on the ORD data have been carried out in the last few years, e.g., studies of speech rate and a comparison of men's and women's social behavior. Recently, a large sociolinguistic project has been started with the aim of analyzing everyday Russian with a focus on social differentiation (groups defined by age, gender, education, profession, etc.). Sociolects are to be described on the phonetic, lexical, and grammatical levels. One of the most significant objectives of this project is to reveal the speech features that distinguish different social groups (e.g., young people vs. older people, men vs. women, blue-collar vs. white-collar workers). Besides sociolinguistics, the results of this project will also provide valuable material for forensic linguistics. The research is supported by the Russian Science Foundation, project # 14-18-02070 "Everyday Russian Language in Different Social Groups".

The list of possible applications of the ORD corpus could be extended further.


