Corpus Studies of Russian Everyday Speech and Oral Communication

Bogdanova-Beglarian Natalia, Sherstinova Tatiana, Blinova Olga, Martynenko Gregory
St. Petersburg State University

The paper presents the ORD ("One day of speech") corpus of Russian everyday speech, which contains long-term audio recordings of daily communication [1]. At present, the ORD corpus is the most representative collection of everyday spoken Russian, containing more than 1000 hours of recordings gathered from 110 main participants and hundreds of their interlocutors; the speech transcripts amount to about 500,000 words, and it is planned to extend them to 1 million words. Speech is selectively annotated on different levels: phonetic, lexical, grammatical, and pragmatic; quantitative data processing is carried out for the annotations on each level [2]. The paper gives a brief overview of studies that are being (or have been) conducted on the ORD data in the following aspects: 1) phonetics (study of reduction; temporal studies; speech patterns; hesitations; etc.); 2) lexical studies (new words; new meanings; frequency word lists; lexical richness and concentration; slang; argot; etc.); 3) morphological studies (POS distribution; frequency lists of grammatical forms; grammatical errors; etc.); 4) syntactic studies (linear word order; syntactic complexity; specific syntactic phenomena of spontaneous speech; etc.); 5) discourse and communication studies (macro- and microstructures of everyday communication; communication scenarios; discourse words and fillers; pragmatic studies; communication with "non-standard" interlocutors; etc.); 6) psycholinguistic studies (dependence of speech characteristics on the speaker's psychological type); and 7) sociolinguistic studies (speech features of different social groups; gender linguistics; styles and registers of spoken Russian; etc.), currently supported by the Russian Scientific Foundation, project # 14-18-02070 “Everyday Russian Language in Different Social Groups” (cf., for example, [3]). The ORD corpus has various interdisciplinary applications, the most important of which will be listed.

