Article published in:
Spoken Corpora and Linguistic Studies. Edited by Tommaso Raso and Heliana Mello
[Studies in Corpus Linguistics 61] 2014
pp. 27–68
Methodological issues for spontaneous speech corpora compilation
The case of C-ORAL-BRASIL
Spontaneous speech corpus compilation has grown considerably over the past 20 years. This growth is due mainly to technological advances that allow highly accurate in vivo recording, new insights from empirically based linguistic theory, concern for the documentation of threatened languages, and the relevance of findings to speech recognition applications. This paper discusses methodologies associated with spontaneous speech corpus compilation that shed light on aspects relevant to understanding linguistic phenomena specific to spoken language. The compilation process of C-ORAL-BRASIL I, an informal spontaneous speech corpus of Brazilian Portuguese, is used, among other examples, as the basis for the discussion.
Published online: 14 November 2014
https://doi.org/10.1075/scl.61.01mel