Article published in: Spoken Corpora and Linguistic Studies
Edited by Tommaso Raso and Heliana Mello
[Studies in Corpus Linguistics 61] 2014
pp. 152–188
The variation of action verbs in multilingual spontaneous speech corpora
Semantic typology and corpus design
Most high-frequency verbs referring to Action in ordinary communication are General; that is, they productively extend to different actions within their own meaning. Moreover, languages can categorize actions differently. Despite its importance, the variation of these verbs is largely unknown, and this lack of data prevents us from addressing crucial aspects of lexical typology. The range of productive variation of Action verbs can be induced from spoken corpora, since references to actions are frequent in oral communication. This paper presents data derived from multilingual corpora (English and Italian) within the IMAGACT project and illustrates the methodology, the corpus design requirements, and the overall results obtained in this corpus-based research on cross-linguistic lexical semantics. The methodology identifies the data relevant to semantic competence, separating the contexts in which a verb is used in its core meaning from metaphorical and phraseological uses. It represents Action concepts through visual prototypes rather than definitions, which allows typological variation across languages to be displayed in a simple and informative manner. In the Italian corpus, of the 677 verbs referring to Action, 106 are General, each comprising 3 to 15 action types. This subset accounts for the majority of the cases in which Physical Action is referred to and is therefore a core area of the semantic knowledge of the language. Data on semantic variation can emerge only if a sufficiently wide variety of interactive contexts is recorded. Overall, the incidence of metaphorical and phraseological usages among verb occurrences is high (39%), and it is even higher in formal uses of language. Reference to Action is concentrated in informal, interactive contexts, especially in interactions with children in the early phases of language acquisition, which also show greater variation of verbs across action types.
Published online: 14 November 2014