Multiword Units in Machine Translation and Translation Technology

Editors
Ruslan Mitkov | University of Wolverhampton
Johanna Monti | "L'Orientale" University of Naples
Gloria Corpas Pastor | University of Málaga
Violeta Seretan | University of Geneva
Hardbound – Available
ISBN 9789027200600 | EUR 99.00 | USD 149.00
 
e-Book
ISBN 9789027264206 | EUR 99.00 | USD 149.00
 
The correct interpretation of Multiword Units (MWUs) is crucial to many applications in Natural Language Processing, but it is a challenging and complex task. In recent years, the computational treatment of MWUs has received considerable attention, but there is much more to be done before we can claim that NLP and Machine Translation (MT) systems process MWUs successfully.

This volume provides a general overview of the field with particular reference to Machine Translation and Translation Technology and focuses on languages such as English, Basque, French, Romanian, German, Dutch and Croatian, among others. The chapters of the volume illustrate a variety of topics that address this challenge, such as the use of rule-based approaches, compound splitting techniques, MWU identification methodologies in multilingual applications, and MWU alignment issues.
[Current Issues in Linguistic Theory, 341]  2018.  ix, 259 pp.
Publishing status: Available
Table of Contents
About the editors
viii–ix
Multiword units in machine translation and translation technology
Johanna Monti, Violeta Seretan, Gloria Corpas Pastor and Ruslan Mitkov
2–37
Part 1. Multiword units in machine translation
42–99
Analysing linguistic information about word combinations for a Spanish-Basque rule-based machine translation system
Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka and Kepa Sarasola
42–59
How do students cope with machine translation output of multiword units? An exploratory study
Joke Daems, Michael Carl, Sonia Vandepitte, Robert J. Hartsuiker and Lieve Macken
62–80
Aligning verb + noun collocations to improve a French-Romanian FSMT system
Amalia Todiraşcu and Mirabela Navlea
82–99
Part 2. Multiword units in multilingual NLP applications
104–162
Multiword expressions in multilingual information extraction
Gregor Thurmair
104–123
A multilingual gold standard for translation spotting of German compounds and their corresponding multiword units in English, French, Italian and Spanish
Simon Clematide, Stéphanie Lehner, Johannes Graën and Martin Volk
126–145
Dutch compound splitting for bilingual terminology extraction
Lieve Macken and Arda Tezcan
148–162
Part 3. Identification and translation of multiword units
166–256
A flexible framework for collocation retrieval and translation from parallel and comparable corpora
Oscar Mendoza Rivera, Ruslan Mitkov and Gloria Corpas Pastor
166–180
On identification of bilingual lexical bundles for translation purposes: The case of an English-Polish comparable corpus of patient information leaflets
Łukasz Grabowski
182–199
The quest for Croatian idioms as multiword units
Kristina Kocijan and Sara Librenjak
202–221
Corpus analysis of Croatian constructions with the verb doći ‘to come’
Goranka Blagus Bartolec and Ivana Matas Ivanković
224–241
Anaphora resolution, collocations and translation
Eric Wehrli and Luka Nerima
244–256
Index
257
“[T]he book represents many interesting topics in the area of computational treatment of multiword expressions, with a special focus on MT and translation technology. [...] This book can essentially be viewed as an important contribution to a specialised area (i.e. computational treatment of MWUs) of interest, which will be a great help to NLP researchers, and MT researchers and users in particular.”
“[T]he accuracy of MWU translations still remains a problem, and MWU processing and translation still pose the hardest challenges to MT and translation technology (TT). [...] [T]he book definitely makes an important contribution to MWU processing, thanks to the new angle it brings to the study of MWU in NLP and the diverse and innovative models for the computational treatment of MWU.”
Subjects

Translation & Interpreting Studies

Translation Studies
BIC Subject: CFK – Grammar, syntax
BISAC Subject: LAN009060 – LANGUAGE ARTS & DISCIPLINES / Linguistics / Syntax
U.S. Library of Congress Control Number: 2017058783