Mercè Vàzquez

Also published as: Merce Vazquez


2024

Creating Terminological Resources in the Digital Age for Less-resourced Languages
Mercè Vàzquez
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Multilingual terminological resources contain the most representative knowledge of specialized domains and allow professionals to create and translate specialized content in order to spread knowledge. Today, representative and useful multilingual terminological resources are available mainly for the most resourced languages. This limits the development of knowledge in less-resourced languages across different specialized domains, mainly those that are constantly evolving and creating or adapting new concepts as needed. In this paper we present our methodology for carrying out terminological projects in Catalan, based entirely on open access linguistic resources and using natural language processing tools. The main objective of this research is to maximize the Catalan terminology currently available in open access, using a combination of natural language processing tools. The results are reviewed by linguists and expert terminologists before being made publicly available. The findings of our research provide a new approach to terminology work, making it possible to design high-volume multilingual terminological projects that are manually revised by linguists and terminologists in the context of less-resourced languages.

2023

TAN-IBE: Neural Machine Translation for the romance languages of the Iberian Peninsula
Antoni Oliver | Mercè Vàzquez | Marta Coll-Florit | Sergi Álvarez | Víctor Suárez | Claudi Aventín-Boya | Cristina Valdés | Mar Font | Alejandro Pardos
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

The main goal of this project is to explore techniques for training NMT systems applied to Spanish, Portuguese, Catalan, Galician, Asturian, Aragonese and Aranese. These languages belong to the same Romance family, but they differ greatly in terms of the linguistic resources available. Asturian, Aragonese and Aranese can be considered low-resource languages. These characteristics make this setting an excellent place to explore training techniques for low-resource languages: transfer learning and multilingual systems, among others. The first months of the project have been dedicated to the compilation of monolingual and parallel corpora for Asturian, Aragonese and Aranese.

2020

TermEval 2020: Using TSR Filtering Method to Improve Automatic Term Extraction
Antoni Oliver | Mercè Vàzquez
Proceedings of the 6th International Workshop on Computational Terminology

The identification of terms from domain-specific corpora using computational methods is a highly time-consuming task because terms have to be validated by specialists. In order to improve term candidate selection, we have developed the Token Slot Recognition (TSR) method, a filtering strategy based on terminological tokens which is used to rank extracted term candidates from domain-specific corpora. We have implemented this filtering strategy in TBXTools. In this paper we present the system we have used in the TermEval 2020 shared task on monolingual term extraction. We also present the evaluation results for the system for English, French and Dutch and for two corpora: corruption and heart failure. For English and French we have used a linguistic methodology based on POS patterns, and for Dutch we have used a statistical methodology based on n-gram calculation and filtering with stop-words. For all languages, the TSR filtering method has been applied. We have obtained competitive results, but there is still room for improvement of the system.

2015

TBXTools: A Free, Fast and Flexible Tool for Automatic Terminology Extraction
Antoni Oliver | Mercè Vàzquez
Proceedings of the International Conference Recent Advances in Natural Language Processing

2007

A free terminology extraction suite
Antoni Oliver | Merce Vazquez
Proceedings of Translating and the Computer 29