Stella Retali-Medori

2024

The ParCoLab Parallel Corpus and Its Extension to Four Regional Languages of France
Dejan Stosic | Saša Marjanović | Delphine Bernhard | Myriam Bras | Laurent Kevers | Stella Retali-Medori | Marianne Vergez-Couret | Carole Werner
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Parallel corpora are still scarce for most of the world’s language pairs, and the regional languages of France are no exception. In addition, adequate web interfaces facilitate and encourage the use of parallel corpora by target users, such as language learners and teachers, as well as linguists. In this paper, we describe ParCoLab, a parallel corpus and a web platform for querying the corpus. From the outset, ParCoLab has been geared towards lower-resource languages, with an initial corpus in Serbian, along with French and English (later Spanish). We focus here on the extension of ParCoLab with a parallel corpus for four regional languages of France: Alsatian, Corsican, Occitan and Poitevin-Saintongeais. In particular, we detail the criteria for choosing texts and the issues related to their collection. The new parallel corpus contains more than 20k tokens per regional language.

2020

Towards a Corsican Basic Language Resource Kit
Laurent Kevers | Stella Retali-Medori
Proceedings of the Twelfth Language Resources and Evaluation Conference

Natural language processing (NLP) resources and tools for Corsican are currently almost non-existent. Our inventory contains only a few digital resources, namely lexical or corpus databases, all of which require adaptation work. Our objective is to use the Banque de Données Langue Corse project (BDLC) to improve the availability of resources and tools for the Corsican language and, in the long term, to provide a complete Basic Language Resource Kit (BLARK). We have defined a roadmap setting out the actions to be undertaken: collecting corpora and setting up a consultation interface (concordancer), a language detection tool, an electronic dictionary and a part-of-speech tagger. First results on these topics have already been achieved and are presented in this article. Some elements are also available on our project page (http://bdlc.univ-corse.fr/tal/).

2019

Outiller une langue peu dotée grâce au TALN : l’exemple du corse et de la BDLC (Tooling up a less-resourced language with NLP : the example of Corsican and BDLC)
Laurent Kevers | Florian Guéniot | Aurelia Ghjacumina Tognotti | Stella Retali-Medori
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

Our research on the Corsican language naturally leads us to consider using natural language processing (NLP) tools. After a brief introduction to Corsican and to the project that provides the framework for our work, we review the current state of NLP applied to less-resourced languages, including Corsican. We then define the actions that can be undertaken, and how they can fit into our project, in order to progress towards building resources and tools for Corsican NLP.
