Jelena Prokić

Also published as: Jelena Prokic


2024

A New Dataset for Tonal and Segmental Dialectometry from the Yue- and Pinghua-Speaking Area
Ho Wang Matthew Sung | Jelena Prokic | Yiya Chen
Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP

Traditional dialectology or dialect geography is the study of geographical variation of language. Originating in Europe and pioneered in Germany and France, this field has predominantly focused on sounds, more specifically, on segments. Similarly, quantitative approaches to language variation concerned with the phonetic level in most cases focus on segments as well. However, more than half of the world’s languages include lexical tones (Yip, 2002). Despite this, tones are still underexplored in quantitative language comparison, partly due to the limited accessibility of suitable data. This paper introduces a newly digitised dataset from the Yue- and Pinghua-speaking areas in Southern China, covering over 100 dialects. The dataset consists of two parts: tones and segments. In this paper, we illustrate how we can computationally model tones in order to explore linguistic variation. We have applied a tone distance metric to our data, and we have found that 1) dialects also form a continuum on the tonal level and 2) other than tonemic (inventory) and tonetic differences, dialects can also differ in the lexical distribution of tones. The availability of this dataset will hopefully enable further exploration of the role of tones in quantitative typology and NLP research.

2023

ChiWUG: A Graph-based Evaluation Dataset for Chinese Lexical Semantic Change Detection
Jing Chen | Emmanuele Chersoni | Dominik Schlechtweg | Jelena Prokic | Chu-Ren Huang
Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change

Recent studies have suggested that language models are efficient tools for measuring lexical semantic change. In our paper, we present the compilation of the first graph-based evaluation dataset for lexical semantic change in the context of the Chinese language, specifically covering the periods before and after the Reform and Opening Up. Exploiting the existing framework DURel, we collect over 61,000 human semantic relatedness judgments for 40 targets. The inferred word usage graphs and semantic change scores provide a basis for visualization and evaluation of semantic change.

2014

A Benchmark Database of Phonetic Alignments in Historical Linguistics and Dialectology
Johann-Mattis List | Jelena Prokić
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In the last two decades, alignment analyses have become an important technique in quantitative historical linguistics and dialectology. Phonetic alignment plays a crucial role in the identification of regular sound correspondences and deeper genealogical relations between and within languages and language families. Surprisingly, to date there are no easily accessible benchmark data sets for phonetic alignment analyses. Here we present a publicly available database of manually edited phonetic alignments which can serve as a platform for testing and improving the performance of automatic alignment algorithms. The database consists of a great variety of alignments drawn from a large number of different sources. The data is arranged in such a way that typical problems encountered in phonetic alignment analyses (metathesis, diversity of phonetic sequences) are represented and can be directly tested.

2012

Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH
Miriam Butt | Sheelagh Carpendale | Gerald Penn | Jelena Prokić | Michael Cysouw
Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH

Introduction
Miriam Butt | Jelena Prokić | Thomas Mayer | Michael Cysouw
Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH

Detecting Shibboleths
Jelena Prokić | Çağrı Çöltekin | John Nerbonne
Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH

2010

Exploring Dialect Phonetic Variation Using PARAFAC
Jelena Prokić | Tim Van de Cruys
Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology

2009

Multiple Sequence Alignments in Linguistics
Jelena Prokić | Martijn Wieling | John Nerbonne
Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education (LaTeCH – SHELT&R 2009)

Evaluating the Pairwise String Alignment of Pronunciations
Martijn Wieling | Jelena Prokić | John Nerbonne
Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education (LaTeCH – SHELT&R 2009)

2007

Identifying Linguistic Structure in a Quantitative Analysis of Dialect Pronunciation
Jelena Prokić
Proceedings of the ACL 2007 Student Research Workshop