Are Text Classifiers Xenophobic? A Country-Oriented Bias Detection Method with Least Confounding Variables

Valentin Barriere, Sebastian Cifuentes


Abstract
Classical bias detection methods used in Machine Learning are themselves biased because of the different confounding variables implied in the assessment of the initial biases. First they are using templates that are syntactically simple and distant from the target data on which the model will deployed. Second, current methods are assessing biases in pre-trained language models or in dataset, but not directly on the fine-tuned classifier that can actually produce harms. We propose a simple method to detect the biases of a specific fine-tuned classifier on any type of unlabeled data. The idea is to study the classifier behavior by creating counterfactual examples directly on the target data distribution and quantify the amount of changes. In this work, we focus on named entity perturbations by applying a Named Entity Recognition on target-domain data and modifying them accordingly to most common names or location of a target group (gender and country), and this for several morphosynctactically different languages spoken in relation with the countries of the target groups. We used our method on two models available open-source that are likely to be deployed by industry, and on two tasks and domains. We first assess the bias of a multilingual sentiment analysis model trained over multiple-languages tweets and available open-source, and then a multilingual stance recognition model trained over several languages and assessed over English language. Finally we propose to link the perplexity of each example with the bias of the model, by looking at the change in label distribution with respect to the language of the target group. Our work offers a fine-grained analysis of the interactions between names and languages, revealing significant biases in multilingual models.
Anthology ID:
2024.lrec-main.134
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
1511–1518
Language:
URL:
https://aclanthology.org/2024.lrec-main.134
DOI:
Bibkey:
Cite (ACL):
Valentin Barriere and Sebastian Cifuentes. 2024. Are Text Classifiers Xenophobic? A Country-Oriented Bias Detection Method with Least Confounding Variables. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 1511–1518, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Are Text Classifiers Xenophobic? A Country-Oriented Bias Detection Method with Least Confounding Variables (Barriere & Cifuentes, LREC-COLING 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.lrec-main.134.pdf