Detecting Hate Speech in Turkish Print Media: A Corpus and A Hybrid Approach with Target-oriented Linguistic Knowledge

Gökçe Uludoğan, Atıf Emre Yüksel, Ümit Tunçer, Burak Işık, Yasemin Korkmaz, Didar Akar, Arzucan Özgür


Abstract
The use of hate speech targeting ethnicity, nationalities, religious identities, and specific groups has been on the rise in the news media. However, most existing automatic hate speech detection models focus on identifying hate speech, often neglecting the target group-specific language that is common in news articles. To address this problem, we first compile a hate speech dataset, TurkishHatePrintCorpus, derived from Turkish news articles and annotate it specifically for the language related to the targeted group. We then introduce the HateTargetBERT model, which integrates the target-centric linguistic features extracted in this study into the BERT model, and demonstrate its effectiveness in detecting hate speech while allowing the model’s classification decision to be explained. We have made the dataset and source code publicly available at https://github.com/boun-tabi/HateTargetBERT-TR.
Anthology ID:
2024.case-1.29
Volume:
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)
Month:
March
Year:
2024
Address:
St. Julians, Malta
Editors:
Ali Hürriyetoğlu, Hristo Tanev, Surendrabikram Thapa, Gökçe Uludoğan
Venues:
CASE | WS
Publisher:
Association for Computational Linguistics
Pages:
205–214
URL:
https://aclanthology.org/2024.case-1.29
Cite (ACL):
Gökçe Uludoğan, Atıf Emre Yüksel, Ümit Tunçer, Burak Işık, Yasemin Korkmaz, Didar Akar, and Arzucan Özgür. 2024. Detecting Hate Speech in Turkish Print Media: A Corpus and A Hybrid Approach with Target-oriented Linguistic Knowledge. In Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024), pages 205–214, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
Detecting Hate Speech in Turkish Print Media: A Corpus and A Hybrid Approach with Target-oriented Linguistic Knowledge (Uludoğan et al., CASE-WS 2024)
PDF:
https://aclanthology.org/2024.case-1.29.pdf
Supplementary material:
 2024.case-1.29.SupplementaryMaterial.txt
Video:
 https://aclanthology.org/2024.case-1.29.mp4