Ai Kubota


2024

Annotation of Japanese Discourse Relations Focusing on Concessive Inferences
Ai Kubota | Takuma Sato | Takayuki Amamoto | Ryota Akiyoshi | Koji Mineshima
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this study, we focus on the inference presupposed in the concessive discourse relation and present the discourse relation annotation for the Japanese connectives ‘nagara’ and ‘tsutsu’, both of which have two usages: Synchronous and Concession, just like English ‘while’. We also present the annotation for ‘tokorode’, which is ambiguous in three ways: Temporal, Location, and Concession. While corpora containing concessive discourse relations already exist, the distinctive feature of our study is that it aims to identify the concessive inferential relations by writing out the implicit presupposed inferences. In this paper, we report on the annotation methodology and its results, as well as the characteristics of concession that became apparent during annotation.

2019

Probing the nature of an island constraint with a parsed corpus
Yusuke Kubota | Ai Kubota
Linguistic Issues in Language Technology, Volume 18, 2019 - Exploiting Parsed Corpora: Applications in Research, Pedagogy, and Processing

This paper presents a case study of the use of the NINJAL Parsed Corpus of Modern Japanese (NPCMJ) for syntactic research. NPCMJ is the first phrase structure-based treebank for Japanese that is specifically designed for application in linguistic (in addition to NLP) research. After discussing some basic methodological issues pertaining to the use of treebanks for theoretical linguistics research, we introduce our case study on the status of the Coordinate Structure Constraint (CSC) in Japanese, showing that NPCMJ enables us to easily retrieve examples that support one of the key claims of Kubota and Lee (2015): that the CSC should be viewed as a pragmatic, rather than a syntactic, constraint. The corpus-based study we conducted moreover revealed a previously unnoticed tendency that was highly relevant for further clarifying the principles governing the empirical data in question. We conclude the paper by briefly discussing some further methodological issues brought up by our case study pertaining to the relationship between linguistic research and corpus development.