Yasutomo Kimura


2024

Masking Explicit Pro-Con Expressions for Development of a Stance Classification Dataset on Assembly Minutes
Tomoyosi Akiba | Yuki Gato | Yasutomo Kimura | Yuzu Uchida | Keiichi Takamaru
Proceedings of the Second Workshop on Natural Language Processing for Political Sciences @ LREC-COLING 2024

In this paper, a new dataset for Stance Classification based on assembly minutes is introduced. We develop it by using publicly available minutes taken from diverse Japanese local governments including prefectural, city, and town assemblies. To make the task one of predicting a stance from the content of a politician’s utterance without explicit stance expressions, predefined words that directly convey the speaker’s stance in the utterance are replaced by a special token. Those masked words are also used to assign a gold label, either agreement or disagreement, to the utterance. In total, we constructed 15,018 instances automatically from 47 Japanese local governments. The dataset is used in the shared Stance Classification task evaluated in the NTCIR-17 QA-Lab-PoliInfo-4, and is now publicly available. Since the construction method of the dataset is automatic, we can still apply it to obtain more instances from other Japanese local governments.
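The masking and labeling step described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the two-word lexicon, the `[STANCE]` token, and the rule of skipping utterances with no stance word or with conflicting stance words are all assumptions made for the example.

```python
import re

# Hypothetical stance lexicon. The paper's actual predefined word list
# is not reproduced here; these two entries are illustrative only.
STANCE_WORDS = {
    "賛成": "agreement",
    "反対": "disagreement",
}

# Hypothetical special token that replaces explicit stance words.
MASK_TOKEN = "[STANCE]"

# One pattern that matches any word in the lexicon.
_PATTERN = re.compile("|".join(re.escape(word) for word in STANCE_WORDS))


def mask_and_label(utterance):
    """Return (masked_text, label), or None when no single label applies."""
    labels = {STANCE_WORDS[match] for match in _PATTERN.findall(utterance)}
    if len(labels) != 1:
        # No stance word, or both stances present: skip the utterance.
        # This filtering rule is an assumption of the sketch.
        return None
    masked = _PATTERN.sub(MASK_TOKEN, utterance)
    return masked, labels.pop()


if __name__ == "__main__":
    print(mask_and_label("私はこの議案に賛成します。"))
```

Because the label is derived from the same words that are masked, no manual annotation is needed, which is what lets the construction run automatically over further assemblies.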

2022

Budget Argument Mining Dataset Using Japanese Minutes from the National Diet and Local Assemblies
Yasutomo Kimura | Hokuto Ototake | Minoru Sasaki
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Budget argument mining attempts to identify argumentative components related to a budget item, and then to classify these argumentative components, given budget information and minutes. We describe the construction of the dataset for budget argument mining, a subtask of QA Lab-PoliInfo-3 in NTCIR-16. Budget argument mining analyses the argument structure of the minutes, focusing on monetary expressions (amounts of money). In this task, given sufficient budget information (budget item, budget amount, etc.), relevant argumentative components in the minutes are identified and argument labels (claim, premise, and other) are assigned to these components. In this paper, we describe the design of the data format, the annotation procedure, and the release information of the budget argument mining dataset, which links budget information to minutes.

2020

Extraction of the Argument Structure of Tokyo Metropolitan Assembly Minutes: Segmentation of Question-and-Answer Sets
Keiichi Takamaru | Yasutomo Kimura | Hideyuki Shibuki | Hokuto Ototake | Yuzu Uchida | Kotaro Sakamoto | Madoka Ishioroshi | Teruko Mitamura | Noriko Kando
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this study, we construct a corpus of Japanese local assembly minutes. All speeches in an assembly are transcribed into local assembly minutes based on the local autonomy law. Therefore, the local assembly minutes form an extremely large amount of text data. Our ultimate objectives were to summarize and present the arguments in the assemblies, and to use the minutes as primary information for arguments in local politics. To achieve this, we structured all statements in assembly minutes. We focused on the structure of the discussion, i.e., the extraction of question and answer pairs. We organized the shared task “QA Lab-PoliInfo” in NTCIR 14. We conducted a “segmentation task” to identify the scope of one question and answer in the minutes as a subtask of the shared task. For the segmentation task, 24 runs from five teams were submitted. Based on the obtained results, the best recall was 1.000, best precision was 0.940, and best F-measure was 0.895.
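For readers comparing the three reported scores, a short sketch of the F-measure may help. It assumes the balanced F-measure (the harmonic mean of precision and recall, with beta = 1); the abstract does not state which variant the task used. Under that assumption, a single run with precision 0.940 and recall 1.000 would score about 0.969, above the reported best of 0.895, which suggests the three best values come from different runs.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure; beta=1.0 gives the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    beta_sq = beta * beta
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)


if __name__ == "__main__":
    # Hypothetical single run combining the best reported precision and recall.
    print(round(f_measure(0.940, 1.000), 3))
```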

2016

Creating Japanese Political Corpus from Local Assembly Minutes of 47 prefectures
Yasutomo Kimura | Keiichi Takamaru | Takuma Tanaka | Akio Kobayashi | Hiroki Sakaji | Yuzu Uchida | Hokuto Ototake | Shigeru Masuyama
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)

This paper describes a Japanese political corpus created for interdisciplinary political research. The corpus contains the local assembly minutes of 47 prefectures from April 2011 to March 2015. This four-year period coincides with the term of office for assembly members in most autonomies. We analyze statistical data, such as the number of speakers, characters, and words, to clarify the characteristics of local assembly minutes. In addition, we identify problems associated with the different web services used by the autonomies to make the minutes available to the public.

2013

Detecting Cyberbullying Entries on Informal School Websites Based on Category Relevance Maximization
Taisei Nitta | Fumito Masui | Michal Ptaszynski | Yasutomo Kimura | Rafal Rzepka | Kenji Araki
Proceedings of the Sixth International Joint Conference on Natural Language Processing