Jianhui Jiang


2024

A Hierarchical Sequence-to-Set Model with Coverage Mechanism for Aspect Category Sentiment Analysis
Siyu Wang | Jianhui Jiang | Shengran Dai | Jiangtao Qiu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Aspect category sentiment analysis (ACSA) aims to simultaneously detect aspect categories and their corresponding sentiment polarities (category-sentiment pairs). Some recent studies have used pre-trained generative models to perform ACSA and achieved good results. However, generative models still face three challenges in ACSA. First, missing predictions must be avoided: the model needs to predict every category-sentiment pair in a sentence. Second, category-sentiment pairs are inherently an unordered set, so a model that generates them as a sequence is penalized even when its predictions are correct if their order differs from that of the ground truth. Third, each aspect category should focus on its relevant sentiment words, and the polarity of an aspect category should be the aggregation of the polarities of those sentiment words. This paper proposes a hierarchical generative model with a coverage mechanism that uses sequence-to-set learning to tackle all three challenges simultaneously. Extensive experiments on several datasets demonstrate our model’s superior performance.
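The ordering issue named as the second challenge can be illustrated with a minimal sketch. This is not the paper's training objective; it only contrasts an order-sensitive comparison with a set-based one. The function names and the example category-sentiment pairs are hypothetical and chosen for illustration.

```python
def exact_sequence_match(pred, gold):
    """Order-sensitive check: True only if pairs match position by position."""
    return list(pred) == list(gold)


def set_f1(pred, gold):
    """Order-invariant F1 over category-sentiment pairs treated as sets."""
    pred_set, gold_set = set(pred), set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the same two pairs, predicted in a different order.
gold = [("food", "positive"), ("service", "negative")]
pred_reordered = [("service", "negative"), ("food", "positive")]

# Hypothetical example of a missing prediction (the first challenge).
pred_missing = [("food", "positive")]

print(exact_sequence_match(pred_reordered, gold))
print(set_f1(pred_reordered, gold))
print(set_f1(pred_missing, gold))
```

Under the order-sensitive check, the reordered prediction counts as wrong although every pair is correct, whereas the set-based score treats it as a perfect match and only lowers the score when a pair is actually missing.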

2022

A Domain Knowledge Enhanced Pre-Trained Language Model for Vertical Search: Case Study on Medicinal Products
Kesong Liu | Jianhui Jiang | Feifei Lyu
Proceedings of the 29th International Conference on Computational Linguistics

We present a biomedical knowledge enhanced pre-trained language model for medicinal product vertical search. Following ELECTRA’s replaced token detection (RTD) pre-training, we leverage a biomedical entity masking (EM) strategy to learn better contextual word representations. Furthermore, we propose a novel pre-training task, product attribute prediction (PAP), to inject product knowledge into the pre-trained language model efficiently by leveraging medicinal product databases directly. The two pre-training tasks are learned jointly by sharing the parameters of PAP’s transformer encoder with those of RTD’s main transformer. Experiments demonstrate the effectiveness of the PAP task for pre-trained language models in the medicinal product vertical search scenario, which includes query-title relevance, query intent classification, and named entity recognition in queries.
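The general idea of entity masking, as opposed to masking individual tokens, can be sketched as follows. This is a minimal illustration under assumptions: the function name, the `[MASK]` string, the whitespace tokenization, and the example sentence are hypothetical, and the abstract does not say how entities are detected or sampled for masking.

```python
def mask_entity_spans(tokens, entity_spans, mask_token="[MASK]"):
    """Mask every token inside each entity span.

    entity_spans holds (start, end) index pairs with an exclusive end,
    so a multi-token entity is always masked as a whole unit.
    """
    masked = list(tokens)  # copy so the input list is left unchanged
    for start, end in entity_spans:
        for i in range(start, end):
            masked[i] = mask_token
    return masked


# Hypothetical input: "salicylic acid" is treated as one biomedical entity.
tokens = ["patients", "take", "salicylic", "acid", "daily"]
entity_spans = [(2, 4)]

masked_tokens = mask_entity_spans(tokens, entity_spans)
print(masked_tokens)
```

Masking the whole span prevents the model from recovering one half of an entity name from the other half, which is the usual motivation for span-level masking.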

Automatic Keyphrase Generation by Incorporating Dual Copy Mechanisms in Sequence-to-Sequence Learning
Siyu Wang | Jianhui Jiang | Yao Huang | Yin Wang
Proceedings of the 29th International Conference on Computational Linguistics

Keyphrase generation is a challenging task that aims to generate a set of keyphrases for a piece of text. Many previous studies have generated keyphrases with sequence-to-sequence models and introduced a copy mechanism to achieve good results. However, we observed that most keyphrases are composed of a few important words (seed words) in the source text; if these words can be identified accurately and copied to create more keyphrases, the performance of the model might improve. To this end, we propose DualCopyNet, a model that introduces an additional sequence labeling layer to identify seed words and then copies these words to generate new keyphrases through dual copy mechanisms. Experimental results demonstrate that our model outperforms the baseline models with a clear performance improvement.