Kexuan Sun


2024

pdf bib
Efficient and Accurate Contextual Re-Ranking for Knowledge Graph Question Answering
Kexuan Sun | Nicolaas Paul Jedema | Karishma Sharma | Ruben Janssen | Jay Pujara | Pedro Szekely | Alessandro Moschitti
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The efficacy of neural “retrieve and generate” systems is well established for question answering (QA) over unstructured text. Recent efforts seek to extend this approach to knowledge graph (KG) QA by converting structured triples to unstructured text. However, the relevance of KG triples retrieved by these systems limits their accuracy. In this paper, we improve the relevance of retrieved triples using a carefully designed re-ranker. Specifically, our pipeline (i) retrieves over documents of triples grouped by entity, (ii) re-ranks triples from these documents with context: triples in the 1-hop neighborhood of the documents’ subject entity, and (iii) generates an answer from highly relevant re-ranked triples. To train our re-ranker, we propose a novel “triple-level” labeling strategy that infers fine-grained labels and shows that these significantly improve the relevance of retrieved information. We show that the resulting “retrieve, re-rank, and generate” pipeline significantly improves upon prior KGQA systems, achieving a new state-of-the-art on FreebaseQA by 5.56% Exact Match. We perform multiple ablations that reveal the distinct benefits of our contextual re-ranker and labeling strategy and conclude with a case study that highlights opportunities for future works.

2022

pdf bib
DigiCall: A Benchmark for Measuring the Maturity of Digital Strategy through Company Earning Calls
Hilal Pataci | Kexuan Sun | T. Ravichandran
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)

Digital transformation reinvents companies, their vision and strategy, organizational structure, processes, capabilities, and culture, and enables the development of new or enhanced products and services delivered to customers more efficiently. Organizations, by formalizing their digital strategy attempt to plan for their digital transformations and accelerate their company growth. Understanding how successful a company is in its digital transformation starts with accurate measurement of its digital maturity levels. However, existing approaches to measuring organizations’ digital strategy have low accuracy levels and this leads to inconsistent results, and also does not provide resources (data) for future research to improve. In order to measure the digital strategy maturity of companies, we leverage the state-of-the-art NLP models on unstructured data (earning call transcripts), and reach the state-of-the-art levels (94%) for this task. We release 3.691 earning call transcripts and also annotated data set, labeled particularly for the digital strategy maturity by linguists. Our work provides an empirical baseline for research in industry and management science.

2021

pdf bib
Table-based Fact Verification With Salience-aware Learning
Fei Wang | Kexuan Sun | Jay Pujara | Pedro Szekely | Muhao Chen
Findings of the Association for Computational Linguistics: EMNLP 2021

Tables provide valuable knowledge that can be used to verify textual statements. While a number of works have considered table-based fact verification, direct alignments of tabular data with tokens in textual statements are rarely available. Moreover, training a generalized fact verification model requires abundant labeled training data. In this paper, we propose a novel system to address these problems. Inspired by counterfactual causality, our system identifies token-level salience in the statement with probing-based salience estimation. Salience estimation allows enhanced learning of fact verification from two perspectives. From one perspective, our system conducts masked salient token prediction to enhance the model for alignment and reasoning between the table and the statement. From the other perspective, our system applies salience-aware data augmentation to generate a more diverse set of training instances by replacing non-salient terms. Experimental results on TabFact show the effective improvement by the proposed salience-aware learning techniques, leading to the new SOTA performance on the benchmark.

2020

pdf bib
Learning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning
Deren Lei | Gangrong Jiang | Xiaotao Gu | Kexuan Sun | Yuning Mao | Xiang Ren
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Walk-based models have shown their advantages in knowledge graph (KG) reasoning by achieving decent performance while providing interpretable decisions. However, the sparse reward signals offered by the KG during a traversal are often insufficient to guide a sophisticated walk-based reinforcement learning (RL) model. An alternate approach is to use traditional symbolic methods (e.g., rule induction), which achieve good performance but can be hard to generalize due to the limitation of symbolic representation. In this paper, we propose RuleGuider, which leverages high-quality rules generated by symbolic-based methods to provide reward supervision for walk-based agents. Experiments on benchmark datasets shows that RuleGuider clearly improves the performance of walk-based models without losing interpretability.