Zexu Lin


2024

Reduce Redundancy Then Rerank: Enhancing Code Summarization with a Novel Pipeline Framework
Xiaoyu Hu | Xu Zhang | Zexu Lin | Deyu Zhou
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Code summarization is the task of automatically generating natural language descriptions from source code. Recently, pre-trained language models have gained significant popularity in code summarization because they capture richer semantic representations of both code and natural language. Nonetheless, contemporary code summarization models face two fundamental limitations: (1) some tokens in the code are irrelevant to the natural language description and harm the alignment between the code and language representation spaces; (2) most approaches are based on the encoder-decoder framework, which often suffers from the exposure bias problem, reducing the effectiveness of decoding sampling strategies. To address these two challenges, we propose a novel pipeline framework named Reduce Redundancy then Rerank (Re³). Specifically, a redundancy reduction component is introduced to eliminate redundant information from the code representation space. In addition, a re-ranking model is incorporated to select more suitable summary candidates, alleviating the exposure bias problem. Experimental results on six datasets from the CodeSearchNet benchmark demonstrate the effectiveness of Re³ compared with several state-of-the-art approaches.