Reduce Redundancy Then Rerank: Enhancing Code Summarization with a Novel Pipeline Framework

Xiaoyu Hu, Xu Zhang, Zexu Lin, Deyu Zhou


Abstract
Code summarization is the task of automatically generating natural language descriptions from source code. Recently, pre-trained language models have gained significant popularity in code summarization due to their capacity to capture richer semantic representations of both code and natural language. Nonetheless, contemporary code summarization models grapple with two fundamental limitations. (1) Some tokens in the code are irrelevant to the natural language description and damage the alignment of the representation spaces for code and language. (2) Most approaches are based on the encoder-decoder framework, which is often plagued by the exposure bias problem, hampering the effectiveness of their decoding sampling strategies. To address these two challenges, we propose a novel pipeline framework named Reduce Redundancy then Rerank (Re^3). Specifically, a redundancy reduction component is introduced to eliminate redundant information in the code representation space. Moreover, a re-ranking model is incorporated to select more suitable summary candidates, alleviating the exposure bias problem. The experimental results show the effectiveness of Re^3 over some state-of-the-art approaches across six different datasets from the CodeSearchNet benchmark.
Anthology ID:
2024.lrec-main.1198
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
13722–13733
URL:
https://aclanthology.org/2024.lrec-main.1198
Cite (ACL):
Xiaoyu Hu, Xu Zhang, Zexu Lin, and Deyu Zhou. 2024. Reduce Redundancy Then Rerank: Enhancing Code Summarization with a Novel Pipeline Framework. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13722–13733, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Reduce Redundancy Then Rerank: Enhancing Code Summarization with a Novel Pipeline Framework (Hu et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1198.pdf