Rajarshi Haldar


2024

pdf bib
Analyzing the Performance of Large Language Models on Code Summarization
Rajarshi Haldar | Julia Hockenmaier
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Large language models (LLMs) such as Llama 2 perform very well on tasks that involve both natural language and source code, particularly code summarization and code generation. We show that for the task of code summarization, the performance of these models on individual examples often depends on the amount of (subword) token overlap between the code and the corresponding reference natural language descriptions in the dataset. This token overlap arises because the reference descriptions in standard datasets (corresponding to docstrings in large code bases) are often highly similar to the names of the functions they describe. We also show that this token overlap occurs largely in the function names of the code and compare the relative performance of these models after removing function names versus removing code structure. We also show that using multiple evaluation metrics like BLEU and BERTScore gives us very little additional insight since these metrics are highly correlated with each other.

2020

pdf bib
A Multi-Perspective Architecture for Semantic Code Search
Rajarshi Haldar | Lingfei Wu | JinJun Xiong | Julia Hockenmaier
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The ability to match pieces of code to their corresponding natural language descriptions and vice versa is fundamental for natural language search interfaces to software repositories. In this paper, we propose a novel multi-perspective cross-lingual neural framework for code–text matching, inspired in part by a previous model for monolingual text-to-text matching, to capture both global and local similarities. Our experiments on the CoNaLa dataset show that our proposed model yields better performance on this cross-lingual text-to-code matching task than previous approaches that map code and text to a single joint embedding space.

2018

pdf bib
CL Scholar: The ACL Anthology Knowledge Graph Miner
Mayank Singh | Pradeep Dogga | Sohan Patro | Dhiraj Barnwal | Ritam Dutt | Rajarshi Haldar | Pawan Goyal | Animesh Mukherjee
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

We present CL Scholar, the ACL Anthology knowledge graph miner to facilitate high-quality search and exploration of current research progress in the computational linguistics community. In contrast to previous works, periodically crawling, indexing and processing of new incoming articles is completely automated in the current system. CL Scholar utilizes both textual and network information for knowledge graph construction. As an additional novel initiative, CL Scholar supports more than 1200 scholarly natural language queries along with standard keyword-based search on constructed knowledge graph. It answers binary, statistical and list based natural language queries. The current system is deployed at http://cnerg.iitkgp.ac.in/aclakg. We also provide REST API support along with bulk download facility. Our code and data are available at https://github.com/CLScholar.