Accelerating Inference for Pretrained Language Models by Unified Multi-Perspective Early Exiting

Jun Kong, Jin Wang, Liang-Chih Yu, Xuejie Zhang


Abstract
Conditional computation algorithms, such as early exiting (EE), can accelerate the inference of pretrained language models (PLMs) while maintaining competitive performance on resource-constrained devices. However, existing approaches apply EE only along the vertical axis, deciding which layers should be used for inference. The horizontal perspective is ignored: no determination is made of which tokens in each layer should participate in the computation, leaving high redundancy in adaptive inference. To address this limitation, this study proposes a unified horizontal and vertical multi-perspective early exiting (MPEE) framework to accelerate the inference of transformer-based models. Specifically, the vertical perspective uses recycled EE classifier memory and weighted self-distillation to enhance the performance of the EE classifiers. The horizontal perspective uses recycled class attention memory to emphasize informative tokens; less informative tokens are merged by weighted fusion and excluded from subsequent computation. On this basis, horizontal and vertical EE are unified to obtain a better tradeoff between performance and efficiency. Extensive experimental results show that MPEE achieves greater inference acceleration with competitive performance compared with existing methods.
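The two perspectives described above can be sketched in miniature: exit vertically when a per-layer classifier is confident enough, and prune horizontally by keeping only the most-attended tokens. The sketch below is an illustrative assumption, not the paper's implementation; the function names, the confidence threshold, the `keep_ratio` parameter, and the use of a summed-activation score as a stand-in for class attention are all hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def multi_perspective_early_exit(hidden, layers, classifiers,
                                 exit_threshold=0.9, keep_ratio=0.5):
    """Toy sketch of multi-perspective early exiting.

    hidden: (num_tokens, dim) token representations (token 0 = [CLS]).
    layers: callables mapping hidden -> (hidden, token_scores), where
            token_scores is a proxy for class attention to each token.
    classifiers: per-layer callables mapping the [CLS] vector to logits.
    All names and shapes are illustrative, not the paper's API.
    """
    for depth, (layer, clf) in enumerate(zip(layers, classifiers)):
        hidden, token_scores = layer(hidden)

        # Vertical EE: exit once this layer's classifier is confident enough.
        probs = softmax(clf(hidden[0]))
        if probs.max() >= exit_threshold:
            return int(probs.argmax()), depth

        # Horizontal EE: keep [CLS] plus the highest-scoring tokens;
        # the remaining tokens are dropped from subsequent layers.
        n_keep = max(1, int(keep_ratio * (hidden.shape[0] - 1)))
        order = np.argsort(token_scores[1:])[::-1][:n_keep] + 1
        hidden = hidden[np.concatenate(([0], np.sort(order)))]

    # No early exit: fall back to the final classifier.
    probs = softmax(classifiers[-1](hidden[0]))
    return int(probs.argmax()), len(layers) - 1

# Toy usage with random "layers" and a 2-class head.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))

def make_layer():
    def layer(h):
        h = np.tanh(h @ W)
        return h, h.sum(axis=1)  # crude token-importance proxy
    return layer

layers = [make_layer() for _ in range(4)]
classifiers = [lambda cls_vec: cls_vec[:2] * 10] * 4
label, exit_layer = multi_perspective_early_exit(
    rng.standard_normal((6, 8)), layers, classifiers)
```

In a real model the token scores would come from class attention recycled across layers, and the per-layer classifiers would be trained with the weighted self-distillation described in the abstract; this toy version only shows the control flow that unifies the two exits.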
Anthology ID:
2022.coling-1.414
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
4677–4686
URL:
https://aclanthology.org/2022.coling-1.414
Cite (ACL):
Jun Kong, Jin Wang, Liang-Chih Yu, and Xuejie Zhang. 2022. Accelerating Inference for Pretrained Language Models by Unified Multi-Perspective Early Exiting. In Proceedings of the 29th International Conference on Computational Linguistics, pages 4677–4686, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Accelerating Inference for Pretrained Language Models by Unified Multi-Perspective Early Exiting (Kong et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.414.pdf
Code
junkong5/mpee
Data
GLUE, MRPC, MultiNLI, QNLI, SST, SST-2