Shuang Peng

Also published as: 爽 彭

2024

FlattenQuant: Breaking through the Inference Compute-bound for Large Language Models with Per-tensor Quantization
Yi Zhang | Fei Yang | Shuang Peng | Fangyu Wang | Aimin Pan
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Large language models (LLMs) have demonstrated state-of-the-art accuracy across various tasks. However, inference latency and large GPU memory consumption restrict their deployment. Recently, there have been efficient attempts to quantize LLMs, yet inference with large batch sizes or long sequences remains compute-bound. Fine-grained quantization methods have proven able to achieve low-bit quantization for LLMs, but they require the FP16 data type for linear-layer computations, which is time-consuming with large batch sizes or long sequences. In this paper, we introduce FlattenQuant, a method that significantly reduces the maximum value of a tensor by flattening its larger channels, achieving low-bit per-tensor quantization with minimal accuracy loss. Our experiments show that FlattenQuant can directly use 4 bits for 48.29% of the linear-layer computations in LLMs, with the remaining layers using 8 bits. The 4-bit matrix multiplication introduced in FlattenQuant effectively addresses the compute-bound bottleneck caused by large matrix computations. Our work achieves up to 2× speedup and 2.3× memory reduction for LLMs with negligible loss in accuracy.
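The core idea, shrinking a tensor's per-tensor maximum by flattening outlier channels so a single low-bit scale loses less precision, can be illustrated with a short NumPy sketch. This is a minimal illustration, not the paper's implementation: the even split of an outlier channel into k sub-channels, the `flatten_channels` helper, and the threshold choice are all assumptions made for demonstration; the paper's exact flattening rule may differ.

```python
import numpy as np

def flatten_channels(x, w, threshold):
    """Hypothetical sketch: split any activation channel whose max |value|
    exceeds `threshold` into k sub-channels carrying 1/k of the value each,
    duplicating the matching weight rows so x @ w is preserved."""
    cols, rows = [], []
    for c in range(x.shape[1]):
        col, row = x[:, c], w[c, :]
        # Number of pieces needed so each sub-channel stays under the threshold.
        k = max(1, int(np.ceil(np.abs(col).max() / threshold)))
        for _ in range(k):
            cols.append(col / k)  # each sub-channel carries 1/k of the value
            rows.append(row)      # repeated weight row keeps the product unchanged
    return np.stack(cols, axis=1), np.stack(rows, axis=0)

def quantize_per_tensor(x, bits=4):
    """Symmetric per-tensor quantization with a single scale factor."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale).astype(np.int8), scale

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
x[:, 3] *= 20                                 # one outlier channel dominates the max
w = rng.normal(size=(16, 32))

threshold = np.median(np.abs(x).max(axis=0))  # illustrative threshold choice
xf, wf = flatten_channels(x, w, threshold)
assert np.allclose(x @ w, xf @ wf)            # flattening preserves the matmul
q, s = quantize_per_tensor(xf, bits=4)        # smaller max -> finer 4-bit scale
```

Because the flattened tensor's maximum is bounded by the threshold, the single 4-bit scale covers a much narrower range, which is what makes per-tensor INT4 matmul viable on the flattened layers.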

2023

Research on Chinese Hate Speech Detection Based on RoBERTa (Chinese Hate Speech Detection Method Based on RoBERTa-WWM)
Xiaojun Rao | Yangsen Zhang | Shuang Peng | Qilong Jia | Xueyang Liu
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

With the spread of the Internet, social media provides a platform for exchanging opinions, but its virtual and anonymous nature has also intensified the spread of hate speech, so automatic hate speech detection is essential to the healthy development of social media platforms. To address this problem, we construct a Chinese hate speech dataset, CHSD, and propose a Chinese hate speech detection model, RoBERTa-CHHSD. The model first applies the RoBERTa pre-trained language model to serialize Chinese hate speech text and extract textual features; the output is then fed into a TextCNN model and a Bi-GRU model to extract multi-level local semantic features and global inter-sentence dependencies, respectively. The two results are fused to extract deeper hate-speech features in the text and classify Chinese hate speech, thereby accomplishing detection. Experimental results show that the model achieves an F1 score of 89.12% on the CHSD dataset, a 1.76% improvement over the current best mainstream model, RoBERTa-WWM.
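The fusion architecture described in the abstract (encoder hidden states feeding a TextCNN branch and a Bi-GRU branch, with the two outputs concatenated for classification) can be sketched in PyTorch. This is an assumption-laden illustration: hidden sizes, kernel sizes, and the classifier head are not specified in the abstract, and a random tensor stands in for RoBERTa encoder outputs.

```python
import torch
import torch.nn as nn

class RoBERTaCHHSD(nn.Module):
    """Sketch of the described fusion: hidden states -> TextCNN + Bi-GRU -> concat -> classify.
    All dimensions below are illustrative assumptions, not the paper's settings."""
    def __init__(self, hidden=768, n_filters=100, kernel_sizes=(2, 3, 4),
                 gru_hidden=128, n_classes=2):
        super().__init__()
        # TextCNN branch: multi-width convolutions capture local n-gram features.
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, n_filters, k) for k in kernel_sizes])
        # Bi-GRU branch: captures global sequence-level dependencies.
        self.gru = nn.GRU(hidden, gru_hidden, batch_first=True, bidirectional=True)
        fused = n_filters * len(kernel_sizes) + 2 * gru_hidden
        self.classifier = nn.Linear(fused, n_classes)

    def forward(self, h):                     # h: (batch, seq_len, hidden) from RoBERTa
        c = h.transpose(1, 2)                 # (batch, hidden, seq_len) for Conv1d
        cnn = torch.cat([conv(c).relu().max(dim=2).values for conv in self.convs], dim=1)
        _, g = self.gru(h)                    # g: (2, batch, gru_hidden) final states
        gru = torch.cat([g[0], g[1]], dim=1)  # forward and backward final states
        return self.classifier(torch.cat([cnn, gru], dim=1))

# Usage with a dummy tensor standing in for RoBERTa hidden states.
model = RoBERTaCHHSD()
logits = model(torch.randn(4, 64, 768))       # -> shape (4, 2)
```

Max-pooling over each convolution's output is the standard TextCNN choice for collapsing variable-length sequences to fixed-size local features before fusion with the Bi-GRU's global states.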

2021

A Dialogue-based Information Extraction System for Medical Insurance Assessment
Shuang Peng | Mengdi Zhou | Minghui Yang | Haitao Mi | Shaosheng Cao | Zujie Wen | Teng Xu | Hongbin Wang | Lei Liu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021