Zhaoyang Xia


2024

pdf bib
Diffusion Models for Sign Language Video Anonymization
Zhaoyang Xia | Yang Zhou | Ligong Han | Carol Neidle | Dimitris N. Metaxas
Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources

pdf bib
A Multimodal Spatio-Temporal GCN Model with Enhancements for Isolated Sign Recognition
Yang Zhou | Zhaoyang Xia | Yuxiao Chen | Carol Neidle | Dimitris N. Metaxas
Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources

2022

pdf bib
Sign Language Video Anonymization
Zhaoyang Xia | Yuxiao Chen | Qilong Zhangli | Matt Huenerfauth | Carol Neidle | Dimitri Metaxas
Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources

Deaf signers who wish to communicate in their native language frequently share videos on the Web. However, videos cannot preserve privacy—as is often desirable for discussion of sensitive topics—since both hands and face convey critical linguistic information and therefore cannot be obscured without degrading communication. Deaf signers have expressed interest in video anonymization that would preserve linguistic content. However, attempts to develop such technology have thus far shown limited success. We are developing a new method for such anonymization, with input from ASL signers. We modify a motion-based image animation model to generate high-resolution videos with the signer identity changed, but with the preservation of linguistically significant motions and facial expressions. An asymmetric encoder-decoder structured image generator is used to generate the high-resolution target frame from the low-resolution source frame based on the optical flow and confidence map. We explicitly guide the model to attain a clear generation of hands and faces by using bounding boxes to improve the loss computation. FID and KID scores are used for the evaluation of the realism of the generated frames. This technology shows great potential for practical applications to benefit deaf signers.