About me

I am a Ph.D. student in Human-Computer Interaction at the Department of Computer Science and Technology, Tsinghua University.

Research Interests

Publications

Z. Jin*, S. Zhou*, H. Wang, et al., “From Natural Alignment to Conditional Controllability in Multimodal Dialogue,” The Fourteenth International Conference on Learning Representations (ICLR), 2026. OpenReview
S. Zhou, X. Qin, Y. Zhou, et al., “HarmoniVox: Painting Voices to Match the Avatar’s Soul,” Proceedings of the 33rd ACM International Conference on Multimedia (MM ’25), 2025, pp. 6720-6729. DOI: 10.1145/3746027.3755736
Z. Jin, J. Jia, Q. Wang, et al., “SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description,” Proceedings of the 32nd ACM International Conference on Multimedia (MM ’24), 2024, pp. 1255-1264. DOI: 10.1145/3664647.3681674
Y. Zhou, X. Qin, Z. Jin, et al., “VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling,” Proceedings of the 32nd ACM International Conference on Multimedia (MM ’24), 2024, pp. 554-563. DOI: 10.1145/3664647.3681680
H. Wu, S. Zhou, J. Jia, J. Xing, Q. Wen, and X. Wen, “Speech-Driven 3D Face Animation with Composite and Regional Facial Movements,” Proceedings of the 31st ACM International Conference on Multimedia (MM ’23), 2023, pp. 6822-6830. DOI: 10.1145/3581783.3611775