About me
I am a Ph.D. student in Human-Computer Interaction at the Department of Computer Science and Technology, Tsinghua University.
Research Interests
- Multimedia
- Multi-modal Generation
- Speech Synthesis
- Text-to-Speech
Education
- 2024 - present, Ph.D. in Computer Science and Technology, Tsinghua University
- 2019 - 2024, B.Eng. in Computer Science and Technology, Tsinghua University
Research Publications
Conference Proceedings
- Z. Jin*, S. Zhou*, H. Wang, et al., “From Natural Alignment to Conditional Controllability in Multimodal Dialogue,” The Fourteenth International Conference on Learning Representations (ICLR), 2026. OpenReview
- S. Zhou, X. Qin, Y. Zhou, et al., “HarmoniVox: Painting Voices to Match the Avatar’s Soul,” Proceedings of the 33rd ACM International Conference on Multimedia (MM ’25), 2025, pp. 6720-6729. DOI: 10.1145/3746027.3755736
- Z. Jin, J. Jia, Q. Wang, et al., “SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description,” Proceedings of the 32nd ACM International Conference on Multimedia (MM ’24), 2024, pp. 1255-1264. DOI: 10.1145/3664647.3681674
- Y. Zhou, X. Qin, Z. Jin, et al., “VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling,” Proceedings of the 32nd ACM International Conference on Multimedia (MM ’24), 2024, pp. 554-563. DOI: 10.1145/3664647.3681680
- H. Wu, S. Zhou, J. Jia, J. Xing, Q. Wen, and X. Wen, “Speech-Driven 3D Face Animation with Composite and Regional Facial Movements,” Proceedings of the 31st ACM International Conference on Multimedia (MM ’23), 2023, pp. 6822-6830. DOI: 10.1145/3581783.3611775
Employment History
- Jul 2023 - Sep 2023, Software Development Engineer Intern, HUAWEI