Research and develop speech/audio large models, including but not limited to models for speech dialogue (speech interaction/audio-video dialogue), audio understanding (ASR/audio captioning), and audio generation (TTS/video dubbing).
Be responsible for data and algorithm work related to the pre-training, post-training, and reinforcement learning (for both text and audio) of speech/audio large models.
Oversee the open-sourcing and productization of speech dialogue, audio understanding, and audio generation models. This includes end-to-end optimization of the full pipeline for speech dialogue products; improving audio understanding in scenarios involving noise, accents, far-field capture, sound effects, and music; and enhancing speech synthesis for applications such as broadcasting, casual conversation, gaming, and social interaction.
Strong coding skills and a solid foundation in data structures and algorithms. Proficiency in Python or C/C++ is required, along with familiarity with model training frameworks such as PyTorch, Megatron, or DeepSpeed. Prior awards in competitions such as ACM/ICPC, NOI/IOI, Topcoder, or Kaggle are advantageous.
Publications in top-tier conferences or journals such as NeurIPS, ICLR, ICML, ACL, CVPR, ICASSP, or INTERSPEECH are preferred.
A solid background in mathematics and signal processing, the ability to read English technical literature fluently, strong motivation, curiosity, and team spirit, excellent problem-solving skills, and a passion for technological innovation.
As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.