NVIDIA

AI Research Intern, TAO Multi-Modal Model Development - 2026

Vietnam, Hanoi | Full time

Our team is seeking to extend the internship of our current AI Research Intern for the TAO (Train, Adapt, Optimize) Multi-Modal Model Development project, recognizing their exceptional performance and strong alignment with the team’s research goals. Their innovative ideas and technical contributions have significantly enhanced our work. Given the rapidly evolving field of multi-modal AI, encompassing vision-language modeling, universal segmentation, and large-scale model training, extending this internship will provide further growth opportunities for the intern while strengthening our team’s capacity to develop scalable, high-impact AI solutions.

  

Join NVIDIA, a global leader in AI and accelerated computing. As an AI Research Intern based in Hanoi or Ho Chi Minh City, Vietnam, you will work on multi-modal AI and vision-language model development within the TAO framework. You’ll collaborate with a team of engineers and researchers developing state-of-the-art deep learning models for tasks such as image segmentation, cross-modal understanding, and universal representation learning. This internship is an opportunity to contribute to next-generation AI systems with real-world impact across industries, from autonomous vehicles to intelligent content understanding.

What you'll be doing:

  • Develop and fine-tune multi-modal AI models using NVIDIA’s TAO Toolkit and deep learning frameworks.

  • Contribute to the design and implementation of vision-language models (VLMs) and universal segmentation systems.

  • Conduct experiments and benchmarking to evaluate model accuracy, robustness, and scalability.

  • Collaborate with cross-functional teams to integrate your research into production-level pipelines and NVIDIA SDKs.

  • Participate in research discussions, code reviews, and technical documentation to share insights and improve methodologies.

 

What we need to see:

  • Currently pursuing a degree in Computer Science, Computer Engineering, or a related field.

  • Proven experience with machine learning, deep learning, or computer vision model development.

  • Strong Python programming skills and proficiency with PyTorch or similar frameworks.

  • Solid understanding of neural network architectures, transformers, and multi-modal learning techniques.

  • Excellent problem-solving abilities, attention to detail, and a collaborative mindset.

  • Familiarity with vision-language models, image segmentation, or large-scale pretraining is a strong plus.

 

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com/.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.