We are looking for experienced engineers to help build and scale next-generation AI infrastructure using PyTorch, one of the world’s most widely used deep learning frameworks. This role sits at the intersection of machine learning systems, compilers, and high-performance computing, enabling researchers and product teams to train and deploy large-scale models efficiently. You will work on core components of the PyTorch ecosystem, including model execution, distributed training, performance optimization, and developer experience.
What you'll be doing:
Design and build core PyTorch capabilities across runtime, autograd, distributed training, and model execution
Optimize performance across GPU/accelerator backends (CUDA, Triton, etc.)
Contribute to or lead development of large-scale ML systems and infrastructure
Improve model training efficiency, scalability, and reliability across multi-node environments
Work on compilers, graph transformations, and kernel optimizations to accelerate deep learning workloads
Partner with researchers and applied teams to translate cutting-edge models into production systems
Drive open-source contributions and collaborate with the broader PyTorch community
Influence roadmap and architecture for next-gen AI platforms
Work at the forefront of AI and accelerated computing
Make a direct impact on how PyTorch runs on the world’s most advanced GPU platforms
Collaborate across hardware, systems software, and AI research to push performance boundaries and enable breakthroughs in generative AI, autonomous systems, and high-performance computing
What we need to see:
PhD or MSc degree in Computer Science, Applied Math, Physics, or a related science or engineering field (or equivalent experience)
8+ years of software development experience
Strong programming skills in C++ and Python
Deep understanding of deep learning frameworks, preferably PyTorch
Experience with GPU programming (CUDA or similar) and performance optimization
Ways to stand out from the crowd:
Contributions to PyTorch core or ecosystem libraries
Experience with NVIDIA AI stack (TensorRT, Triton Inference Server, cuBLAS, cuDNN, NCCL)
Familiarity with ML compilers (TorchInductor, Triton, XLA, TVM)
Experience optimizing LLMs or large-scale recommendation / vision models
Background working closely with hardware-aware software optimization
With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us, and thanks to exceptional growth, our best-in-class engineering teams are expanding rapidly.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.