We are now looking for a Senior AI Frameworks Engineer (C++/Python)! NVIDIA's high-performance computing platforms are powering the AI revolution across many applications and industries. Within our software stack, CUTLASS stands out as a popular open-source ecosystem dedicated to high-performance math primitives. Since 2017, it has provided the community with C++ template abstractions to implement custom GEMM and related computations efficiently on NVIDIA GPUs.
We are building the next frontier of this ecosystem: Pythonic CUTLASS (CUTLASS DSL). This initiative aims to bring "speed-of-light" performance and powerful abstractions of our stack directly into the Python environment. Join the CUTLASS team and help bridge the gap between low-level hardware primitives and high-level developer productivity. If you are passionate about building elegant, high-performance DSLs and want to empower the next generation of AI researchers and engineers with better tools, apply today!
What you'll be doing:
As a core contributor to the CUTLASS project, you will use your expertise in systems programming and API design to create a world-class developer experience for GPU programming and kernel delivery.
Design APIs that prioritize user productivity, providing a "native" feel for developers accustomed to modern scientific computing and deep learning frameworks.
Develop robust compilation infrastructure—including AST transformations and JIT-friendly execution—to lower Pythonic descriptions into high-performance GPU machine code.
Optimize developer experience by creating debugging tools, profiler integrations, and validation methodologies that make writing and using kernels easy.
Build production-grade delivery infrastructure for the open-source community, managing everything from package distribution (wheels, conda) to the user-facing documentation and testing.
What we need to see:
MS or PhD degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience).
At least 3 years of relevant experience.
Strong proficiency in Python and C++, specifically regarding the design of Python extensions and foreign function interfaces (FFI).
Experience in library or framework development, with a focus on creating intuitive APIs for complex technical systems.
Deep understanding of the Python ecosystem’s delivery stack, including building, testing, and distributing high-performance compiled extensions.
Ways to stand out from the crowd:
Active maintainer status or significant contributions to high-performance open-source libraries, AI frameworks or compiler projects (LLVM/MLIR).
Understanding of compiler foundations, such as intermediate representations (IR), lowering passes, or AST manipulation.
Experience with GPU architecture and parallel programming models (CUDA).
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're a creative and autonomous engineer, we want to hear from you!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4. You will also be eligible for equity and benefits.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.