Our technology has no boundaries! NVIDIA is building the world’s most groundbreaking and state-of-the-art accelerated computing platforms. Because of our work, scientists, researchers, and engineers can advance their ideas. We pioneered a supercharged form of computing loved by the fastest-paced computer users in the world—scientists, designers, artists, and gamers.
We seek a highly motivated Network Performance Exploration Engineer to join our team of experts and help shape the foundational infrastructure for the AI revolution. Our next-generation networking systems are at the forefront of connecting and powering the world's most advanced AI clusters. As a key member of our architecture team, you will be responsible for exploring and identifying critical network optimization opportunities across our entire hardware and software stack, analyzing how system-level changes impact application-level performance.
What You’ll Be Doing:
Explore and validate end-to-end application performance, defining comprehensive test plans and critical metrics to identify optimization opportunities in both hardware and software.
Establish and maintain a comprehensive database of benchmark results, tracking performance across releases to drive data-informed decisions.
Conduct deep-dive analysis into communication libraries (like NCCL), system software, and hardware configurations to investigate performance characteristics, validate architectural theories, and identify bottlenecks.
Provide critical performance data to correlate and enhance simulation tools, ensuring our models accurately predict real-world hardware behavior.
Analyze application-level traffic patterns (e.g., LLMs) on our advanced networking fabrics to identify hardware and software optimization opportunities and tune system parameters.
Lead Proof-of-Concept (POC) projects to prototype and evaluate potential hardware and software optimizations and their impact on application performance.
What We Need To See:
B.Sc. or M.Sc. degree in Computer Science, Computer Engineering, or Electrical Engineering, or equivalent experience.
5+ years of relevant industry or research experience in high-performance computing, computer architecture, or computer networks.
Hands-on programming skills in Python and/or C/C++ for system analysis, automation, and customizing benchmarks.
Excellent understanding of large-scale system behavior and the effect of distributed computing workloads on network and system performance.
Proven experience in performance analysis, benchmarking, and identifying system bottlenecks.
Exceptional analytical, problem-solving, and systems-thinking skills, with the ability to dive deep into complex software and hardware interactions.
Ability to thrive in a a fast-paced, dynamic environment and work concurrently with multiple cross-functional teams.
Ways To Stand Out From The Crowd:
Deep understanding of and hands-on experience with communication libraries such as NCCL, UCX, or MPI.
Direct experience debugging or modifying the source code of a major communication library.
Expertise in the architecture and system-level requirements of large-scale, distributed Deep Learning workloads (e.g., LLMs).
Expertise in high-performance network protocols (Ethernet, InfiniBand, RoCE) and interconnect technologies like NVLink.
Familiarity with the PyTorch ecosystem, especially for distributed workloads.
NVIDIA has some of the most forward-thinking and hardworking people in the world working for us, and due to unprecedented growth, our world-class engineering teams are growing fast. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.
We are committed to fostering a diverse work environment and are proud to be an equal-opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.