At NVIDIA, we are shaping the future of accelerated computing and AI infrastructure. NVIDIA NetQ, part of our Networking and Systems Software organization, is a next-generation network operations and management platform that provides real-time visibility, validation, and troubleshooting for modern data center fabrics, including NVLink-based GPU clusters. We are looking for a skilled and motivated QA Engineer to help ensure the quality, reliability, and scalability of NetQ, a critical software platform used to manage and monitor large-scale, high-performance computing environments.
What you’ll be doing:
- Lead quality efforts for NVIDIA NetQ, a software management and monitoring platform for NVLink-based and high-performance cluster environments.
- Design, develop, and maintain Python-based automated test frameworks and validation tools.
- Define and execute functional, integration, regression, and system-level tests for distributed systems.
- Work closely with R&D, architects, and product managers to define test strategies and quality metrics.
- Validate telemetry, APIs, and system behavior across large-scale cluster environments.
- Investigate complex issues, perform root-cause analysis, and drive defects to resolution.
- Integrate quality processes into CI/CD pipelines and release flows.
- Contribute to improving system observability, testability, and overall product robustness.
What we need to see:
- BSc in Computer Science, Engineering, or equivalent practical experience.
- 1-3 years of professional experience.
- Strong experience with Python and test automation frameworks.
- Proven experience working in Linux-based environments.
- Solid understanding of networking concepts, distributed systems, and system-level software.
- Experience testing complex software platforms, such as management tools, infrastructure software, or large systems.
- Familiarity with CI/CD pipelines, Git, and modern development workflows.
- Strong debugging and problem-solving skills.
- Ability to work independently and collaborate across multiple teams.
Ways to stand out from the crowd:
- Experience with NVLink, InfiniBand, or high-performance cluster environments.
- Knowledge of telemetry, monitoring, or observability systems.
- Experience with REST APIs, microservices, or containerized environments.
- Background in infrastructure, networking, or system software testing.
NVIDIA NetQ plays a key role in ensuring the reliability and performance of the world’s most advanced AI and HPC clusters. You’ll be working on software that directly impacts how large-scale GPU and NVLink systems are monitored, validated, and operated in production environments. At NVIDIA, we value innovation, ownership, and technical excellence — and we provide the environment to grow, lead, and make an impact.