NVIDIA

Senior DevOps Engineer

India, Pune Full time

NVIDIA is looking for an excellent engineer to join its Software Infrastructure team. The position will be part of a dynamic crew that develops sophisticated software tools to optimize development workflows and increase overall efficiency. NVIDIA is crafting a vision of incredible user experiences in the mobile, embedded and automotive spaces by combining our cutting-edge Tegra and GPU development efforts into creative, boundary-pushing and genre-defining products.

The Infrastructure, Planning and Processes (IPP) team is a global organization within NVIDIA that helps make this vision possible by crafting and maintaining a large-scale private cloud system that provides build and test infrastructure services for NVIDIA GPU, Mobile and Automotive Divisions. You should thrive when working in the critical path, supporting thousands of developers working for billion-dollar business lines, and intimately understand the values of responsiveness, thoroughness and teamwork.

What you’ll be doing:

  • Drive automation to monitor and gain more insight into application and system health.

  • Design solutions with service discovery, networking, monitoring, logging and scheduling in Kubernetes.

  • Implement, manage & maintain end-to-end Jenkins instances - tools, plugins, nodes, user management, backup, restore, monitoring, etc.

  • Implement & support an end-to-end CI/CD system using open-source software.

  • Develop, improve and maintain our infrastructure codebase.

  • Craft and implement critical metrics using various analytics methods and dashboards.

  • Architect the scaling operation in our data centers.

What we need to see:

  • Solid programming background in Python and/or similar scripting languages.

  • Experience maintaining cloud infrastructure and a highly available production environment.

  • Excellent debugging, problem-solving and analytical skills.

  • Strong understanding of architectural requirements and development processes involved in building reliable, robust, scalable data products and pipelines.

  • Background in databases, both SQL (MySQL) and NoSQL (Elasticsearch/MongoDB/Cassandra).

  • Proficiency with configuration management tools such as Ansible, Puppet & Chef.

  • Strong background with Jenkins and/or other CI/CD systems, and proficiency with Kubernetes, Docker & virtualization.

  • Knowledge of monitoring systems such as Zabbix, Prometheus and/or similar systems.

  • 5+ years of proven experience.

  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or equivalent experience.

  • Curiosity about LLMs, NLP, or AI-driven developer tools.

Ways to stand out from the crowd:

  • Experience with Windows server infrastructure.

  • Experience using and improving data centers.

  • Background in computer algorithms and the ability to choose the best possible algorithms to meet the scaling challenge.

  • Ability to break sophisticated problems down into simple subproblems and then reuse available solutions to implement most of them.

  • Ability to design simple systems that can work efficiently without needing much support.