Ecolab

Lead AI Engineer - DevOps

IND - Karnataka - Bangalore - EDC

Full time

ROLE SUMMARY

As a Lead DevOps Engineer, you will lead the design and operation of the build, release, infrastructure, observability, and runtime engineering practices that enable product teams to ship and operate secure, scalable, and reliable digital solutions, including AI-enabled and agentic AI products. The role goes beyond infrastructure automation: it also requires strong full-stack engineering awareness, with the ability to understand how frontend, backend, APIs, data services, and AI services come together in production systems.

You will work closely with software engineers, AI engineers, integration engineers, Team Leads, architects, and platform teams to ensure systems are deployable, observable, supportable, and cost-aware. You will guide CI/CD design, environment standardization, release automation, platform reliability, cloud-native deployment practices, and engineering enablement across the SDLC.

The ideal candidate brings deep experience in DevOps, cloud platforms, automation, platform engineering, containerization, release engineering, and observability, along with strong practical understanding of application architecture, full-stack delivery patterns, and production support expectations.

KEY RESPONSIBILITIES

  • Lead the design, implementation, and continuous improvement of CI/CD pipelines, deployment workflows, environment strategies, and release automation for digital and AI-enabled products
  • Build and operate cloud-native infrastructure and runtime platforms that support backend services, APIs, UI applications, integrations, and AI workloads
  • Partner with engineering teams to improve deployability, testability, scalability, observability, and operational resilience across the full product stack
  • Design and maintain infrastructure-as-code, environment provisioning, secrets management, access control, and deployment consistency across development, test, and production environments
  • Support delivery of containerized services, microservices, web applications, event-driven systems, and AI-enabled application components
  • Contribute to architecture and delivery discussions by bringing a strong understanding of backend services, APIs, frontend deployment needs, runtime dependencies, and full-stack production patterns
  • Implement and optimize observability using logs, metrics, distributed traces, dashboards, alerts, and cost / capacity signals
  • Support AI and ML workloads by enabling deployment environments, model-serving patterns, runtime monitoring, cost visibility, and release controls
  • Drive operational readiness practices such as runbooks, deployment validation, rollback mechanisms, incident response, root-cause analysis, and post-incident improvement
  • Standardize engineering practices for build automation, release quality, environment hygiene, dependency control, and operational support
  • Collaborate with security, architecture, and platform teams to ensure solutions meet requirements for security, reliability, compliance, supportability, and scale
  • Mentor engineers on DevOps and platform engineering best practices and contribute reusable accelerators for delivery teams

REQUIRED QUALIFICATIONS

  • 6 to 8+ years of experience in DevOps, platform engineering, site reliability engineering, cloud engineering, or software engineering, including strong hands-on experience operating production systems in enterprise environments
  • Proven experience building and operating CI/CD pipelines, cloud-native deployment platforms, containerized workloads, infrastructure automation, and release engineering frameworks
  • Strong hands-on experience with Azure DevOps, GitHub, GitHub Actions, Terraform, Bicep, Docker, Kubernetes / AKS, Azure Container Registry, Azure Functions, Azure Container Apps, or equivalent DevOps and cloud platforms
  • Strong understanding of cloud-native application delivery, including backend APIs, event-driven services, authentication flows, runtime dependencies, deployment pipelines, and production support models
  • Practical experience with full-stack application delivery patterns, including operational understanding of React-based frontends, Node.js services, Python / backend APIs, REST services, microservices, containerized applications, and modern web deployment architectures
  • Familiarity with frontend and backend build pipelines, static asset deployment, service configuration, environment variables, API gateway integration, and full-stack runtime troubleshooting
  • Strong experience with observability, logging, tracing, and platform diagnostics using tools such as Azure Monitor, Application Insights, OpenTelemetry, Log Analytics, Datadog, New Relic, Grafana, Prometheus, or equivalent monitoring and reliability platforms
  • Experience implementing infrastructure as code, secrets and identity management, environment standardization, deployment controls, rollback strategies, and operational governance practices
  • Familiarity with AI- and ML-enabled workloads, including runtime support for Azure OpenAI, Azure AI Studio, PromptFlow, Azure Machine Learning, or equivalent platforms from a deployment, monitoring, and operational readiness standpoint
  • Understanding of CI/CD, test automation, release quality, incident response, root-cause analysis, and continuous reliability improvement across the SDLC
  • Ability to work closely with software engineers, AI engineers, architects, and Team Leads to enable fast, secure, and maintainable delivery
  • Proven ability to reduce operational toil, improve engineering productivity, and standardize delivery through automation and platform improvements
  • Strong communication and technical leadership skills, including mentoring engineers and influencing engineering standards across teams

PREFERRED QUALIFICATIONS

  • Experience supporting or enabling AI / GenAI / agentic AI products in production environments
  • Familiarity with Azure OpenAI, Azure AI Studio, PromptFlow, Azure Machine Learning, or equivalent platforms from a deployment, monitoring, and operational support perspective
  • Experience designing deployment and runtime patterns for LLM-powered services, agent orchestration services, vector-enabled retrieval, and API-integrated AI systems
  • Familiarity with Model Context Protocol (MCP), asynchronous workflows, long-running agents, or other runtime patterns relevant to agentic AI systems
  • Experience enabling secure delivery of products with integrations into SAP, ServiceNow, API gateways, workflow platforms, and event-driven enterprise systems
  • Hands-on experience with performance tuning, caching strategies, request tracing, service dependency analysis, and runtime diagnostics in full-stack production systems
  • Experience contributing platform accelerators, reusable IaC modules, DevOps templates, shared dashboards, or internal engineering enablement toolkits
  • Familiarity with cost optimization / FinOps, capacity planning, and scaling strategies for cloud-native and AI-heavy workloads
  • Experience in a build-own-operate product organization where engineering teams are responsible for long-term supportability and operational excellence
  • Ability to influence architecture, platform choices, and delivery patterns across multiple teams without losing hands-on technical depth