Veeam is the Data and AI Trust Company, specializing in helping organizations ensure their data and AI are fully understood, secured, and resilient to enable the acceleration of safe AI at scale. As the market leader in both data resilience and data security posture management, Veeam is built for the convergence of identity, data, security, and AI risk. Headquartered in Seattle with offices in more than 30 countries, Veeam protects over 550,000 customers worldwide, who trust Veeam to keep their businesses running. Join us as we go fearlessly forward together, growing, learning, and making a real impact for some of the world’s biggest brands.
We’re building a new global Production Engineering (Prod Eng) function to serve as the bridge between Technical Support and Engineering. As Supervisor, you will lead your regional team to reduce the time it takes to resolve customer-impacting issues, empower Technical Support with better escalation pathways, and keep SRE focused on strategic reliability work.
Your team will be part of a 24×7 follow-the-sun model, running operations for Veeam Data Cloud - handling day-to-day tasks, troubleshooting problems, identifying incidents, and owning escalations from Technical Support. You will work in conjunction with our SRE and Product Development teams - turning insights into systemic improvements, making our products more resilient while improving customer satisfaction at scale.
Regional Leadership & Team Enablement
Lead day-to-day execution for your region’s Production Engineering team, including shift planning, priority setting, and workload balancing.
Coach and support engineers through investigations, ticket handling, incident response, and customer-impacting escalations.
Partner with the Production Engineering Manager on hiring engineers, onboarding, performance feedback, and career development for the region.
Ensure sustainable operations: fair scheduling, recovery time, and proactive burnout prevention.
Escalations & Incident Partnership
Own regional coverage for escalations from Technical Support, ensuring timely triage, communication, and resolution.
Coordinate mitigations and fixes with Product Engineering, SRE, and Support - maintaining a clear single-threaded owner per escalation.
Drive high-quality updates on customer-impacting issues: crisp status, impact, next steps, and expected timelines.
Identify and participate in proactive incident response during regional daytime hours.
Operational Excellence & Continuous Improvement
Maintain and improve runbooks, on-call playbooks, and escalation pathways between Support ↔ Prod Eng ↔ SRE ↔ Product Engineering.
Identify recurring issues and systemic pain points; turn them into actionable engineering work (bugs, reliability improvements, automation, documentation).
Reduce handle times and operational friction by improving tooling and process workflows (e.g., SNOW ↔ Jira handoffs, ownership clarity, and “ticket → code → docs” closure).
Observability & Diagnostic Readiness
Ensure services and operational workflows are “debuggable by default” through strong logging, metrics, tracing, and alert hygiene.
Partner with SRE and platform teams to improve telemetry standards, dashboards, and escalation signals.
Contribute to creating and maintaining a reliable escalation experience for Support and field teams (e.g., known issues, standard diagnostics, and common mitigations).
What We Are Looking For
Required
3+ years of experience in an operations, systems engineering, or technical support role.
Prior experience leading a small team or serving as a senior/primary escalation point.
Experience triaging and coordinating resolution for user-impacting issues.
Experience with ticketing systems (e.g., ServiceNow, Jira) and building clean handoff processes.
Strong troubleshooting skills across various integrated applications and systems.
Comfort working across functions (Support, Engineering, SRE, Product) and across time zones.
Clear written and verbal communication, especially during escalations and incidents.
Preferred
Working knowledge of incident response practices (triage, mitigation, communication, post-incident follow-up).
Prior experience with Veeam backup software, Microsoft 365, or Salesforce.
Experience with public cloud environments (Azure preferred; AWS/GCP also valued).
Familiarity with observability tooling and practices (e.g., OpenTelemetry, Prometheus, Grafana, logging platforms).
Experience improving operational workflows and reducing toil through automation.
What We Offer
Define and lead a new global Production Engineering function from inception.
Drive measurable improvements in handle times, escalation quality, and customer satisfaction.
Collaborate closely with SRE and Engineering leadership to build resilient, scalable systems.
Work in a global environment with competitive compensation and opportunities for growth.
Please note: If the applicant is permanently located outside India, Veeam reserves the right to decline the application.
Please note that any personal data collected from you during the recruitment process will be processed in accordance with our Recruiting Privacy Notice.
The Privacy Notice sets out the basis on which the personal data collected from you, or that you provide to us, will be processed by us in connection with our recruitment processes.
By applying for this position, you consent to the processing of your personal data in accordance with our Recruiting Privacy Notice.
By submitting your application, you acknowledge that the information provided in your job application and any supporting documents is complete and accurate to the best of your knowledge. Any misrepresentation, omission, or falsification of information may result in disqualification from consideration for employment or, if discovered after employment begins, termination of employment.