About Sabre

Powering the agentic revolution in travel. Sabre is an AI‑native technology leader, backed by one of the world’s largest travel data clouds. Built on an open, modular, cloud‑native architecture, Sabre serves as the backbone for both established leaders and bold, new disruptors, guiding them to the next age of travel retailing through intelligent, connected, and personalized experiences. With AI at its core and operating at unparalleled scale, Sabre transforms insights into innovation, empowering airlines, hoteliers, agencies, and other partners to retail, distribute, and fulfill travel worldwide.

This role requires a strong blend of people leadership, stakeholder management, technical depth, and communication excellence to deliver reliable platforms and measurable business outcomes.

Team Description

The Connectivity SRE team is responsible for the reliability, availability, performance, and cost efficiency of mission‑critical connectivity platforms operating across hybrid and cloud environments (GCP: GKE/GCE). The team partners closely with Engineering, Product, Network/Infrastructure, Security, Capacity, and external vendors to ensure resilient services that support Sabre’s core business.

Role Summary

As a Site Reliability Engineering Manager, you will lead a globally distributed SRE team responsible for the reliability and operational excellence of mission‑critical connectivity platforms and applications. You will balance people leadership with operational ownership, technical oversight, and cross‑regional collaboration.

This is a hands-on leadership role focused on reliability engineering and SRE maturity, owning on‑call strategy, incident leadership, SLO/error budgets, disaster recovery readiness, observability, toil reduction, security compliance, and cost optimization, while driving cross‑functional execution and continuous improvement.

Key Responsibilities

Own production reliability for connectivity services, including SLO and error‑budget management, proactive production health monitoring, and continuous improvement.
Lead 24x7 on‑call operations and major incident response, including rotation design, escalation paths, incident leadership, and blameless post‑incident reviews.
Own operational execution and work intake, including prioritization, assignment, and tracking of work items (e.g., Jira/Rally) to ensure timely and reliable delivery.
Ensure systems are secure, compliant, and resilient, including OS/platform patching, vulnerability remediation, configuration compliance, and PCI audit readiness, in partnership with Security and Compliance teams.
Maintain disaster recovery readiness, including RTO/RPO posture, testing cadence, and remediation of identified DR gaps.
Drive SRE best practices, including observability (metrics, logs, traces), alert hygiene, automation, toil reduction, and standardized runbooks and readiness reviews.
Own production change governance, including review and approval of changes (e.g., ServiceNow), ensuring appropriate risk assessment, rollback plans, and cross‑team coordination to prevent production impact.
Collaborate with engineering teams to embed reliability by design into architectures, releases, and change management practices.
Lead, coach, and develop a globally distributed SRE team, establishing clear ownership models, supporting career growth, and fostering a culture of accountability and continuous learning.
Act as the primary SRE partner for Engineering, Product, Network/Infrastructure, Security, Capacity, and key vendors, driving cross‑functional initiatives such as modernization efforts, DR drills, observability improvements, and cost/capacity optimization.

Qualifications

Required

8-10+ years of experience in SRE, DevOps, or Infrastructure Engineering roles.
3+ years of experience as a people manager, leading engineers or SRE teams.
Proven experience running 24x7, large‑scale production systems with strong incident management and on‑call leadership.
Hands‑on experience with GCP (or other major cloud), Kubernetes/GKE, Linux, and networking fundamentals.
Strong depth in monitoring and observability (e.g., Grafana, Splunk, AppDynamics, or equivalents) and reliability governance (SLOs, error budgets).
Strong stakeholder management skills, with the ability to communicate clearly with senior engineering and business partners.
Bachelor’s degree in Computer Science, Engineering, or equivalent experience.

Preferred

Experience leading cloud migrations and platform modernization initiatives.
Demonstrated outcomes in cost optimization and capacity planning.
Familiarity with CI/CD pipelines and change‑risk controls in high‑availability or regulated environments.
Experience supporting security compliance and audit requirements (e.g., PCI).
Experience leading or collaborating with globally distributed teams across multiple time zones.

We will give careful consideration to your application and review your details against the position criteria. You will receive separate notification as your application progresses.

Please note that only candidates who meet the minimum criteria for the role will proceed in the selection process.
