Principal Site Reliability Engineer (Hospitality Solutions) at Sabre

Sabre is a technology company that powers the global travel industry. By leveraging next-generation technology, we create global technology solutions that take on the biggest opportunities and solve the most complex challenges in travel. 

Positioned at the center of the travel industry, we shape the future by offering innovative advancements that pave the way for a more connected and seamless ecosystem as we power mobile apps, online travel sites, airline and hotel reservation networks, travel agent terminals, and scores of other solutions.

Simply put, we connect people with moments that matter.

NOTE: TPG Capital, a global alternative asset management firm, recently acquired Hospitality Solutions. Over the coming months, Sabre will work with TPG to formally separate the Hospitality Solutions business from Sabre. It is important to understand that while you will be employed by a Sabre legal entity, your role will be to support the Hospitality Solutions business, which is now owned by TPG.

Hospitality Solutions, formerly part of Sabre Holdings, is a global leader at the forefront of hospitality technology powering over 40,000 properties across 174 countries. Celebrated for our innovative and customer-centric approach, we deliver integrated platforms for distribution, reservations, retailing, and guest experience to both renowned hotel brands and independent properties worldwide.

Hospitality Solutions is currently looking for a Principal Site Reliability Engineer.

Job Description

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Hospitality Solutions services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to users' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on system capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating work through automation.

On the SRE team, you’ll have the opportunity to lead and manage the complex challenges of scale while using your leadership skills and your expertise in coding/scripting, algorithms, troubleshooting, complexity analysis, and small- and large-scale system design.

SRE's culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

Responsibilities

Bring a deep passion for innovation and performance, along with leadership skills, to the SRE team.
This position requires experience leading small teams from a technical lead position, and you should be comfortable delegating and having direct reports.
Review software architecture for operational readiness; influence design for reliability and scalability.
Own and operate services; manage system-level reliability and performance.
Develop reliability-focused software, implement SLOs/SLIs, and automate operational tasks.
Lead incident response, root cause analysis (RCA), and postmortem processes.
Collaborate with development teams to “shift reliability left” by embedding SRE practices early in the SDLC.
Participate in on-call rotations and mission control shifts.
Drive operational readiness for new services and features.
Engage in capacity planning, performance tuning, and cost optimization.
Provide and oversee application support using monitoring tools and diagnostic methods.
Provide specialized technical and product support for Hospitality Solutions hosted solutions, including troubleshooting and debugging complex software solutions.
Design and enhance the current pipeline, extending it into a true end-to-end CI/CD pipeline.
Plan, implement, deploy, and measure Hospitality Solutions’ operational assets and portfolios.
Collaborate with Engineering, Technology, and other relevant enterprise teams in an agile environment to remove toil and facilitate successful delivery of initiatives and work items.
Must have leadership skills and expertise in ITIL and Change Management procedures (e.g., ServiceNow) to drive root cause analysis for preventive actions and to continually evaluate and improve processes and application standards.
Must know how to provision and manage cloud environments (AWS/GCP) and lead cloud migrations.
Initiate and lead system tuning to achieve optimum performance levels; participate in long-term strategies for scalability, stability, and high availability of platforms.
Engage in day-to-day operational tasks as needed for management of monitoring and automation work.
Support and oversee a 24x7 global environment, leveraging distributed global team members.
Communicate effectively with stakeholders and senior management about project status, incidents, problems, and root causes.
Identify and drive opportunities for automation, and implement innovative solutions that improve efficiency, simplicity, and speed and reduce toil.
Act as part of an SRE team that maintains extensive development and test environments using Continuous Delivery methods.
Set up and create AWS/GCP instances, and handle AWS/GCP deployments, administration, operations, and maintenance.

Soft Skills

Must be a great team player, guide, and mentor.
Excellent interpersonal and leadership skills, with very good written and oral communication.
A go-getter who loves challenges, feels empowered to execute, and makes things happen.
Good problem-solving and troubleshooting skills, attention to detail, and a data-driven approach to resolving situations quickly.

Desired Skills & Experience

12+ years of experience in Site Reliability Engineering/IT Operations/DevOps.
Strong knowledge of and hands-on experience with Windows and UNIX/Linux platforms.
Strong knowledge of Oracle and SQL; able to troubleshoot issues.
Must have hands-on expertise in AWS/GCP platforms, deployments, administration, and operations, including EC2, S3, Auto Scaling, Lambda, CloudFormation, GCP, Terraform, and core concepts and services.
Must have strong automation expertise and knowledge of at least one scripting language, such as Perl, shell scripting, or Python. PowerShell is an added advantage.
Must have a minimum of 2-4 years of development experience in .NET or Java. Node, PHP, and Ruby are a plus.
Knowledge of, and implementation experience with, ITIL processes and procedures and ITSM.
Organizational skills and experience handling multiple tasks, projects, and subprojects simultaneously.
Experience supporting multi-layer applications including Frontend services (API and GUI based) and Backend services.
Understanding of TCP/IP and HTTP protocols. Expertise in debugging issues using network layer knowledge. Able to capture traffic and analyze it to understand issues.
Knowledge of DevOps tools such as Jenkins and Ansible, and of any orchestration tools; knowledge of tools/languages such as Groovy and Packer is an added advantage.
Fundamental knowledge of and experience with networking, load balancers, firewalls, and TCP/IP protocols.
Hands-on experience in the installation, configuration, and maintenance of software applications in support of business processing requirements.
Experience with Agile processes, both Scrum and Kanban, is an added advantage.

Qualifications

A Bachelor’s degree in Computer Science is required; a Master’s degree in Computer Science is preferred.
Overall 12+ years of professional experience in SRE, IT Operations and DevOps.

Outstanding Benefits

Very competitive compensation
Generous Paid Time Off (25 PTO days)
4 days of Volunteer Time Off (VTO), one day per quarter
5 days off annually for Year-End Break
Comprehensive medical, dental, and wellness programs
12 weeks paid parental leave
An infrastructure that allows flexible working arrangements
Formal and informal reward, recognition and acknowledgement programs
Lots of fun and engaging employee development events

Reasonable Accommodation

Sabre is committed to working with and providing reasonable accommodation to applicants with disabilities. Applicants with a disability who are applying for a Sabre position and require a reasonable accommodation for any part of the application or hiring process may contact Sabre at recruiting@careers.sabre.com.

Determinations on requests for reasonable accommodation will be made on a case-by-case basis.

Affirmative Action

Sabre is an equal employment opportunity/affirmative action employer and is committed to providing employment opportunities to minorities, females, veterans and disabled individuals. EEO IS THE LAW

#LI-Hybrid #LI-TJ1