Change the way the world travels

Join the GetYourGuide journey to connect people with unforgettable travel experiences around the world. Millions look to us for unique activities they can trust, and it’s all powered by our commitment to make every single journey extraordinary, including yours.

Ready to unlock your potential with a community of fellow explorers? Find your next role at our Berlin HQ or one of our local offices around the globe, from New York to Bangkok. Head to getyourguide.careers to take the first step.

Team mission

Incidents interrupt operations, drain team productivity, and erode user trust. As a member of the Operational Excellence team, you will help GetYourGuide move toward a world of fewer interruptions and higher user trust — by preventing incidents before they happen and enabling teams to resolve them faster when they do.

As we push boldly into AI-powered experiences, we don't ignore the risks that increased output velocity creates. You will be a key part of ensuring our engineering organisation moves fast with confidence, so our customers continue to have great experiences every time.

Beyond reliability, you will drive observability and cost efficiency — building the tooling, culture, and practices that make operational excellence a shared standard across all product teams.

Your mission

You will act as an "engineer for the engineers" — partnering with product teams to raise the bar on reliability, speed, and confidence in their systems.

Incident management & reliability

Drive down incident frequency, mean time to detect (MTTD) and mean time to resolve (MTTR)
Lead post-incident reviews and translate learnings into systemic improvements
Build tooling and runbooks that enable teams to diagnose and resolve production issues faster
Champion a culture of blameless incident handling and continuous improvement
Participate in the infrastructure on-call rotation

Observability & production confidence

Advance our Datadog-based observability practice — metrics, logs, traces, dashboards, and alerting
Ensure teams have meaningful SLOs and actionable alerts — not alert fatigue
Enable production debugging capabilities so engineers can triage issues without needing a specialist

Change confidence & release quality

Improve change failure rate by helping teams invest in the right automated test coverage and pre-production validation
Identify blast radius risks and architectural gaps in evolving systems, and drive mitigation
Reduce the cost and risk of deployments through better tooling, feature flagging, and progressive rollout practices

Platform enablement

Design and maintain paved paths: well-documented golden paths for development, observability, testing, and incident response, so product teams can do the right things by default
Work hands-on with product teams using Java and React to help them improve system design, testability, and operational hygiene
Leverage Kubernetes, AWS, and Istio expertise to guide teams on infrastructure best practices
Identify cost optimisation opportunities and drive efficiency improvements across services
Leverage AI tooling to accelerate incident response, improve developer workflows, and scale operational practices

Your toolkit

Deep understanding of observability tooling — we use Datadog (metrics, APM, logs, dashboards)
Proven experience reducing MTTD, MTTR and change failure rate; DORA metrics are not just acronyms to you
Strong coding skills in Java; comfortable reading and contributing in Go across infrastructure contexts; enough frontend context to collaborate with React / Vue teams
Experience with Kubernetes, AWS, and service mesh technologies (Istio/Envoy)
Solid understanding of distributed systems, networking, and container technology
Hands-on experience with CI/CD, automated testing strategies, and build systems
Ability to influence engineers and teams without direct authority — you raise standards by coaching, not dictating
Excellent written and verbal communication skills in English
Positive, proactive team player who is passionate about operational excellence and helps others deliver

Extras that give you an edge

You have led company-wide initiatives to measurably improve DORA metrics — specifically MTTD, MTTR and change failure rate
You have identified systemic gaps in automated testing and driven improvements that led to meaningful reductions in change failure rate and production incidents
You have embedded operational excellence practices into the culture of product engineering teams, not just platform teams
You have driven meaningful cost-reduction outcomes through architectural or operational improvements

How we’ll make your career journey extraordinary

Annual personal growth budget and mentorship programs for continuous learning and development
Work from anywhere in the world for 30 days per year
A hybrid working approach with three days of in-office collaboration (Mon, Tue, Thu) and two days of optional at-home focus time
Opportunities to collaborate and socialize with team members through quarterly team events and yearly company-wide events
Monthly transportation and fitness budget
Discounts for you, your friends, and family on GetYourGuide activities
Language reimbursement program
Health and wellness benefits

And more…

How to apply

Submit your CV/resume in English using the form below. For tips and insights into our hiring process and culture, check out ‘how we hire’ and ‘life at GetYourGuide’. If you have any further questions, please don’t hesitate to get in touch at jobs@getyourguide.com.

We’re an equal opportunities employer

Our commitment is that every qualified person will be evaluated according to their skills regardless of age, gender identity, ethnicity, sexual orientation, disability status, or religion. Please refrain from including your picture and age with your application.


Senior Engineer, Operational Excellence