At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.
Roche Digital Technology (RDT) is where innovation meets purpose. As a global team at the heart of Roche, we are a community of business-minded technologists committed to helping shape tomorrow’s digital future of healthcare. Our mission is to power Roche through cutting-edge digital technologies, harnessing the potential of artificial intelligence, data, and scalable tech innovations. Driven by purpose and passion, we’re building a future where digital is a core strength across all of Roche, enabling smarter ways of working, unlocking human potential, and driving breakthroughs that truly matter for millions of patients around the world.
The Head of Digital Operations & Reliability is a critical, senior leadership role responsible for the resilience, performance, and stability of our global digital ecosystem. Reporting to the Global Head of Enterprise Platforms, Infrastructure & Engineering, this leader is mandated to drive a fundamental cultural and technical shift from reactive support to proactive, engineering-driven operations. This position ensures our foundational technology platforms provide the confidence and agility necessary for all product teams to deliver transformative solutions.
Ensure Consistent Global Service Availability: Directs the global 24/7 operations function, guaranteeing continuous service delivery through expert incident management, rapid problem resolution, and maintaining the health of all mission-critical production systems.
Define Site Reliability Engineering (SRE) Strategy: Champions and embeds SRE principles globally, partnering with engineering teams to establish meaningful Service Level Objectives (SLOs), manage error budgets, and foster a culture of shared reliability ownership.
Future-Proof Observability Platforms: Owns the strategic vision, investment, and execution roadmap for the enterprise observability stack (monitoring, logging, tracing, alerting), ensuring teams utilize best-in-class tools for real-time operational insights.
Drive Predictive Operational Intelligence: Develops and executes a strategy to harness Artificial Intelligence (AI) and Machine Learning (ML) for predictive operations, preventing incidents before they impact the business and proactively optimizing performance.
Ensure GxP-Compliant Change Governance: Oversees all operational change management processes, ensuring that changes to regulated GxP systems are executed in a controlled, safe, and fully compliant manner.
Cultivate a Reliability-First Culture: Acts as the primary agent for cultural change, driving the shift from reactive 'firefighting' to proactive, engineering-driven operational excellence across the entire technology organization.
Influence Strategic Stakeholders: Negotiates with and influences senior technology and business leaders across the organization to align reliability strategy with enterprise priorities and to communicate operational performance status effectively.
Leadership and Strategic Capabilities
Calm Leadership Under Pressure: Exceptional ability to lead and make critical decisions during major incidents, maintaining composure and providing clear direction to global teams.
Powerful Agent for Change: A proven capability to influence, mentor, and inspire senior leaders and engineering teams, driving sustained cultural adoption of new, reliable ways of working.
Strategic Direction: Contributes to the development of departmental and functional strategy by reflecting internal and external standards, and aligning operational execution with long-term business goals.
Technical and Domain Expertise
SRE and Cloud Mastery: Deep, pragmatic expertise in SRE principles and practices, coupled with hands-on experience managing and scaling large-scale applications and infrastructure within public cloud environments (e.g., AWS, Azure).
Regulated Environment Experience (Mandatory): Proven background in leading global technology operations within a GxP-regulated environment, with meticulous attention to compliance, control, and audit readiness.
Modern Observability: Practical experience owning and leveraging modern observability platforms to drive actionable insights and systemic reliability improvements.
Education & Experience
Minimum of 10 years of progressive leadership experience in technology operations, ideally leading a global Site Reliability Engineering organization.
A Bachelor’s degree in Computer Science, Engineering, or a related technical field is required.
A healthier future drives us to innovate. Together, more than 100’000 employees across the globe are dedicated to advancing science, ensuring everyone has access to healthcare today and for generations to come. Our efforts result in more than 26 million people treated with our medicines and over 30 billion tests conducted using our Diagnostics products. We empower each other to explore new possibilities, foster creativity, and keep our ambitions high, so we can deliver life-changing healthcare solutions that make a global impact.
Let’s build a healthier future, together.
Roche is an Equal Opportunity Employer.