ResMed

Director, Software Engineering

San Diego, CA, United States

Full time

At ResMed, we are pioneering the future of digital healthcare by delivering reliable, performant, and resilient systems that serve millions of patients, providers, and partners worldwide. We are seeking a Principal Software Engineer, Production Engineering, to lead the reliability, scalability, and operational excellence of our production systems. This role is for a deeply technical expert who thrives in high-scale distributed environments and takes ownership of ensuring systems run flawlessly in production.

You will operate at the intersection of software engineering and site reliability engineering (SRE), driving system stability, diagnosing complex issues, and building the processes and tooling that enable engineering teams to deliver resilient, high-performing products. In this deeply hands-on technical leadership role, you will partner with SRE and engineering teams across the globe to elevate production readiness and operability.

If you are passionate about keeping complex systems running at scale, mentoring engineers to think production-first, and driving a culture of engineering rigor and continuous improvement, this role is for you.

Key Responsibilities

Production Excellence & Reliability

  • Act as a technical expert for production systems, focusing on reliability, performance, and scalability
  • Lead deep debugging and root cause analysis of complex issues in distributed systems
  • Partner with SRE & engineering teams to diagnose and resolve production incidents, reducing customer impact
  • Contribute to and improve incident response, escalation, and postmortem practices
  • Guide teams in defining and applying SLIs, SLOs, and error budgets

Distributed Systems & Architecture

  • Provide technical leadership in designing and reviewing large-scale distributed systems and microservices architectures
  • Identify systemic risks, bottlenecks, and failure modes, and drive improvements in system resilience
  • Collaborate with engineering teams to ensure systems are designed for operability and production readiness

Observability & Tooling

  • Design and implement improvements to observability (logging, metrics, tracing)
  • Build and advocate for tools that enhance debugging, monitoring, and operational insight
  • Reduce operational toil through automation and better tooling

Cloud & Infrastructure

  • Provide deep expertise across AWS, Azure, and on-premise environments
  • Support teams in optimizing infrastructure for scalability, reliability, and cost efficiency
  • Influence best practices in deployment, release, and rollback strategies

Technical Leadership & Mentorship

  • Mentor and coach engineers on production engineering and reliability practices
  • Lead by example through hands-on problem solving in critical situations
  • Raise the overall engineering bar through knowledge sharing and technical guidance

Process & Standards

  • Champion AI transformation: evolve how your teams build software, not just what they build. This means advocating for AI-assisted development, agentic workflows, and the composed product engineer model, where small teams deliver outsized impact
  • Contribute to defining and evolving production engineering and SRE best practices
  • Drive adoption of consistent operational standards and practices across teams
  • Promote a culture of continuous improvement through learning and systemic fixes

Let’s Talk About You

Top 5 Skills

  • Strategic Thinker – Aligns technology and engineering outcomes with ResMed’s long-term vision and business goals.
  • Innovator – Embraces new ideas and fosters experimentation to accelerate digital transformation.
  • Technical Proficiency – Demonstrates deep expertise in modern software engineering, architecture, and cloud-native development.
  • Problem Solver – Solves the hardest problems at scale.
  • System Thinker – Understands how complex systems interact and behave.

Mindsets & Behaviors

  • Build Relationships (Collaboration): Develop strong partnerships across teams to achieve shared outcomes.
  • Develop People: Empower and enable others to perform at their best.
  • Lead Change: Drive purposeful transformation with clarity and confidence.
  • Think Critically: Apply evidence-based reasoning to solve complex challenges.
  • Communicate Clearly: Share insights transparently, concisely, and with purpose.
  • Create Accountability: Establish clear expectations and ownership for results.

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or related field.
  • 10+ years of experience in software engineering, with significant focus on production systems and reliability
  • Proven expertise in debugging complex distributed systems at scale
  • Deep understanding of microservices architectures and cloud-native systems
  • Strong experience with AWS and Azure, as well as on-premise environments
  • Solid understanding of networking fundamentals (TCP/IP, DNS, load balancing, proxies, etc.)
  • Experience with observability tools (e.g., Datadog)
  • Strong programming skills in one or more languages (e.g., Java, C#, Go, Python)
  • Experience leading incident response and root cause analysis in production environments
  • Experience in healthcare, medical devices, or regulated industries where quality systems, data privacy, and compliance are not optional

We are shaping the future at ResMed, and we recognize the need to build on and broaden our existing skills and continue to attract and retain the world’s best talent. We work hard to offer holistic benefits packages, provide flexible work arrangements, cultivate a workforce culture that allows employees to grow personally and professionally, and deliver competitive salaries to our team members.

Employees scheduled to work 30 or more hours per week are eligible for benefits. This position qualifies for the following benefits package: comprehensive medical, vision, dental, and life, AD&D, short-term and long-term disability insurance, sleep care management, Health Savings Account (HSA), Flexible Spending Account (FSA), commuter benefits, 401(k), Employee Stock Purchase Plan (ESPP), Employee Assistance Program (EAP), and tuition assistance. Employees accrue three weeks of Paid Time Off (PTO) in their first year of employment, receive 11 paid holidays plus 3 floating days, and are eligible for 14 weeks of primary caregiver leave or two weeks of secondary caregiver leave when welcoming new family members.

Individual pay decisions are based on a variety of factors, such as the candidate’s geographic work location, relevant qualifications, work experience, and skills. At ResMed, it is not typical for an individual to be hired at or near the top of the range for their role, and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current base range for this position is: $195,000.00 - $293,000.00 USD

Joining us is more than saying “yes” to making the world a healthier place. It’s discovering a career that’s challenging, supportive, and inspiring, where a culture driven by excellence helps you not only meet your goals but also create new ones. We focus on creating a diverse and inclusive culture, encouraging individual expression in the workplace, and we thrive on the innovative ideas this generates. If this sounds like the workplace for you, apply now! We commit to respond to every applicant.