Job Description Summary

As a Principal Engineer in Site Reliability Engineering (SRE), you'll be a technical leader shaping the reliability, scalability, and efficiency of our cloud-based platforms. You'll architect fault-tolerant systems, champion automation to reduce toil, and mentor teams on SRE principles.

This individual contributor role is perfect for a seasoned engineer passionate about cloud ecosystems, distributed systems, and turning complex challenges into streamlined, high-impact solutions. You will define SRE best practices; drive automation, observability, incident response, and performance work; and collaborate with cross-functional teams (e.g., Dev, Security, Product) to ensure that systems meet the highest standards of reliability. You will be a senior technical leader who influences architecture, leads complex projects, mentors others, and acts as a stabilizing presence during incidents.

GE Healthcare is a leading global medical technology and digital solutions innovator. Our mission is to create a world where healthcare has no limits. Unlock your ambition, turn ideas into world-changing realities, and join an organization where every voice makes a difference, and every difference builds a healthier world.

Job Description

Key Responsibilities:

Lead Platform Reliability Initiatives: Design and optimize multi-region, highly available cloud architectures using services like container orchestration, compute instances, managed databases, and object storage to meet SLOs exceeding 99.99% availability, tracked through SLIs and error budgets.

Drive Automation and IaC: Build and maintain Infrastructure as Code (IaC) pipelines with tools like CDK, Terraform, or CloudFormation; automate deployments via CI/CD tools and serverless functions to accelerate delivery while minimizing operational overhead.

Reliability, Availability & Resilience: Establish, track, and enforce SLIs, SLOs, and error budgets. Ensure that system availability, latency, and throughput meet targets. Build strategies for redundancy, high availability, multi-AZ / multi-region failover, backups, and disaster recovery.

Enhance Observability and Monitoring: Implement comprehensive monitoring stacks with cloud-native metrics, open-source monitoring, and visualization tools; define alerting thresholds, conduct root cause analyses (RCAs), and optimize performance for distributed systems including message brokers, caching layers, and relational databases.

Champion Security and Compliance: Enforce cloud best practices for identity and access management, encryption, networking, and policy-as-code with tools like OPA; integrate security into CI/CD pipelines to protect sensitive data in regulated environments.

Innovate on Scalability: Evaluate and implement advanced cloud features like serverless architectures, service meshes, and autoscaling solutions to support growing user demands and reduce latency.

Operational Excellence: Participate in and lead incident response for production issues, and continuously improve processes to balance feature velocity with system reliability.

Cost & Performance: Monitor and optimize cloud spend and resource usage through rightsizing, discount strategies, and waste elimination.

Mentor and Influence: Guide junior engineers through design reviews, incident post-mortems, and adoption of SRE practices; collaborate with stakeholders to shape cloud strategy, cost optimization, and capacity planning for enterprise-scale workloads.

Educational Qualification:

Bachelor's degree or equivalent in Computer Science or another STEM major (Science, Technology, Engineering, and Math).

Technical skills:

15+ years in software engineering, site reliability engineering, or cloud platform roles, with significant exposure to AWS production systems.

Deep hands-on expertise with core cloud services including container orchestration, compute, databases, storage, monitoring, identity management, serverless, and networking.

Expert-level skill in Infrastructure as Code: Terraform, CloudFormation, AWS CDK, or similar.

Proficiency in programming languages like Python, Go, or Java for automation, scripting, and building tools.

Deep understanding of observability tooling: metrics, logging, distributed tracing, and alerting (e.g., CloudWatch, Prometheus, Grafana, ELK).

Strong experience with incident management: debugging, performance tuning, and root cause analysis.

Proven track record of cost optimization in cloud environments.

Security mindset: knowledge of AWS security services, governance, and compliance standards.

Proven track record in implementing SRE practices: SLIs/SLOs, error budgets, monitoring/alerting, and incident management.

Strong communication and collaboration abilities to influence without authority and translate technical concepts for non-technical stakeholders.

Good-to-Have Skills: 

Experience working with multi-cloud or hybrid cloud deployments.

Familiarity with container orchestration (Kubernetes / EKS), service meshes, and serverless frameworks.

Certifications in AWS (e.g. Solutions Architect ‐ Professional, DevOps Engineer, Security Specialty).

Experience in regulated industries (finance, healthcare, government, etc.).

Inclusion and Diversity:

GE Healthcare is an Equal Opportunity Employer where inclusion matters. Employment decisions are made without regard to race, colour, religion, national or ethnic origin, sex, sexual orientation, gender identity or expression, age, disability, protected veteran status or other characteristics protected by law.

We expect all employees to live and breathe our behaviours: to act with humility and build trust; lead with transparency; deliver with focus, and drive ownership – always with unyielding integrity.

Our total rewards are designed to unlock your ambition by giving you the boost and flexibility you need to turn your ideas into world-changing realities. Our salary and benefits are everything you’d expect from an organization with global strength and scale, and you’ll be surrounded by career opportunities in a culture that fosters care, collaboration, and support.

#Everyroleisvital

#Hybrid

Additional Information

Relocation Assistance Provided: No

Principal Software Engineer - SRE
