Empowering every employee.

Our mission is to be the world's most used AI employee experience platform by changing the way frontline employees work.

At Flip, we have a clear goal: to revolutionize the world for frontline workers and give them a voice. Become a Flip Game Changer and work with an unbeatable team to ensure that all employees, no matter where they work, have access to their company's internal information. If you're ready to make an impact and shape the work lives of millions of people, then you've come to the right place!

Job Teaser

As a Site Reliability Engineer in our Platform Squad, you will be a key player in keeping Flip's infrastructure fast, resilient and ready to scale. You'll shape the reliability culture, tooling and practices that allow our engineering teams to ship with confidence - at scale and without compromising availability.

This role is perfect for an engineer who is passionate about building high-throughput, highly available systems and who wants to shape how a fast-growing SaaS platform runs in production.

What awaits you with us

Enable scaling: Further expand and optimize our cloud infrastructure on Azure and our Kubernetes clusters - designed for high throughput and high availability - to support Flip's rapid growth across the globe.
Ensure resilience & safety: Design and implement zero-downtime deployments, rollback mechanisms and disaster-recovery strategies that keep our platform available around the clock.
Create observability: Evolve our LGTM stack (Loki, Grafana, Tempo, Mimir) to give every team the visibility they need - and use it to define and optimize our SLOs.
Automate everything: Design, develop and optimize infrastructure as code with Pulumi in Go, eliminating toil and making our platform self-service for engineering teams.
Champion reliability practices: Promote CI/CD best practices, incident management, post-mortems and developer experience across the entire engineering organization.
Shape our roadmap: Collaborate with your squad and engineering leadership to define the platform's direction - from scalable, high-throughput systems and cost optimization to security posture and compliance.

What you bring to the table

We're looking for a hands-on, product-minded engineer who is passionate about high-throughput, highly available systems - and who cares about reliability as much as velocity.

Must-Have Qualifications

You have 1–3 years of hands-on experience as a Site Reliability Engineer (SRE), Platform Engineer, DevOps Engineer, Infrastructure Engineer, Cloud Engineer, or Backend Engineer with a strong infrastructure focus.
Experience operating and scaling cloud infrastructure (Azure, GCP, AWS).
Deep knowledge of Kubernetes and container orchestration in production environments.
Hands-on experience with modern observability stacks (e.g. Prometheus, Mimir, Loki, ELK), and you are comfortable defining and operating SLOs and error budgets.
Solid software development skills in Go (preferred, since our IaC runs on Pulumi in Go), Python or Kotlin.
Hands-on experience with infrastructure as code (e.g. Pulumi, OpenTofu, Terraform) and configuration tooling (e.g. Ansible, Chef).
A collaborative mindset, strong communication skills and business-fluent English.
Willingness to participate in on-call rotations to ensure the reliability of our platform.

Nice-to-Have Qualifications

Experience building and operating high-throughput, highly available systems in production.
Experience with Azure Kubernetes Service (AKS) specifically.
Experience with Kubernetes Gateway API and Envoy Gateway.
Familiarity with GitOps workflows and CI/CD pipeline design.
Knowledge of service mesh technologies (e.g. Linkerd, Istio).
Experience with Kubernetes Operators (e.g. Strimzi, CNPG).
Experience operating highly available PostgreSQL.

What we offer you

Work mode: We’re remote-first, giving you flexibility to work from home. At the same time, we deeply value the power of in-person collaboration. Depending on the role, you’ll join occasional team events, workshops, or meetings in our Berlin or Stuttgart offices - always with plenty of notice. The exact balance will be discussed during your interview.
Work-life balance: We don't want you to grow roots to your desk chair. That's why we cover the costs of your E-Gym-Wellpass membership and offer job bike leasing.
Celebrating success: Expect highly motivated and committed people in a relaxed working atmosphere.
Be part of something bigger: You actively shape Flip in your role. Along the way, you help drive the rapid growth of a young tech company and grow towards your own goals. Fun is guaranteed.
Happy to be a Flipster: Stay tuned for regular team events and culture days that bring us together as Flipsters.
Working abroad: At Flip, you can also work from abroad within the European Union. Let's talk about remote work in the interview.

At Flip, everyone is welcome - no matter what gender you identify as or how old you are. Sexual identity, origin, religion, world view and disabilities do not influence your potential job at Flip. The most important thing is that YOU fit in!

Site Reliability Engineer (m/f/d)
