Staff Site Reliability Engineer (SRE)
Location: New York, NY (Hybrid) / Remote
Department: Engineering
The Role
Flowcode is seeking a Staff Site Reliability Engineer (SRE) to lead reliability and infrastructure efforts across our platforms. This role will help grow and drive our infrastructure strategy, operational rigor, and observability while building and maintaining the systems and tooling required to support Flowcode’s continued growth.
As a technical leader within our engineering organization, you will grow and operate scalable cloud infrastructure, establish best practices around deployment and reliability, and partner closely with engineering teams to ensure systems are scalable, resilient and observable.
This role combines hands-on engineering with systems and architectural leadership. You will be a pivotal member of our engineering leadership team, leading the charge for reliability and long-term infrastructure growth.
What You’ll Do
Reliability & Infrastructure Leadership
- Lead Flowcode’s site reliability engineering strategy and implementation
- Improve system availability, scalability, and resilience across our platforms
- Drive operational best practices across our engineering teams
Cloud & Platform Engineering
- Maintain, grow and operate scalable infrastructure on our AWS platform
- Lead infrastructure best practices for scalability, failover, and disaster recovery
- Work with critical infrastructure vendors on monitoring, analysis, and security
CI/CD & Deployment Automation
- Build and maintain modern deployment and testing pipelines
- Grow and maintain our GitOps workflows using ArgoCD
- Enable safe, reliable releases through automated testing and validation
Observability & Monitoring
- Manage monitoring, logging, and alerting systems
- Improve system visibility through metrics, tracing, and logging
Technical Leadership
- Serve as a reliability and infrastructure subject matter expert across engineering
- Mentor engineers and promote best practices
- Collaborate with our engineering and data teams to ensure new systems are built for reliability and scale
Qualifications
Required
- 8+ years of experience in Site Reliability Engineering, DevOps, Infrastructure Engineering, or Platform Engineering
- Hands-on experience with Kubernetes and container orchestration
- Experience building and maintaining CI/CD and deployment pipelines
- Experience implementing and growing GitOps workflows and tools such as ArgoCD
- Familiarity with GitHub Actions, ideally in a multi-contributor production pipeline
- Experience with observability platforms, code quality tools and common security practices
- Strong scripting or programming skills (Python, Go, or similar)
- Experience supporting high-scale distributed systems
- Experience with Infrastructure as Code (Terraform, Pulumi, or CloudFormation)
- Strong familiarity with core AWS services (EKS, EC2, S3, RDS, etc.)
Preferred
- Experience designing highly available and multi-region architectures
- Experience implementing progressive delivery or deployment strategies
- Experience building internal developer platform tooling
Flowcode is not for everyone. We hire with a pinhole lens — only those with the rare combination of intellectual horsepower, execution velocity, and uncompromising drive will thrive here. If you are seeking to operate at the highest levels of performance and impact, we want to meet you.
How to Apply
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
A successful candidate’s starting pay will be determined based on the role, job-related skills, experience, qualifications, work location, and market conditions. The current range for this role is $260k - $290k plus equity.