At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation.

Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive.

Within the Platform Engineering team, this role helps ensure our platform is operated safely and reliably.

We are looking for a Senior Site Reliability Engineer who leads with kindness and has a strong software development background to join our Infrastructure Engineering team. Your primary focus will be building automation, tooling, and internal platforms that enable our team to operate a global, multi-datacenter infrastructure spanning a growing number of Points of Presence.

Deep familiarity with production infrastructure -- bare-metal hypervisors, containerized workloads, Kubernetes clusters, and cloud platforms -- is essential, but your primary lens should be on automation and code. You will develop internal tools and APIs in Python and Go, design and maintain Ansible automation across hundreds of hosts, build CI/CD pipelines, and create self-service interfaces that reduce toil and eliminate manual operations.

You will work within a PCI-DSS compliant environment and participate in a 24x7 on-call rotation.

What You'll Do

Internal Tooling & Application Development

Design and develop internal tools, CLIs, and APIs (primarily in Go and Python) that enable infrastructure self-service, automate complex workflows, and improve operational efficiency
Build integrations between infrastructure systems -- connecting CMDB/IPAM (NetBox), secrets management (HashiCorp Vault), hypervisor APIs (Proxmox), monitoring platforms, and CI/CD pipelines into cohesive automated workflows
Develop and maintain API clients and libraries for interacting with infrastructure services (Proxmox API, Vault API, NetBox API, iLO Redfish, container registries)
Write well-tested, documented, and maintainable code with proper versioning, release processes, and code review practices

Infrastructure as Code & Ansible Development

Architect, develop, and refactor Ansible roles and playbooks across a large-scale inventory spanning 30+ datacenters, 80+ group variable files, and 40+ roles
Design reusable, composable Ansible role patterns that scale cleanly as the DC footprint grows -- new DCs should be deployable with minimal variable additions
Improve idempotency, error handling, and test coverage across the existing Ansible codebase
Develop custom Ansible modules, plugins, and lookup plugins where upstream modules may be insufficient (e.g., custom Vault integration, Proxmox API interactions, iLO automation)
Automate bare-metal server lifecycle end-to-end: from iLO bootstrap through OS installation, hypervisor configuration, VM provisioning, and service deployment

CI/CD Pipeline Engineering

Design, write, and maintain GitLab CI pipelines for infrastructure automation, including multi-stage deployment workflows with linting, validation, canary testing, and regional rollout
Build pipeline patterns for safe infrastructure changes: staged rollouts, automated rollback, drift detection, and change validation
Create reusable pipeline templates and shared CI components that standardize how infrastructure changes are tested and deployed
Implement automated testing for Ansible roles and infrastructure changes (molecule, ansible-lint, integration testing in ephemeral environments)

Kubernetes & Container Platform Automation

Develop automation for self-hosted Kubernetes cluster lifecycle management: provisioning, upgrades, scaling, and disaster recovery
Build and maintain container image build pipelines, registry management, and image promotion workflows
Create Kubernetes operators or controllers (in Go) where custom automation of cluster-level concerns is needed
Automate workload deployment patterns, including Helm chart development and GitOps workflows

Cloud Infrastructure Automation

Develop IaC and automation for AWS and Azure resources, integrating cloud infrastructure with on-premises systems
Build automation that spans hybrid environments -- coordinating deployments across bare-metal, virtualized, and cloud targets from a unified pipeline

Observability & Reliability Engineering

Instrument internal tools and automation with proper logging, metrics, and tracing
Build automated remediation workflows that respond to monitoring alerts and reduce mean time to recovery
Develop reporting and dashboards that provide visibility into infrastructure state, automation success rates, and toil metrics
Identify and automate away recurring operational toil; track and quantify toil reduction over time

Security & Compliance Automation

Automate PCI-DSS compliance workflows including CIS benchmark hardening, audit evidence collection, and configuration drift detection
Build automated secret rotation pipelines using HashiCorp Vault
Develop security scanning integration into CI/CD pipelines (container image scanning, infrastructure configuration validation)

What We're Looking For

5+ years of experience in an SRE, DevOps, or Infrastructure Engineering role with a strong emphasis on writing code and building automation
Proficiency in Python, with experience building CLI tools, APIs (Flask/FastAPI or equivalent), and automation frameworks
Expert-level Ansible skills: custom role development, module/plugin authorship, complex Jinja2 templating, inventory management at scale, and CI/CD integration
Solid Linux systems knowledge (RHEL/CentOS) -- you need to understand the systems you're automating at a depth that lets you debug failures and design robust automation
Experience building and maintaining CI/CD pipelines (GitLab CI preferred) for infrastructure automation, not just application builds
Production experience with self-hosted Kubernetes: cluster operations, controller/operator development, and workload automation
Practical AWS and Azure experience with an IaC mindset -- provisioning and managing cloud resources through automation, not console clicks
Experience with API-driven infrastructure management (RESTful APIs, Redfish/iLO, hypervisor APIs)
Familiarity with HashiCorp Vault or equivalent secrets management platforms, including programmatic integration
Understanding of PCI-DSS requirements as they apply to automated infrastructure management -- audit trails, change control, hardening automation
Strong software engineering fundamentals: version control workflows, code review, testing practices, documentation, and release management

Preferred

Experience with Proxmox VE API automation or similar hypervisor platform APIs (VMware vSphere, libvirt)
Familiarity with bare-metal server management automation (HPE iLO Redfish API, IPMI, or equivalent)
Working proficiency in Go, with experience building at least one of: CLI tools, APIs, Kubernetes operators/controllers, or systems-level tooling
Experience building custom Ansible modules or plugins in Python
Familiarity with NetBox (or similar CMDB/IPAM) API integration for inventory-driven automation
Experience developing Kubernetes operators using operator-sdk, kubebuilder, or controller-runtime
Background in network automation (DNS management, load balancer configuration, LDAP/directory services)
Experience operating in colocation / carrier-neutral DC environments (Equinix, Interxion, or similar)
Contributions to open-source infrastructure tooling or libraries

What You'll Need to Succeed

A software engineering mindset applied to infrastructure problems -- you think in terms of abstractions, interfaces, testability, and maintainability, not just "getting it working"
Strong opinions on code quality but pragmatism about when to ship -- this is infrastructure tooling, not a SaaS product, and the right trade-offs are different
The ability to understand complex existing systems deeply enough to automate them safely -- our Ansible codebase has evolved over years and new automation must integrate cleanly
Comfort working autonomously in a globally distributed, remote-first team across multiple time zones
Clear written communication -- you will write design documents, READMEs, and runbooks as a natural part of your development workflow
Willingness to participate in a 24x7 on-call rotation; your on-call experience will directly inform what you build next

Nice to Have

Experience with OSTree / image-based OS lifecycle automation
Familiarity with Pulp or on-premises package repository management automation
Experience building developer portals or self-service infrastructure platforms (Backstage or similar)
Background in DDoS mitigation automation or network-function virtualization

The Job Description is intended to be a general representation of the responsibilities and requirements of the job. However, the description may not be all-inclusive, and responsibilities and requirements are subject to change.

Please note that F5 only contacts candidates through an F5 email address (ending with @f5.com) or automated email notifications from Workday (ending with f5.com or @myworkday.com).

Equal Employment Opportunity

It is the policy of F5 to provide equal employment opportunities to all employees and employment applicants without regard to unlawful considerations of race, religion, color, national origin, sex, sexual orientation, gender identity or expression, age, sensory, physical, or mental disability, marital status, veteran or military status, genetic information, or any other classification protected by applicable local, state, or federal laws. This policy applies to all aspects of employment, including, but not limited to, hiring, job assignment, compensation, promotion, benefits, training, discipline, and termination. F5 offers a variety of reasonable accommodations for candidates. Requesting an accommodation is completely voluntary. F5 will assess the need for accommodations in the application process separately from those that may be needed to perform the job. To request an accommodation, contact accommodations@f5.com.

Senior SRE, Infrastructure & Platform
