F5

Senior SRE, Infrastructure & Platform

Singapore Office Full time

At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation. 
 

Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive.

F5 is bringing a better digital world to life by helping organizations create, secure, and run applications that power our lives. Within the Platform Engineering team, this role helps ensure our platform is operated safely, reliably, and with operational excellence.  

We are looking for a Senior Site Reliability Engineer who leads with kindness and brings a strong software development background to join our Infrastructure Engineering team. Your primary focus will be building automation, tooling, and internal platforms that enable our team to operate a multi-datacenter infrastructure spanning a growing number of Points of Presence across the globe.

Deep familiarity with production infrastructure -- bare-metal hypervisors, containerized workloads, Kubernetes clusters, and cloud platforms -- is essential, but your primary lens should be on automation and code. You will develop internal tools and APIs in Python and Go, design and maintain Ansible automation across hundreds of hosts, build CI/CD pipelines, and create self-service interfaces that reduce toil and eliminate manual operations.

You will work within a PCI-DSS compliant environment and participate in a 24x7 on-call rotation. 

What You'll Do 

Internal Tooling & Application Development 

  • Design and develop internal tools, CLIs, and APIs (primarily in Go and Python) that enable infrastructure self-service, automate complex workflows, and improve operational efficiency 
  • Build integrations between infrastructure systems -- connecting CMDB/IPAM (NetBox), secrets management (HashiCorp Vault), hypervisor APIs (Proxmox), monitoring platforms, and CI/CD pipelines into cohesive automated workflows 
  • Develop and maintain API clients and libraries for interacting with infrastructure services (Proxmox API, Vault API, NetBox API, iLO Redfish, container registries) 
  • Write well-tested, documented, and maintainable code with proper versioning, release processes, and code review practices 

Infrastructure as Code & Ansible Development 

  • Architect, develop, and refactor Ansible roles and playbooks across a large-scale inventory spanning 30+ datacenters, 80+ group variable files, and 40+ roles 
  • Design reusable, composable Ansible role patterns that scale cleanly as the DC footprint grows -- new DCs should be deployable with minimal variable additions 
  • Improve idempotency, error handling, and test coverage across the existing Ansible codebase 
  • Develop custom Ansible modules, plugins, and lookup plugins where upstream modules are insufficient (e.g., custom Vault integration, Proxmox API interactions, iLO automation)
  • Automate bare-metal server lifecycle end-to-end: from iLO bootstrap through OS installation, hypervisor configuration, VM provisioning, and service deployment 

CI/CD Pipeline Engineering 

  • Design, write, and maintain GitLab CI pipelines for infrastructure automation, including multi-stage deployment workflows with linting, validation, canary testing, and regional rollout 
  • Build pipeline patterns for safe infrastructure changes: staged rollouts, automated rollback, drift detection, and change validation 
  • Create reusable pipeline templates and shared CI components that standardize how infrastructure changes are tested and deployed
  • Implement automated testing for Ansible roles and infrastructure changes (Molecule, ansible-lint, integration testing in ephemeral environments)

Kubernetes & Container Platform Automation 

  • Develop automation for self-hosted Kubernetes cluster lifecycle management: provisioning, upgrades, scaling, and disaster recovery 
  • Build and maintain container image build pipelines, registry management, and image promotion workflows 
  • Create Kubernetes operators or controllers (in Go) where custom automation of cluster-level concerns is needed 
  • Automate workload deployment patterns, including Helm chart development and GitOps workflows 

Cloud Infrastructure Automation 

  • Develop IaC and automation for AWS and Azure resources, integrating cloud infrastructure with on-premises systems 
  • Build automation that spans hybrid environments -- coordinating deployments across bare-metal, virtualized, and cloud targets from a unified pipeline 

Observability & Reliability Engineering 

  • Instrument internal tools and automation with proper logging, metrics, and tracing 
  • Build automated remediation workflows that respond to monitoring alerts and reduce mean time to recovery 
  • Develop reporting and dashboards that provide visibility into infrastructure state, automation success rates, and toil metrics 
  • Identify and automate away recurring operational toil; track and quantify toil reduction over time 

Security & Compliance Automation 

  • Automate PCI-DSS compliance workflows including CIS benchmark hardening, audit evidence collection, and configuration drift detection 
  • Build automated secret rotation pipelines using HashiCorp Vault 
  • Integrate security scanning into CI/CD pipelines (container image scanning, infrastructure configuration validation)

What We're Looking For 

  • 5+ years of experience in an SRE, DevOps, or Infrastructure Engineering role with a strong emphasis on writing code and building automation 
  • Proficiency in Python, with experience building CLI tools, APIs (Flask/FastAPI or equivalent), and automation frameworks 
  • Expert-level Ansible skills: custom role development, module/plugin authorship, complex Jinja2 templating, inventory management at scale, and CI/CD integration 
  • Solid Linux systems knowledge (RHEL/CentOS) -- you need to understand the systems you're automating at a depth that lets you debug failures and design robust automation 
  • Experience building and maintaining CI/CD pipelines (GitLab CI preferred) for infrastructure automation, not just application builds 
  • Production experience with self-hosted Kubernetes: cluster operations, controller/operator development, and workload automation 
  • Practical AWS and Azure experience with an IaC mindset -- provisioning and managing cloud resources through automation, not console clicks 
  • Experience with API-driven infrastructure management (RESTful APIs, Redfish/iLO, hypervisor APIs) 
  • Familiarity with HashiCorp Vault or equivalent secrets management platforms, including programmatic integration 
  • Understanding of PCI-DSS requirements as they apply to automated infrastructure management -- audit trails, change control, hardening automation 
  • Strong software engineering fundamentals: version control workflows, code review, testing practices, documentation, and release management 

Preferred 

  • Experience with Proxmox VE API automation or similar hypervisor platform APIs (VMware vSphere, libvirt) 
  • Familiarity with bare-metal server management automation (HPE iLO Redfish API, IPMI, or equivalent) 
  • Working proficiency in Go, with experience building at least one of: CLI tools, APIs, Kubernetes operators/controllers, or systems-level tooling 
  • Experience building custom Ansible modules or plugins in Python 
  • Familiarity with NetBox (or similar CMDB/IPAM) API integration for inventory-driven automation 
  • Experience developing Kubernetes operators using operator-sdk, kubebuilder, or controller-runtime 
  • Background in network automation (DNS management, load balancer configuration, LDAP/directory services) 
  • Experience operating in colocation / carrier-neutral DC environments (Equinix, Interxion, or similar) 
  • Contributions to open-source infrastructure tooling or libraries 

What You'll Need to Succeed 

  • A software engineering mindset applied to infrastructure problems -- you think in terms of abstractions, interfaces, testability, and maintainability, not just "getting it working" 
  • Strong opinions on code quality but pragmatism about when to ship -- this is infrastructure tooling, not a SaaS product, and the right trade-offs are different 
  • The ability to understand complex existing systems deeply enough to automate them safely -- our Ansible codebase has evolved over years and new automation must integrate cleanly 
  • Comfort working autonomously in a globally distributed, remote-first team across multiple time zones 
  • Clear written communication -- you will write design documents, READMEs, and runbooks as a natural part of your development workflow 
  • Willingness to participate in a 24x7 on-call rotation; your on-call experience will directly inform what you build next 

Nice to Have 

  • Experience with OSTree / image-based OS lifecycle automation 
  • Familiarity with Pulp or on-premises package repository management automation 
  • Experience building developer portals or self-service infrastructure platforms (Backstage or similar) 
  • Background in DDoS mitigation automation or network-function virtualization 

The Job Description is intended to be a general representation of the responsibilities and requirements of the job. However, the description may not be all-inclusive, and responsibilities and requirements are subject to change.

Please note that F5 only contacts candidates through an F5 email address (ending with @f5.com) or an automated email notification from Workday (ending with f5.com or @myworkday.com).

Equal Employment Opportunity

It is the policy of F5 to provide equal employment opportunities to all employees and employment applicants without regard to unlawful considerations of race, religion, color, national origin, sex, sexual orientation, gender identity or expression, age, sensory, physical, or mental disability, marital status, veteran or military status, genetic information, or any other classification protected by applicable local, state, or federal laws. This policy applies to all aspects of employment, including, but not limited to, hiring, job assignment, compensation, promotion, benefits, training, discipline, and termination. F5 offers a variety of reasonable accommodations for candidates. Requesting an accommodation is completely voluntary. F5 will assess the need for accommodations in the application process separately from those that may be needed to perform the job. To request an accommodation, contact accommodations@f5.com.