We are seeking a Site Reliability Engineering Lead to head a team delivering mission-critical cloud services for a UK Public Sector client. This role combines hands-on technical expertise with leadership responsibilities, ensuring high availability, reliability, and scalability of cloud platforms. You will drive operational excellence, champion automation, and foster collaboration across functions to deliver secure, resilient solutions.
Key Responsibilities
Team Leadership & Management
- Lead, manage, and mentor a team of CloudOps engineers, taking responsibility for performance management, career development, and engagement.
- Manage the on-call rota and operational readiness for 24/7 support.
- Oversee administrative and resource planning tasks.
- Represent CloudOps at Programme Board, architecture, and service review meetings, and at client meetings, where necessary.
Cloud Operations & Automation
- Design and implement Infrastructure-as-Code (IaC) solutions using tools such as Terraform and Ansible.
- Automate provisioning, configuration, and scaling of AWS cloud resources.
- Build and maintain CI/CD pipelines for infrastructure and application deployments.
Platform Reliability & Performance
- Monitor and troubleshoot cloud services to ensure uptime and rapid incident resolution.
- Optimise system performance through metrics, dashboards, and proactive tuning.
- Implement cost optimisation strategies for cloud resource usage.
Application Support
- Develop a working knowledge of the application and service sufficient to provide L2 support.
- Co-ordinate with Service Management, Engineering, and DevOps teams on application issues.
Operational Excellence
- Develop and maintain disaster recovery and backup strategies.
- Ensure compliance with security and governance standards, including handling sensitive data (PII/PHI).
- Maintain comprehensive documentation for infrastructure and operational processes.
Collaboration & Continuous Improvement
- Partner with QA, Product, and Development teams to enhance service reliability.
- Drive initiatives to improve time-to-market, quality, and resilience of solutions.
About You
- Proven experience in CloudOps/DevOps/SRE roles (10+ years), with strong leadership capabilities.
- Skilled in cloud architecture (AWS preferred), Linux environments, and containerisation frameworks.
- Proficient in Python or similar programming languages.
- Hands-on experience with IaC tools (Terraform, Ansible) and CI/CD automation.
- Strong problem-solving skills and ability to work in fast-paced, distributed teams.
- Eligible for a DBS check and UK Security Clearance.
Desirable:
- Experience supporting client-facing systems in the public sector or healthcare.
- Familiarity with secure systems handling sensitive data.
- Proactive mindset for identifying operational improvements.
Note: This role is not eligible for UK visa sponsorship.