Senior Application and Platform Engineer
Shift Pattern: Standard 40-Hour Week (United Kingdom)
Scheduled Weekly Hours: 40
Corporate Grade: D - Assistant Vice President
Reporting Line: (UK Division) Information Technology
Location: UK-London
Worker Type: Permanent
Overall Purpose of Role
Deliver Level 2 and Level 3 technical support with a strong focus on reliability, resilience, and automation for the LME’s Middle Office, Back Office, and Market Data mission-critical applications. This role blends traditional application support with SRE principles and platform engineering practices to ensure stability, scalability, and continuous improvement across systems serving internal teams and external clients.
Core Responsibilities
- Reliability Engineering: Embed SRE best practices into operational workflows, including error budgets, SLIs/SLOs, and proactive monitoring to improve system uptime and performance.
- Design, build, migrate, support, optimise and manage our physical, virtual, and containerised (OpenShift and Kubernetes) environments for scalability, resilience, and operational efficiency, with a focus on ‘Five nines’ (99.999%) operational availability.
- Deliver technical support within a project-based framework to ensure successful application rollouts.
- Support project delivery across Waterfall and Agile frameworks, with an emphasis on hybrid approaches that balance flexibility and efficiency.
- Prioritise and resolve incidents across the full application suite, ensuring rapid recovery and root cause analysis.
- Identify and implement service improvements with measurable outcomes, focusing on automation and reducing toil.
- Manage day-to-day production incidents and validate changes through QA and automated testing pipelines.
- Troubleshoot issues across network, database, infrastructure, and application layers.
- Actively contribute to incident, change, and problem management processes.
Key Accountabilities:
- Provide support and maintenance for mission-critical applications across Pre- and Post-Trade business units.
- Support delivery of regulatory, market growth, infrastructure, and security projects.
- Monitor and optimise application performance using observability tools and proactive tuning.
- Maintain and support both test and production environments for stability and readiness.
- Champion automation, CI/CD, and self-healing systems to reduce manual intervention.
- Oversee end-to-end release management, ensuring smooth deployments with minimal risk.
- Drive continuous improvement by evaluating and enhancing support processes.
- Maintain up-to-date documentation for all supported systems and platforms.
- Lead operational resiliency exercises, including disaster recovery and chaos engineering tests.
- Identify, manage, and remediate security vulnerabilities across systems and applications.
Technical Responsibilities
- Maintain and regularly test disaster recovery procedures.
- Recommend and implement standards to enhance environment efficiency and resilience.
- Validate system builds against operational and reliability requirements.
- Respond promptly to production issues, ensuring resolution and stakeholder communication.
- Support 24/7/365 availability of production systems, working flexible shift patterns (07:00–16:00 and 10:00–19:00) alongside the HK team.
- Participate in the on-call and weekend rota for out-of-hours coverage.
- CI/CD Pipeline Management: Design, implement, and maintain pipelines using Bamboo alongside Bitbucket.
- Infrastructure as Code (IaC): Champion IaC and help build and manage our application and environment releases via Ansible Tower.
- Platform Management and Availability: Build, migrate, support, optimise and manage our physical, virtual, and containerised (OpenShift and Kubernetes) environments for scalability, resilience, and operational efficiency, with a focus on ‘Five nines’ (99.999%) operational availability.
- Monitoring & Observability: Design, implement, and maintain observability stacks with a primary focus on Grafana and Prometheus for real-time metrics and dashboards, complemented by Splunk for log analytics and incident investigation. Define and track SLIs/SLOs to ensure reliability and performance across platforms.
- Implement and integrate predictive alerting rules within the Grafana stack to anticipate potential system failures or performance degradation, enabling early intervention and improved service resilience.
- Automation: Automate repetitive tasks using Python, Bash, or PowerShell.
Working with others:
- Internal production teams
- Business stakeholders
- Project teams
- Risk, Security, and GRC teams
- External vendors and auditors
PERSON SPECIFICATION:
Qualifications
Degree in Computer Science or a related discipline, or 5+ years of equivalent professional experience.
Preferred Experience
- Strong SQL and database expertise (MySQL, Oracle), including database change management with Liquibase.
- Experience with CI/CD, IaC, containerisation, and orchestration tools.
- Strong experience building and maintaining code in source repositories such as Bitbucket or GitLab.
- Proficiency in monitoring and observability platforms.
- Familiarity with DevSecOps practices and vulnerability scanning.
- Experience with SRE and Agile/Lean methodologies.
- 3–5 years in financial markets or mission-critical environments.
- Solid understanding of networking, infrastructure, and cloud technologies.
- Experience supporting .NET, Java, and microservices-based applications, alongside Active Directory, DNS, IPA, FTP/SFTP, SSL, and networking/firewalls.
- Scripting and configuration languages: Python, Perl, PowerShell, Bash, JScript, YAML.
Desirable
- ITIL Foundation Certification v3/v4.
- ServiceNow experience.
- SWIFT payment systems knowledge.
Skills
- Promotes a culture of reliability, automation, and continuous improvement.
- Strong incident communication and stakeholder engagement.
- Ability to mentor and lead technical discussions.
- Hands-on experience with Kubernetes, OpenShift, and observability stacks.
- Skilled in scripting, automation, and legacy system support.
Personal Qualities:
- Delivers high-quality results under pressure and tight deadlines.
- Strong problem-solving skills with an innovative, curious mindset.
- Clear communicator and collaborative team player, promoting knowledge sharing.
- Organised, proactive, and detail-oriented with strong planning skills.
- Ownership mindset with a passion for automation and continuous improvement.
- Excellent knowledge of Agile practices, collaboration tools (Jira, Confluence), and CI/CD tools.
- Supports a culture of learning through blameless post-mortems and experimentation.
The LME is committed to creating a diverse environment and is proud to be an equal opportunity employer. In recruiting for our teams, we welcome the unique contributions that you can bring in terms of education, ethnicity, race, sex, gender identity and expression, nation of origin, age, languages spoken, colour, religion, disability, sexual orientation and beliefs. In doing so, we want every LME employee to feel our commitment to showing respect for all and encouraging open collaboration and communication.