Job Description:
Job Title: Service Reliability Engineer (Production Engineering), AVP
Location: Bangalore, India
Corporate Title: AVP
Role Description
We are seeking a highly motivated and experienced AVP Site Reliability Engineer to join our Engineering team, focusing on Investment Banking Settlements. This role is uniquely positioned to act as a hands-on developer and engineer, dedicated to proactively improving the reliability, scalability, and performance of our core settlement systems, which encompass both modern Fabric-based solutions and integrated legacy components. As an SRE embedded within Production Engineering, you will be instrumental in architecting, developing, and implementing solutions that enhance our CI/CD practices, drive robust cloud adoption, and embed SRE best practices directly into our system architectures.
What we’ll offer you
As part of our flexible scheme, here are just some of the benefits that you’ll enjoy:
- Best-in-class leave policy
- Gender-neutral parental leave
- 100% reimbursement under childcare assistance benefit (gender neutral)
- Sponsorship for industry-relevant certifications and education
- Employee Assistance Program for you and your family members
- Comprehensive Hospitalization Insurance for you and your dependents
- Accident and term life insurance
- Complimentary health screening for employees aged 35 and above
Your key responsibilities
- Reliability Engineering & Development:
- Proactively develop and implement engineering solutions to enhance the reliability, availability, and observability of Fixed Income Securities Settlement systems, aligning with SRE best practices and the "you build it, you run it" paradigm.
- Define and instrument Service Level Objectives (SLOs) and Service Level Indicators (SLIs), leveraging them to drive development priorities and measure customer satisfaction.
- Design, develop, and integrate reliability and resilience patterns such as auto-scaling, circuit breakers, bulkheads, rate limiters, and retry mechanisms directly into application and infrastructure code.
- Actively contribute to the reduction of toil by developing automation and tooling for service request fulfillment, incident management, and problem management, thereby improving Mean Time To Resolution (MTTR).
- Lead root-cause analysis, performance optimization initiatives, and incident response, translating insights directly into permanent code-based solutions and system improvements.
- Champion and contribute to state-of-the-art SRE best practices including GitOps, Distributed Tracing, OpenTelemetry, and Chaos Engineering within the development lifecycle.
- CI/CD Engineering & Automation Development:
- Design, develop, and maintain advanced CI/CD pipelines using tools such as GitHub Actions and Ansible to automate build, test, and deployment workflows for both GCP and Fabric-based solutions.
- Develop robust scripts, utilities, and automation frameworks to streamline deployment processes, environment provisioning, and monitoring, reducing manual effort and improving deployment velocity.
- Cloud Engineering & Platform Development:
- Architect, develop, and implement scalable, secure, and resilient systems on Google Cloud Platform (GCP is mandatory; Azure/AWS experience is a plus).
- Develop and maintain infrastructure-as-code solutions using Terraform for managing cloud resources, ensuring consistency and repeatability across environments.
- Contribute to and implement deployment architectures, with a focus on optimizing multi-platform and distributed systems that integrate seamlessly with legacy infrastructure.
- Containerization & Orchestration Development:
- Implement and manage Docker-based solutions, including creating and maintaining Dockerfiles, Helm charts, and Kubernetes workload configurations, specifically within Google Kubernetes Engine (GKE).
- Develop and optimize microservices deployments, container clustering, and container lifecycle automation.
- Collaboration & Engineering Excellence:
- Collaborate closely with Developers, QA Engineers, other SREs, and Platform teams to ensure seamless and efficient delivery processes, embedding reliability from inception.
- Consult with Business Functional Analysts and Solution Architects to proactively embed resilience into solution design early in the development lifecycle.
- Utilize development ecosystem tools such as Bitbucket, Artifactory, Confluence, and Jira for effective collaboration and project management.
- Apply a strong understanding of SDLC models, Agile methodologies, and engineering best practices, fostering a culture of continuous improvement and shared ownership.
- Mentor junior engineers, contributing to overall team capability uplift.
Your skills and experience
- Technical Expertise (Mandatory):
- Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.
- Strong hands-on programming/scripting proficiency in at least one of Java, Node.js/TypeScript, Python, or Shell scripting, with a proven track record of developing production-grade solutions.
- Hands-on experience with automation tools such as GitHub Actions.
- Ability to leverage AI-powered tools to enhance incident detection, automate responses, and optimize reliability engineering workflows.
- Expert knowledge of containerization (Docker), orchestration (Kubernetes, specifically GKE), and packaging (Helm).
- Extensive hands-on experience with Google Cloud Platform (GCP) is mandatory.
- Hands-on experience with Terraform-based Infrastructure-as-Code (IaC) for cloud infrastructure management.
- Proven experience in setting up and developing observability, monitoring, and self-healing solutions (e.g., New Relic, Splunk, Google Cloud Operations, Ansible).
- Strong familiarity with application development, distributed systems, and multi-platform architectures, including robust integration with legacy systems.
- Deep understanding of SDLC processes, DevOps models, and cloud-native engineering practices.
- Technical Expertise (Beneficial):
- Experience with Azure/AWS.
- Financial domain knowledge.
- Soft Skills:
- Strong analytical and troubleshooting skills, with a creative and solution-oriented approach to complex technical problems.
- Excellent communication skills (written and verbal English) and proven cross-team collaboration abilities, capable of driving technical discussions and consensus.
- Proactive mindset with strong ownership and initiative in solving complex engineering problems, even in stressful situations, maintaining a calm and detail-oriented approach.
- Ability to work independently in agile, fast-paced environments, actively seeking opportunities for system improvement.
- A collaborative team player mindset, with self-confidence and a passion for continuous learning and mentoring.
How we’ll support you
- Training and development to help you excel in your career
- Coaching and support from experts in your team
- A culture of continuous learning to aid progression
- A range of flexible benefits that you can tailor to suit your needs
About us and our teams
Please visit our company website for further information:
https://www.db.com/company/company.html
We strive for a culture in which we are empowered to excel together every day. This includes acting responsibly, thinking commercially, taking initiative and working collaboratively.
Together we share and celebrate the successes of our people. Together we are Deutsche Bank Group.
We welcome applications from all people and promote a positive, fair and inclusive work environment.