Job Description:

Job Title: Senior Site Reliability Engineer

Corporate Title: AVP

Location: Bangalore, India

Role Description

We are seeking a Site Reliability Engineer for Observability platforms in the Bank to enhance, scale, and modernise our enterprise observability capability.
This role focuses on owning and evolving Observability and Monitoring tools across the Bank, driving a shift towards OpenTelemetry (OTel)-based telemetry standardisation.
The successful candidate will contribute to automation, AI adoption, and observability-by-design practices to improve reliability, scalability, and developer experience.

What we’ll offer you

As part of our flexible scheme, here are just some of the benefits that you’ll enjoy:

Best-in-class leave policy
Gender-neutral parental leave
100% reimbursement under childcare assistance benefit (gender-neutral)
Sponsorship for industry-relevant certifications and education
Employee Assistance Program for you and your family members
Comprehensive hospitalization insurance for you and your dependents
Accident and term life insurance
Complimentary health screening for employees aged 35 and above

Your key responsibilities

Tool Reliability Governance:

Own the availability, performance, and resilience of the Observability tool stack in the Bank
Act as administrator of the tool stack, ensuring the platforms effectively support enterprise monitoring requirements
Drive standardisation of telemetry using OpenTelemetry (OTel) across Metrics, Events, Logs, and Traces (MELT)
Define and implement telemetry collection, enrichment, and routing strategies using OTel collectors and pipelines
Identify and implement automation and self-healing for common issues, and adopt AI practices to enhance tool availability and user experience

Own Incident and Problem Management framework (severity, escalation, response and resolution):

Ensure quick incident response, containment, and service restoration
Perform deep root cause analysis and deliver permanent resolutions
Oversee major incidents and proactively identify systemic risks
Identify and eliminate audit and control risks

Align with and adhere to SRE best practices:

Provide frameworks, playbooks, and automation capabilities
Conduct reliability reviews, and implement and improve SLO/SLI tracking
Maintain and govern error budgets
Promote observability-by-design principles across application and platform teams

Strong SRE / production engineering experience:

Expertise in SLOs, error budgets, incident governance, and modern observability practices
Experience with distributed systems, GCP, Kubernetes and OpenShift
Leverage OTel-driven telemetry insights to improve reliability and proactive issue detection
Strong understanding of risk, audit, and compliance (financial services preferred)
Own and evolve the Observability platform ecosystem – ITRS Geneos, New Relic (SaaS), Netcool, Grafana (KDB), and OTel-based telemetry pipelines

Your skills and experience

Strong experience as an administrator of at least two of the following observability tools: ITRS Geneos, New Relic (SaaS), Netcool, Grafana (KDB)
Strong understanding of MELT concepts and modern Observability architectures
Hands-on experience with OpenTelemetry (OTel):
Application and infrastructure instrumentation (auto and manual)
OTel collectors, exporters, and telemetry pipelines
Integration of OTel with tools such as Grafana and New Relic
Understanding of vendor-agnostic telemetry frameworks
Hands-on experience working with Unix servers (Windows Server experience is an added benefit), Google Cloud and OpenShift
Strong hands-on experience in a scripting language such as shell (Bash) or Python; experience with Ansible playbooks and Terraform is beneficial
Experience with Oracle and MSSQL databases and knowledge of KDB will be an added advantage

How we’ll support you

Training and development to help you excel in your career.
Coaching and support from experts in your team.
A culture of continuous learning to aid progression.
A range of flexible benefits that you can tailor to suit your needs.

About us and our teams

Please visit our company website for further information:

https://www.db.com/company/company.html

We strive for a culture in which we are empowered to excel together every day. This includes acting responsibly, thinking commercially, taking initiative and working collaboratively.

Together we share and celebrate the successes of our people. Together we are Deutsche Bank Group.

We welcome applications from all people and promote a positive, fair and inclusive work environment.