RSM

Site Reliability Engineer Senior 1

Gurugram, Full time

We are the leading provider of professional services to the middle market globally. Our purpose is to instill confidence in a world of change, empowering our clients and people to realize their full potential. Our exceptional people are the key to our unrivaled, inclusive culture and talent experience and our ability to be compelling to our clients. You’ll find an environment that inspires and empowers you to thrive both personally and professionally. There’s no one like you and that’s why there’s nowhere like RSM.

The Senior Platform Site Reliability Engineer ensures the reliability, scalability, and availability of NAS AI Ecosystem platforms. This role combines software engineering and operations to automate platform operations, improve observability, and maintain stable production environments for AI, data, and backend services.

Job Profile Responsibilities

  • Implement reliability engineering practices for AI and data platforms 

  • Define and monitor SLIs, SLOs, and SLAs 

  • Automate operational processes to reduce manual effort 

  • Manage monitoring, logging, and alerting systems 

  • Perform incident response and root cause analysis 

  • Improve scalability, resilience, and disaster recovery capabilities 

  • Partner with engineering teams to embed reliability into system design 

  • Maintain CI/CD pipelines and deployment strategies 

  • Ensure security and compliance across infrastructure 

  • Participate in production support and on-call rotations 

Requirements & Qualifications

Minimum Requirements

  • Experience in Site Reliability Engineering, DevOps, or Platform Engineering 

  • Proficiency in Python, Go, or Bash 

  • Experience with Azure, AWS, or GCP 

  • Hands-on experience with Docker and Kubernetes 

  • Experience with Prometheus, Grafana, Azure Monitor, or ELK 

  • Experience with Terraform, ARM, or CloudFormation 

  • Strong understanding of networking and distributed systems 

Preferred Requirements

  • Experience supporting AI/ML or data platforms 

  • Knowledge of chaos engineering and resiliency testing 

  • Cloud or Kubernetes certifications 

  • Experience with high-availability, multi-region systems 

Educational Requirements

  • Bachelor’s degree

At RSM, we offer a competitive benefits and compensation package for all our people. We offer flexibility in your schedule, empowering you to balance life’s demands, while also maintaining your ability to serve clients. Learn more about our total rewards at https://rsmus.com/careers/india.html.  

RSM does not tolerate discrimination and/or harassment based on race; colour; creed; sincerely held religious beliefs, practices or observances; sex (including pregnancy or disabilities related to nursing); gender (including gender identity and/or gender expression); sexual orientation; HIV Status; national origin; ancestry; familial or marital status; age; physical or mental disability; citizenship; political affiliation; medical condition (including family and medical leave); domestic violence victim status; past, current or prospective service in the Indian Armed Forces; Indian Armed Forces Veterans, and Indian Armed Forces Personnel status; pre-disposing genetic characteristics or any other characteristic protected under applicable provincial employment legislation.  

Accommodation for applicants with disabilities is available upon request in connection with the recruitment process and/or employment/partnership. RSM is committed to providing equal opportunity and reasonable accommodation for people with disabilities. If you require a reasonable accommodation to complete an application, interview, or otherwise participate in the recruiting process, please send us an email at careers@rsmus.com.