Pfizer

Manager, Site Reliability Engineer (SRE)

India - Chennai Full time

ROLE SUMMARY

The IT Operations and Global Command Center organization delivers excellence in the pursuit of breakthroughs that change patients’ lives through industry-leading infrastructure operations performance. We ensure optimal performance of global application hosting and network services that power Pfizer's business processes. We strive to revolutionize service dependability by applying advanced analytics to drive predictive detection, identifying potential issues with our services and intervening before they disrupt our business. We place data at the heart of what we do and apply a relentless focus on continuous improvement to enable Pfizer’s business processes and patient outcomes.

We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to join our Hosting Operations team. This role will be accountable for the reliability, scalability, and performance of our Hosting infrastructure services for all Pfizer business units globally, including Server, Storage, Data Protection, Database, Middleware, and HCI operations. The successful candidate will apply SRE principles to drive operational excellence and continuous improvement, and to deliver tangible business outcomes that support the organization’s strategic goals.

ROLE RESPONSIBILITIES

  • We're looking for someone with strong hands-on experience in one or both of the following areas:
      ◦ Enterprise Storage (e.g., Pure Storage, NetApp, SAN/NAS, FSx, cloud storage)
      ◦ Data Protection (e.g., Rubrik, Avamar, NetWorker)

Experience with operating systems such as Red Hat Enterprise Linux or Windows is a plus.

If you don't meet every requirement but have relevant experience with similar technologies and can demonstrate the ability to succeed in this role, we encourage you to apply.

  • Use data analysis to identify improvement and automation opportunities, and develop proactive solutions that enhance system reliability and reduce toil (repetitive manual effort).
  • Automate everything: Build and maintain infrastructure as code and configuration management (IaC/CM, e.g., Ansible) as well as scripts (PowerShell, Bash, Python) to administer, patch, and configure servers, storage, and data protection systems end to end.
  • Leverage AI tools and platforms to enhance operational efficiency, with a proactive mindset for rethinking and transforming traditional workflows through intelligent automation and innovation.
  • Stakeholder Engagement: Act as a point of contact for technical and audit queries from internal and external stakeholders, ensuring timely and accurate responses that reflect deep system understanding and compliance awareness.
  • Lead root cause analysis (RCA) for incidents, including helping to drive the identified corrective actions and service/process improvements with an SRE mindset.
  • Ensure strong observability across Hosting infrastructure by developing effective monitoring and alerting, enabling predictive operations in partnership with the Command Center. Act as an escalation point for L2/L3 teams on complex issues, resolving tickets within SLA in coordination with clients.
  • Cross-Functional Readiness: While domain expertise is required, the role also demands a flexible mindset and readiness to support adjacent domains such as Unix, Database, and HCI, ensuring operational continuity and resilience across the Hosting SRE function.
  • Provide technical leadership for the Hosting domain during major incident response by actively participating in on-call and shift rotations, working closely with the Command Center to ensure timely resolution and maintain infrastructure reliability.

BASIC QUALIFICATIONS

  • Bachelor's degree in a technical field or equivalent practical experience
  • 5+ years of experience in storage/database administration or engineering roles
  • Solid experience with at least two of the following: enterprise storage, data protection, and operating systems. Certifications in hosting technologies (storage, data protection, OS), AI, observability, or cloud are considered a plus
  • Development skills in at least one programming or scripting language such as Java, Python, C/C#/C++, or PowerShell, plus familiarity with GitHub, CI/CD, and coding best practices, to design, develop, and maintain tools and scripts for system monitoring, automation, and troubleshooting of Hosting technologies
  • Strong data literacy and analytical skills to interpret system metrics, identify trends, and support predictive operations
  • Knowledge of configuration management and infrastructure-as-code tools such as Ansible and Terraform

PHYSICAL/MENTAL REQUIREMENTS

Data Literacy: the ability to analyze, interpret, and use data to provide actionable insights

NON-STANDARD WORK SCHEDULE, TRAVEL OR ENVIRONMENT REQUIREMENTS

  • Occasional travel (<10%)
  • Willingness to work split shifts (morning/evening) and participate in on-call rotations to support the 24/7 nature of the Operations environment

Work Location Assignment: Hybrid

Pfizer is an equal opportunity employer and complies with all applicable equal employment opportunity legislation in each jurisdiction in which it operates.

Information & Business Tech