At CVS Health, we’re building a world of health around every consumer and surrounding ourselves with dedicated colleagues who are passionate about transforming health care.
As the nation’s leading health solutions company, we reach millions of Americans through our local presence, digital channels and more than 300,000 purpose-driven colleagues – caring for people where, when and how they choose in a way that is uniquely more connected, more convenient and more compassionate. And we do it all with heart, each and every day.
Position Summary
The PCW (Pharmacy & Consumer Wellness) SRE team is seeking a Staff Site Reliability Engineer (SRE) to lead the reliability, scalability, and security of our Conversational and Generative AI Platform. This platform enables deployment and orchestration of open-source Large Language Models (LLMs), supports advanced AI use cases such as model fine-tuning, retrieval-augmented generation (RAG), and multi-agent systems, and powers next-generation conversational and predictive experiences. The ideal candidate will combine deep expertise in cloud-native infrastructure, security compliance, and AI-driven systems with a proven ability to drive automation, observability, and resilience across mission-critical platforms.
Key Responsibilities
- Architect and maintain highly available, secure, and scalable infrastructure for AI-driven applications and services.
- Automate end-to-end workflows, from infrastructure provisioning to application deployment and incident response, eliminating operational toil.
- Champion security-first principles, embedding compliance with HIPAA, PCI DSS, and ADA standards into all processes; partner with enterprise security teams to ensure governance.
- Implement observability best practices, leveraging tools like Prometheus, Grafana, and Istio to monitor system health and performance.
- Collaborate cross-functionally to troubleshoot complex platform, service, and data issues; perform root cause analysis and implement preventive measures.
- Mentor and guide engineers in SRE and DevOps best practices, fostering a culture of reliability and continuous improvement.
- Drive innovation in AI infrastructure, optimizing GPU/TPU resource management, distributed training, and the orchestration of AI workloads in Kubernetes environments.
- Support audits and compliance reviews, ensuring timely implementation of recommendations and adherence to security standards.
Required Qualifications
- 10+ years of experience in IT and digital solution development, with a proven track record of delivering enterprise-scale systems.
- Demonstrated leadership in Site Reliability Engineering (SRE), including managing 24/7 operations and driving system resilience.
- CISSP certification with deep expertise in cloud security, network security, application security, and compliance standards (HIPAA, PCI DSS, ADA).
- Strong knowledge of cloud security architectures, networking fundamentals (DNS, WAF, DHCP, firewalls, IP routing), and secure application design.
- Expertise in modern AI paradigms, including Generative AI, Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and multi-agent systems.
- Hands-on experience with AI frameworks and platforms (e.g., TensorFlow, PyTorch, Hugging Face) and orchestration of AI pipelines for production environments.
- 5+ years of experience with MLOps practices, including model lifecycle management, monitoring, and continuous improvement in cloud-native environments.
- Ability to evaluate emerging AI technologies and integrate them into scalable architectures for predictive analytics, NLP, and computer vision.
- Leadership in ethical AI and governance, ensuring compliance with data privacy, bias mitigation, and responsible AI principles.
- 5+ years of experience in Kubernetes and Docker containerization, with hands-on experience in Rancher and Google Kubernetes Engine (GKE).
- 5+ years of experience with cloud technologies (GCP preferred), microservices, and web APIs.
- 5+ years of experience implementing DevOps and GitOps practices, including tools such as Grafana, Istio, and Prometheus.
Preferred Qualifications
- Cloud Architect certification in AWS, Azure, or Google Cloud.
- Advanced experience with relational and NoSQL databases, including Oracle, PostgreSQL, MySQL, Cassandra, and MongoDB.
- Strong understanding of infrastructure environments and scalable system design.
- Exceptional problem management skills, with a proven ability to identify root causes and implement measures to prevent recurrence.
- Demonstrated success in building Site Reliability and Assurance Engineering organizations, transforming traditional service operations into modern platforms focused on automation, self-healing capabilities, and resiliency frameworks.
- In-depth knowledge of IT security best practices and compliance standards such as ADA, HIPAA, and PCI DSS.
- Proven ability to coach and mentor teams, while maintaining deep functional knowledge of supported applications and their interdependencies.
- Experience leading AI-driven initiatives, including evaluating emerging AI technologies, integrating advanced models into enterprise architectures, and driving innovation in Generative AI, LLMs, and multi-agent systems.
- Familiarity with AI infrastructure optimization, such as GPU/TPU resource management, distributed training, and orchestration of AI workloads in cloud-native environments.
- Strong understanding of ethical AI principles, governance frameworks, and compliance with data privacy and bias mitigation standards.
Education
Bachelor’s degree in Computer Science or Computer Engineering, or equivalent years of experience.
Pay Range
The typical pay range for this role is:
$118,450.00 - $236,900.00
This pay range represents the base hourly rate or base annual full-time salary for all positions in the job grade within which this position falls. The actual base salary offer will depend on a variety of factors including experience, education, geography and other relevant factors. This position is eligible for a CVS Health bonus, commission or short-term incentive program in addition to the base pay range listed above. This position also includes an award target in the company’s equity award program.
Our people fuel our future. Our teams reflect the customers, patients, members and communities we serve, and we are committed to fostering a workplace where every colleague feels valued and feels that they belong.
Great benefits for great people
We take pride in our comprehensive and competitive mix of pay and benefits – investing in the physical, emotional and financial wellness of our colleagues and their families to help them be the healthiest they can be. In addition to our competitive wages, our great benefits include:
- Affordable medical plan options, a 401(k) plan (including matching company contributions), and an employee stock purchase plan.
- No-cost programs for all colleagues, including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching.
- Benefit solutions that address the different needs and preferences of our colleagues, including paid time off, flexible work schedules, family leave, dependent care resources, colleague assistance programs, tuition assistance, retiree medical access and many other benefits depending on eligibility.
For more information, visit https://jobs.cvshealth.com/us/en/benefits
We anticipate the application window for this opening will close on: 12/13/2025
Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state and local laws.