The Head of Production Operations & Resiliency Services is accountable for the operational excellence, availability, and resilience of all Retirement & Insurance technology platforms serving millions of participants and managing hundreds of billions in assets. This role leads a complex ecosystem where approximately 70% of production operations are delivered through managed services partnerships, requiring exceptional vendor governance, operational discipline, and the ability to build high-performing hybrid operational models.
This leader will strengthen our operational foundation while simultaneously transforming toward Site Reliability Engineering (SRE) practices—balancing the immediate need for enterprise-grade stability with the strategic imperative to automate, instrument, and engineer reliability into our systems at scale. They will also lead operational readiness and production stability for a major core platform transformation while establishing the operational excellence framework that will define Retirement & Insurance technology for the next decade.
STRATEGIC ACCOUNTABILITY (WHAT SUCCESS LOOKS LIKE):
- Operational Excellence: Own availability, performance, and resilience targets across all Retirement & Insurance production systems. Deliver measurable improvements in MTTR (Mean Time to Resolution), change success rates, and proactive issue detection.
- Vendor Ecosystem Orchestration: Govern and optimize a complex managed services portfolio ensuring accountability, cost efficiency, and service level achievement. Transform vendor relationships from transactional to strategic partnerships.
- SRE Transformation: Build the roadmap and capabilities to evolve from reactive TechOps to proactive Site Reliability Engineering practices—introducing observability, automation, error budgets, and engineering culture into operations.
- Business Continuity & Resilience: Ensure disaster recovery readiness, incident response excellence, and crisis leadership that protects TIAA's reputation and participant trust during high-stakes operational events.
- Platform Transformation Leadership: Serve as operational anchor for major platform migrations and technology modernization initiatives, ensuring production stability throughout complex transitions.
KEY RESPONSIBILITIES:
Production Operations & Service Delivery (40%)
- Accountable for 24/7/365 production operations across Retirement & Insurance technology platforms including recordkeeping systems, participant portals, financial transaction processing, and business-critical applications
- Define and enforce Service Level Objectives (SLOs), availability targets, and operational KPIs aligned with business requirements and regulatory obligations
- Lead production change management processes ensuring disciplined risk assessment, rollback planning, and deployment coordination across development, infrastructure, and vendor teams
- Oversee capacity planning, performance optimization, and scalability management to support business growth and seasonal demand patterns
- Drive continuous improvement in operational metrics: uptime, MTTR, change success rates, proactive monitoring coverage, and automation maturity
- Partner closely with Infrastructure teams on compute, storage, network capacity planning, cloud migrations, and platform optimization initiatives to ensure production environments meet availability and performance targets
- Collaborate with Cybersecurity teams on security incident response, vulnerability remediation in production systems, security patching strategies, and embedding security controls into operational processes without compromising system availability
Managed Services & Vendor Governance (30%)
- Govern relationships with multiple managed services providers delivering infrastructure, application support, monitoring, and incident management capabilities
- Enforce vendor SLA compliance, conduct regular performance reviews, and drive accountability through data-driven scorecards and escalation frameworks
- Optimize managed services portfolio for cost efficiency while maintaining or improving service quality
- Negotiate contracts, SOWs, and operational models that align vendor incentives with TIAA business outcomes
- Build internal capabilities in areas where vendor performance gaps exist or strategic control is required
- Ensure seamless operational integration across internal teams, offshore partners, and third-party service providers
Incident Management & Crisis Leadership (15%)
- Own enterprise incident management framework including severity definitions, escalation paths, communication protocols, and executive reporting
- Serve as Incident Commander for Severity 1/2 incidents affecting Retirement & Insurance operations, orchestrating cross-functional response teams and ensuring timely resolution
- Coordinate joint incident response with Infrastructure and Cybersecurity teams during complex outages, security events, or infrastructure failures requiring integrated troubleshooting and resolution
- Drive blameless postmortem culture focused on systemic improvement rather than individual fault-finding
- Establish and maintain disaster recovery and business continuity plans with regular testing and validation
- Partner with Enterprise Risk, Compliance, and Business Continuity teams to meet regulatory requirements and audit expectations
- Communicate operational health, risks, and incidents to executive leadership with transparency and appropriate urgency
SRE Transformation & Automation (10%)
- Build the multi-year roadmap to introduce Site Reliability Engineering practices including error budgets, SLO-based decision making, toil reduction, and automation-first culture
- Partner with development teams to embed reliability requirements earlier in the software lifecycle
- Implement observability strategy leveraging modern APM, logging, and monitoring platforms to enable proactive issue detection
- Automate repetitive operational tasks (deployments, monitoring, incident response runbooks) to improve efficiency and reduce human error
- Develop talent pipeline transitioning traditional TechOps professionals toward SRE/DevOps engineering capabilities
Leadership & Stakeholder Management (5%)
- Lead and develop a high-performing operations team blending internal staff and managed services partners
- Serve as primary operational liaison to Infrastructure leadership ensuring alignment on platform roadmaps, capacity investments, cloud strategy, and shared operational standards
- Work in lockstep with Cybersecurity leadership to balance security requirements with operational stability, jointly own security incident response procedures, and integrate security automation into production operations
- Collaborate with Technology Leaders, Product Owners, Architecture, and Business stakeholders to align operational priorities with business objectives
- Communicate complex operational issues and trade-offs to non-technical executives in clear, business-oriented language
- Foster a culture of operational excellence, accountability, continuous learning, and psychological safety
- Represent Retirement & Insurance Operations in enterprise forums, steering committees, and strategic planning sessions
REQUIRED QUALIFICATIONS:
Experience
- 10+ years in large-scale production operations, infrastructure management, or site reliability engineering roles
- Minimum 7 years in leadership roles managing distributed operations teams and/or complex managed services partnerships
- Proven track record managing mission-critical systems in highly regulated industries (financial services, healthcare, insurance) with stringent availability and compliance requirements
- Demonstrated success leading operational stability during major platform migrations, data center transitions, or core system transformations
- Deep expertise in vendor management including contract negotiation, SLA enforcement, and building accountability frameworks for offshore and third-party providers
- Hands-on background as infrastructure engineer, systems administrator, or site reliability engineer providing credibility with technical teams
- Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related technical field
- Availability for 24/7 incident escalation (this is a production accountability role)
Technical Competencies
- Expert knowledge of ITIL/ITSM frameworks (Incident Management, Problem Management, Change Management) and modern SRE practices (SLOs, error budgets, observability, toil reduction)
- Strong understanding of enterprise infrastructure including compute, storage, networking, databases, middleware, and integration platforms (on-premises and cloud)
- Experience with observability and monitoring tools such as Splunk, AppDynamics, Dynatrace, Datadog, or similar APM platforms
- Familiarity with cloud operations (AWS, Azure, GCP) and hybrid cloud operational models
- Knowledge of CI/CD pipelines, automation frameworks (Ansible, Terraform), and DevOps toolchains
- Understanding of mainframe and legacy system operations alongside modern distributed architectures (experience with mainframe-to-modern migrations a plus)
- Working knowledge of disaster recovery, business continuity planning, and high-availability architectures
Leadership & Business Acumen
- Exceptional crisis leadership skills with proven ability to remain calm, decisive, and transparent during high-pressure operational incidents
- Strong vendor negotiation and influence skills driving outcomes without direct authority over third-party teams
- Excellent executive communication translating technical operational issues into business impact and risk language
- Strategic thinking balancing immediate operational stability needs with long-term transformation initiatives
- Talent development mindset coaching teams through cultural and technical transformations
- Demonstrated ability to build trust and credibility with business partners, development teams, and executive stakeholders
PREFERRED QUALIFICATIONS:
- Experience in retirement, insurance, or wealth management technology with understanding of recordkeeping, participant transactions, or financial administration systems
- Experience with AWS, Azure, and/or GCP cloud computing platforms
- Track record transforming traditional TechOps organizations toward SRE/DevOps culture and practices
- Familiarity with regulatory requirements affecting financial services technology operations (SOC 2, SOX, SEC regulations)
- Experience with Agile/SAFe methodologies collaborating closely with product and development teams
- Technical certifications such as AWS Solutions Architect, ITIL Expert, or Google SRE
- Prior experience at large financial institutions or FinTech companies managing operations at scale
- Master’s degree
Related Skills
Agile Methodology, Analytical Skills, Automation, Cloud Platforms, Configuration Management, Data Management, Infrastructure Deployment, Infrastructure Support, IT Infrastructure, Network Administration/Maintenance, Problem Solving, Programming, Project Management, Relationship Management, Technology Systems
Anticipated Posting End Date:
2026-03-26
Base Pay Range: $198,100/yr - $308,400/yr
Actual base salary may vary based on factors including, but not limited to, relevant experience, time in role, base salary of internal peers, prior performance, business sector, and geographic location. In addition to base salary, the competitive compensation package may include, depending on the role, participation in an incentive program linked to performance (for example, annual discretionary incentive programs, non-annual sales incentive plans, or other non-annual incentive plans).
_____________________________________________________________________________________________________
Company Overview
Every worker deserves a secure retirement. For more than 100 years, TIAA has delivered it for millions of people. Founded to help educators retire with dignity, today we're a market-leading retirement company fueled by world-class asset management. But we're not just another legacy financial services firm. We're fighting harder than ever before for our clients and the many Americans who need us.
Our Culture of Impact
At TIAA, we're on a mission to build on our 100+ year legacy of delivering for our clients while evolving to meet tomorrow's challenges. We equip our associates with future-focused skills and AI tools that enable us to advance our mission. Together, we are fighting to ensure a more secure financial future for all and for generations to come. We are guided by our values: Champion Our People, Be Client Obsessed, Lead with Integrity, Own It, and Win As One. They influence every decision we make and how we work together to serve our clients every day. We thrive in a collaborative in-office environment where teams work across organizational boundaries with shared purpose, accelerating innovation and delivering meaningful results. Our workplace brings together TIAA and Nuveen's entrepreneurial spirit, where we work hard and work together to create lasting impact. Here, every associate can grow through meaningful learning experiences and development pathways—because when our people succeed, our impact on clients' lives grows stronger.
Benefits and Total Rewards
The organization is committed to making financial well-being possible for its clients, and is equally committed to the well-being of our associates. That’s why we offer a comprehensive Total Rewards package designed to make a positive difference in the lives of our associates and their loved ones. Our benefits include a superior retirement program and highly competitive health, wellness, and work-life offerings that can help you achieve and maintain your best possible physical, emotional, and financial well-being. To learn more about your benefits, please review our Benefits Summary.
Equal Opportunity
We are an Equal Opportunity Employer. TIAA does not discriminate against any candidate or employee on the basis of age, race, color, national origin, sex, religion, veteran status, disability, sexual orientation, gender identity, or any other legally protected status.
Our full EEO & Non-Discrimination statement is on our careers home page, and you can read more about your rights and view government notices here.
Accessibility Support
TIAA offers support for those who need assistance with our online application process to provide an equal employment opportunity to all job seekers, including individuals with disabilities.
If you are a U.S. applicant and need a reasonable accommodation to complete a job application, please use one of the options below to contact our accessibility support team:
Phone: (800) 842-2755
Email: accessibility.support@tiaa.org
Drug and Smoking Policy
TIAA maintains a drug-free and smoke-free workplace.
Privacy Notices
For Applicants of TIAA, Nuveen and Affiliates residing in US (other than California), click here.
For Applicants of TIAA, Nuveen and Affiliates residing in California, please click here.
For Applicants of TIAA Global Capabilities, click here.
For Applicants of Nuveen residing in Europe and APAC, please click here.