Synchrony Financial

VP, AI Reliability & Performance Architect (L12)

Hyderabad IN Full time

Job Description:

Role Title: VP, AI Reliability & Performance Architect (L12)

Company Overview:

Synchrony (NYSE: SYF) is a premier consumer financial services company delivering one of the industry’s most complete digitally enabled product suites. Our experience, expertise and scale encompass a broad spectrum of industries including digital, health and wellness, retail, telecommunications, home, auto, outdoors, pet and more.

  • We have recently been ranked #2 among India’s Best Companies to Work for by Great Place to Work. We were among the Top 50 India’s Best Workplaces in Building a Culture of Innovation by All by GPTW and Top 25 among Best Workplaces in BFSI by GPTW. We have also been recognized by AmbitionBox Employee Choice Awards among the Top 20 Mid-Sized Companies, ranked #3 among Top Rated Companies for Women, and among the Top-Rated Financial Services Companies.

  • We provide best-in-class employee benefits and programs that cater to work-life integration and overall well-being.

  • We provide career advancement and upskilling opportunities, focusing on Advancing Diverse Talent to take up leadership roles.

Organizational Overview: 

Synchrony's Engineering Team is a dynamic and innovative team dedicated to driving technological excellence. As a member of this Team, you'll play a pivotal role in designing and developing a cutting-edge tech stack and solutions that redefine industry standards.

Think of the credit card we use every day to purchase our essentials and later settle the bills: a simple process that we are all used to. Now, consider the vast complexity hidden behind this seemingly simple process, operating tirelessly for millions of cardholders. The sheer volume of data processed is mind-boggling. Fortunately, advanced technology stands ready to automate and manage this constant torrent of information, ensuring smooth transactions around the clock, 365 days a year.

Our collaborative environment encourages creative problem-solving and fosters career growth. Join us to work on diverse projects, from fintech to data analytics, and contribute to shaping the future of technology. If you're passionate about engineering and innovation, Synchrony's Engineering Team is the place to be.

Role Summary/Purpose:

The VP, AI Reliability & Performance Architect is a strategic technical leader responsible for ensuring the production-grade reliability, accuracy, and performance of our AWS-based agentic AI ecosystem. This role bridges platform engineering and agent development to ensure agentic workflows are observable, secure, resilient, and performant.

Key Responsibilities:

  • System Reliability & Troubleshooting: Lead investigations of complex agent/AI workflow failures using logs, metrics, and traces (CloudWatch, X-Ray, Splunk, New Relic or similar). Run blameless post-mortems and drive preventive actions.

  • RAG & Workflow Optimization: Improve the quality and performance of Retrieval-Augmented Generation (RAG) and agent workflows by tuning retrieval, ranking/re-ranking, prompt/tooling behavior, and data access patterns across stores such as PostgreSQL/Redshift (or equivalent).

  • Comprehensive Evaluations: Establish and oversee evaluation approaches for models, RAG, and agents (automated test suites, scorecards, success criteria; e.g., LLM-as-a-judge/RAGAS concepts or equivalents) to improve fidelity and reduce regressions.

  • Security & Compliance Collaboration: Partner with InfoSec/AppSec to review architectures and ensure designs follow enterprise security patterns, identity controls (IAM, SSO/federation such as Okta/Cognito or equivalents), and data residency requirements.

  • Governance & Control: Work with Governance teams to implement and monitor guardrails and controls (e.g., model safety guardrails, policy enforcement, cost/usage controls) across the AI platform.

  • Agentic Protocols (Exposure Helpful): Contribute to or help operationalize agent interoperability/protocol patterns (A2A/ACP/AP2 or similar concepts), where applicable.

  • Architectural Stewardship: Drive “Design for Reliability” patterns across both Platform and Agent Building teams—fault tolerance, graceful degradation, load/performance testing, incident readiness, and operational excellence.

  • Stakeholder Communication: Translate reliability risks, performance trends, and operational metrics into clear business language for senior leaders, risk, and product owners.

  • Technical Mentorship: Coach DevLeads and architects on debugging agent behaviors, strengthening observability pipelines, improving orchestration, and hardening production deployments.

Required skills/Knowledge:

  • Bachelor's degree in Computer Science, Engineering, Information Systems, or related field (or equivalent experience) with 10–14 years of IT experience, including meaningful roles in application development, platform engineering, SRE/operations, and/or architecture; or, in lieu of a degree, 12–16 years of IT experience, including meaningful roles in application development, platform engineering, SRE/operations, and/or architecture.

  • Cloud & Reliability Foundation: Strong experience operating and improving reliability of cloud-native systems (AWS preferred; comparable cloud experience acceptable), including containers/compute, networking, and security fundamentals.

  • AI/Agent Experience (Learnable): Experience supporting AI/ML systems is beneficial, but not mandatory if you demonstrate strong troubleshooting ability, systems thinking, and a proactive plan/track record of learning AI reliability patterns quickly.

  • Programming & Automation: Strong ability to script/build tooling in Python (or similar language) for reliability automation, analysis, testing, and operational workflows.

  • Observability & Incident Response: Hands-on experience with observability practices and tools (CloudWatch/X-Ray/Splunk/New Relic or similar)—dashboards, alerts, tracing, log analysis, incident response and post-mortems.

  • Infrastructure & Data: Experience with Infrastructure-as-Code (Terraform preferred; similar tools acceptable) and practical knowledge of data stores used in production systems (SQL proficiency helpful; PostgreSQL/Redshift experience a plus, but equivalents acceptable).

  • Security Mindset: Working knowledge of identity and security patterns (OAuth2, SSO/federation, IAM roles/policies/SCP concepts) and secure API/service design.

  • Leadership: Proven ability to lead through influence, drive standards/guardrails, and align multiple agile teams in a matrixed environment.

Desired Skills/knowledge:

  • Hands-on experience with AWS Bedrock/AgentCore, agent frameworks (CrewAI, LangChain, Strands SDK), or similar ecosystems.

  • Experience building automated evaluation pipelines for LLM-based agents (including multimodal evaluation).

  • Experience with LLM gateways / multi-model routing and API management tools (Kong/APIGEE).

  • Fintech/regulatory exposure and experience operating systems under strict governance/compliance constraints.

  • Familiarity with enterprise architecture frameworks (TOGAF/Zachman).

  • Familiarity with agent protocol standards (A2A, ACP, AP2) or related integration approaches.

Eligibility Criteria:

Bachelor's degree in Computer Science, Engineering, Information Systems, or related field (or equivalent experience) with 10–14 years of IT experience, including meaningful roles in application development, platform engineering, SRE/operations, and/or architecture; or, in lieu of a degree, 12–16 years of IT experience, including meaningful roles in application development, platform engineering, SRE/operations, and/or architecture.

Work Timings: 2 PM – 11 PM IST

This role qualifies for Enhanced Flexibility offered in Synchrony India and will require the incumbent to be available between 06:00 AM and 11:30 AM Eastern Time (timings are anchored to US Eastern hours and will adjust twice a year locally). This window is for meetings with India and US teams. The remaining hours will be flexible for the employee to choose. Exceptions may apply periodically due to business needs.

We are proud to offer flexibility at Synchrony. Our way of working allows you the option to work from home or workspaces in our Regional Engagement Hubs—Hyderabad, Bengaluru, Pune, Kolkata, or Delhi/NCR.
Occasionally, you may be required to commute or travel to Hyderabad or one of the Regional Engagement Hubs for in-person engagement activities such as business or team meetings, trainings, and culture events.

For Internal Applicants:

  • Understand the criteria or mandatory skills required for the role, before applying

  • Inform your manager and HRM before applying for any role on Workday

  • Ensure that your professional profile is updated (fields such as education, prior experience, other skills); uploading your updated resume (Word or PDF format) is mandatory

  • Must not be on any corrective action plan (Formal/Final Formal)

  • Only L10+ employees who have completed 18 months in the organization and 12 months in their current role and level are eligible to apply for this opportunity

Grade/Level: 12

Job Family Group:

Information Technology