Wells Fargo & Company

Senior Manager of Kubernetes Observability

IRVING, TX Full time

About the Role

We are seeking a Senior Manager of Kubernetes Observability to provide strategic leadership for the design, standardization, and scaled execution of our enterprise observability ecosystem across Kubernetes and OpenShift platforms, including Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE). This role is responsible for ensuring a robust, unified, and automated observability platform that enables reliability, performance, and operational excellence across all clusters and workloads in hybrid and multi‑cloud environments.

As a senior technology leader, you will define the long‑term vision and operating model for metrics, logging, tracing, eventing, and monitoring standards across on‑prem, cloud‑managed, and hosted Kubernetes platforms. You will guide multiple engineering teams to execute consistently against this strategy, ensuring full instrumentation, proactive issue detection, reduced MTTR, and improved platform stability. Through strong architectural direction, organizational alignment, and focused mentorship, you will elevate engineering maturity and ensure developers and SREs have actionable insights that accelerate innovation and support enterprise growth at scale.

Key Responsibilities

Kubernetes Observability Strategy & Operating Model

  • Define the target‑state vision and multi‑year roadmap for observability across Kubernetes, OpenShift, AKS, and GKE, including metrics, logging, tracing, eventing, and alerting standards.
  • Establish a unified observability operating model that ensures consistency, scalability, and reuse across on‑prem, cloud‑managed, and multi‑cloud Kubernetes environments.
  • Define success metrics and outcomes that measure observability effectiveness, reliability improvements, and reductions in MTTR across all platforms.

Platform Architecture, Standardization & Instrumentation

  • Set architectural direction for enterprise observability platforms, tooling, and telemetry pipelines across Kubernetes, OpenShift, AKS, and GKE.
  • Establish standardized instrumentation patterns for clusters, workloads, control planes, and platform services, ensuring complete and consistent telemetry coverage regardless of Kubernetes distribution or cloud provider.
  • Drive convergence toward unified observability frameworks that abstract provider‑specific differences while preserving deep platform insight.

Automation, Telemetry Workflows & Adoption

  • Drive automation of observability onboarding and telemetry workflows across Kubernetes, AKS, and GKE to reduce manual effort and accelerate adoption.
  • Enable self‑service observability capabilities that allow developers and SREs to easily instrument, monitor, and troubleshoot workloads across cloud and on‑prem clusters.
  • Ensure observability is embedded by default into platform, infrastructure‑as‑code, and application delivery pipelines.

Reliability, Monitoring & Operational Excellence

  • Enable proactive issue detection through scalable alerting frameworks, actionable dashboards, and standardized monitoring practices across all Kubernetes platforms.
  • Improve reliability and performance visibility for workloads running on OpenShift, AKS, and GKE, reducing reliance on reactive troubleshooting.
  • Partner with SRE and operations teams to continuously improve incident response, post‑incident learning, and preventative engineering across hybrid and multi‑cloud environments.

Leadership, Organization & Cross‑Team Alignment

  • Lead, mentor, and develop engineering leaders and teams responsible for observability platform components and services.
  • Align platform, SRE, cloud, and application teams around shared observability standards and operational goals across Kubernetes, AKS, and GKE.
  • Strengthen cross‑team collaboration and engineering rigor to raise overall organizational maturity in observability and operations.

Required Qualifications

  • 6+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education.
  • 3+ years of management or leadership experience.
  • 5+ years of experience in platform engineering, reliability engineering, or observability‑focused technical leadership roles, or equivalent demonstrated experience.
  • 6+ years of experience with Grafana and Splunk.
  • 5+ years of experience with Kubernetes observability concepts, including metrics, logging, tracing, eventing, and monitoring platforms, across OpenShift, AKS, and GKE.

Desired Qualifications

  • 6+ years of people management or senior technical leadership experience guiding multiple engineering teams.
  • Demonstrated success defining and scaling enterprise observability platforms across large, multi‑cloud Kubernetes environments.
  • Strong understanding of SRE, operational excellence, and reliability engineering practices.
  • Experience driving automation and standardization to reduce MTTR and operational toil.
  • Proven ability to influence across platform, infrastructure, cloud, and application teams.
  • Strong executive communication skills, including the ability to articulate strategy, tradeoffs, and outcomes to senior stakeholders.

Job Expectations

  • There is no Visa sponsorship available for this position.
  • There is no relocation allowance available for this position.
  • This position requires working in one of the posted locations in a hybrid environment.

Posting End Date: 21 Mar 2026

*Job posting may come down early due to volume of applicants.

We Value Equal Opportunity

Wells Fargo is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other legally protected characteristic.

Employees support our focus on building strong customer relationships balanced with a strong risk mitigating and compliance-driven culture which firmly establishes those disciplines as critical to the success of our customers and company. They are accountable for execution of all applicable risk programs (Credit, Market, Financial Crimes, Operational, Regulatory Compliance), which includes effectively following and adhering to applicable Wells Fargo policies and procedures, appropriately fulfilling risk and compliance obligations, timely and effective escalation and remediation of issues, and making sound risk decisions. There is emphasis on proactive monitoring, governance, risk identification and escalation, as well as making sound risk decisions commensurate with the business unit’s risk appetite and all risk and compliance program requirements.

Candidates applying to job openings posted in Canada: Applications for employment are encouraged from all qualified candidates, including women, persons with disabilities, aboriginal peoples and visible minorities. Accommodation for applicants with disabilities is available upon request in connection with the recruitment process.

Applicants with Disabilities

To request a medical accommodation during the application or interview process, visit Disability Inclusion at Wells Fargo.

Drug and Alcohol Policy

Wells Fargo maintains a drug-free workplace. Please see our Drug and Alcohol Policy to learn more.

Wells Fargo Recruitment and Hiring Requirements:

a. Third-party recordings are prohibited unless authorized by Wells Fargo.

b. Wells Fargo requires you to directly represent your own experiences during the recruiting and hiring process.