Job Posting Title:
Lead Software Engineer - AI Operations and Tooling
Req ID:
10137409
Job Description:
Disney Entertainment and ESPN Product & Technology
Technology is at the heart of Disney’s past, present, and future. Disney Entertainment and ESPN Product & Technology is a global organization of engineers, product developers, designers, technologists, data scientists, and more – all working to build and advance the technological backbone for Disney’s media business globally.
The team marries technology with creativity to build world-class products, enhance storytelling, and drive velocity, innovation, and scalability for our businesses. We are Storytellers and Innovators. Creators and Builders. Entertainers and Engineers. We work with every part of The Walt Disney Company’s media portfolio to advance the technological foundation and consumer media touch points serving millions of people around the world.
Here are a few reasons why we think you’d love working here:
Building the future of Disney’s media: Our Technologists are designing and building the products and platforms that will power our media, advertising, and distribution businesses for years to come.
Reach, Scale & Impact: More than ever, Disney’s technology and products serve as a signature doorway for fans' connections with the company’s brands and stories. Disney+. Hulu. ESPN. ABC. ABC News…and many more. These products and brands – and the unmatched stories, storytellers, and events they carry – matter to millions of people globally.
Innovation: We develop and implement groundbreaking products and techniques that shape industry norms, and solve complex and distinctive technical problems.
Ad Platforms is responsible for Disney’s industry-leading ad technology and products – driving advertising performance, innovation, and value in Disney’s sports, news, and entertainment content, across all media platforms.
Job Summary:
We are hiring a Lead Engineer to establish and guide our AI Operations and Tooling practice, enabling the safe, reliable, and cost-efficient operation of AI applications across AWS, Azure, and GCP. This role is focused on enabling AI-specific operations, such as hallucination testing, A/B evaluation, guardrail enforcement, and cost optimization, by leveraging, extending, and building around existing tools and platforms to accelerate operational stability and performance.
As a hands-on technical lead, you will mentor engineers, design operational enablement frameworks, and partner closely with AI engineering and product teams. The goal is not to own every tool, but to make AI systems more observable, testable, and resilient by enabling the right capabilities and automation around them. This role will deliver measurable business outcomes by preventing runaway spend, improving reliability, and driving efficiency in AI/cloud usage.
Responsibilities and Duties of the Role:
Operational Architecture & Enablement
Define frameworks for AI-specific operations: hallucination/quality testing, evaluation pipelines, and continuous validation.
Establish reference patterns for scaling LLM services, prompt orchestration, and multi-agent workloads.
Build automation for safe rollout, monitoring, and incident response.
Observability, Reliability & Cost Management
Implement end-to-end observability: latency, drift, failure modes, hallucination rates, and GPU/compute utilization.
Drive cost optimization and efficiency across AI cloud usage (AWS, Azure, GCP).
Define SLOs, dashboards, and runbooks for AI/LLM production systems.
Governance, Guardrails & Security
Embed compliance, safety checks, and prompt-injection defenses into operational frameworks.
Partner with security and governance teams to enforce enterprise-grade auditability and policy enforcement.
Leadership & Cross-Team Collaboration
Mentor engineers in DevOps, infra, and AI operations.
Drive adoption of best practices for AI reliability, test automation, and incident management.
Collaborate across AI Core, Data Foundations, Security, and Product teams to ensure operational safety and scale.
Basic Qualifications
Bachelor’s degree in Computer Science, Engineering, or related technical field (Master’s preferred), or equivalent experience.
7+ years of experience in software engineering, DevOps, or infrastructure, with at least 2 years in a lead role.
Expert in at least one foundational language (Python, Java, or Go) with production-grade system experience.
Hands-on experience with cloud-native infrastructure (AWS preferred; Azure/GCP a plus) and modern orchestration platforms.
Proven experience with observability stacks (Datadog, Prometheus, Grafana) and incident response automation.
Familiarity with AI/LLM APIs (OpenAI, Anthropic, Bedrock, Azure AI Foundry) and orchestration frameworks (LangChain, LangGraph).
Strong knowledge of operational AI testing (A/B evaluation, regression, red-teaming) and guardrail enforcement.
Demonstrated ability to optimize cloud/GPU usage and manage costs at scale.
Excellent communication skills and proven ability to lead design reviews, mentor engineers, and influence cross-functional teams.
Preferred Qualifications
Experience with AI-focused evaluation frameworks (LangSmith, PromptLayer, etc.).
Prior work in AI operations, SRE, or ML platform DevOps roles.
Knowledge of multi-agent orchestration patterns and operational reliability for AI systems.
Strong background in test automation and continuous validation for distributed systems.
Skilled at incident review (RCA) and driving operational excellence across large-scale environments.
#disneytech
Job Posting Segment:
Ad Platforms
Job Posting Primary Business:
AP - Software Engineering
Primary Job Posting Category:
Software Engineer
Employment Type:
Full time
Primary City, State, Region, Postal Code:
Glendale, CA, USA
Alternate City, State, Region, Postal Code:
USA - CA - Market St
Date Posted:
2025-12-15