Disney

Senior Systems Reliability Engineer

Glendale, CA, USA Full time

Req ID:

10136364

Job Description:

At Disney, we’re storytellers. We make the impossible, possible. The Walt Disney Company is a world-class entertainment and technological leader. Walt’s passion was to continuously envision new ways to move audiences around the world—a passion that remains our touchstone in an enterprise that stretches from theme parks, resorts and a cruise line to sports, news, movies and a variety of other businesses. Uniting each endeavor is a commitment to creating and delivering unforgettable experiences — and we’re constantly looking for new ways to enhance these exciting experiences.

The Enterprise Technology mission is to deliver technology solutions that align to business strategies while enabling enterprise efficiency and promoting cross-company collaborative innovation. Our group drives competitive advantage by enhancing our consumer experiences, enabling business growth, and advancing operational excellence.

Team Description:

As Systems Reliability Engineers (SREs) embedded in Walt Disney Imagineering, we apply software engineering principles to ensure our systems are highly reliable and efficient. Our responsibilities include architecting resilient platforms, developing automation solutions for deployment and operations, implementing robust monitoring and alerting strategies, and driving incident response and root cause analysis. We embed deeply in engineering teams to continuously improve system performance and reliability.

The Senior Systems Reliability Engineer is responsible for ensuring the stability, scalability, and performance of mission-critical systems that support Disney’s innovative entertainment experiences. This role bridges the gap between traditional IT and industrial control systems (ICS)—ensuring automation engineers have reliable, secure, and high-performance infrastructure and tooling for SCADA, HMI, and PLC programming. You will work collaboratively across engineering and operations to architect resilient solutions, champion best practices in reliability engineering, and drive continuous improvement of platforms and processes. As a senior technical leader, you will be a key contributor to making Disney’s technology vision a reality, ensuring that every system delivers magic with consistency and excellence.

Responsibilities Of Role:

  • Administer Windows and Linux servers supporting automation and industrial applications (e.g. Ignition, FactoryTalk, Copia, Coverity).

  • Collaborate closely with engineering and project teams to implement CI pipeline automation to streamline PLC testing.

  • Develop tools or scripts to automate documentation generation.

  • Define, measure, and monitor service-level indicators/objectives (SLIs/SLOs) and manage error budgets for critical services.

  • Manage Kubernetes clusters and Helm chart deployments for automation and monitoring applications.

  • Identify and automate manual operational processes (“toil”) within project teams to improve reliability.

  • Ensure high availability, scalability, and disaster recovery readiness for OT (Operational Technology) related systems.

Must-Haves:

  • Minimum of 5 years in production system reliability (web, cloud, OT, or embedded), including at least 2 years with industrial or embedded control systems.

  • Hands-on experience managing Kubernetes clusters and Helm-based deployments.

  • Ability to install and configure operating systems, with expertise in Linux and Windows Server.

  • Continuous integration (CI) expertise for software development, using GitLab CI or a similar tool.

  • Experience with source control management systems (Git).

  • Experience with AWS or another cloud platform.

  • Advanced skills in at least one programming language such as Python, PHP, Ruby, Java, Go, Swift, or C++, and the ability to build unit test suites for all software being developed.

  • Excellent verbal and written communication with all levels of the organization.

  • Ability to communicate ideas and solutions in a clear and organized manner.

  • Ability to give clear and effective presentations to groups.

  • Ability to write concise and complete technical documentation.

Nice-To-Haves:

  • Experience supporting industrial automation platforms (Ignition, FactoryTalk, Copia, etc.).

  • Experience with multiple public cloud platforms (AWS, Azure, GCP).

  • Full stack web development experience.

  • Demonstrated curiosity and commitment to continuous learning and self-improvement.

  • Ability to influence architectural decisions and advocate for reliability best practices.

  • Skills in Datadog monitoring and alerting, and in instrumentation with OpenTelemetry.

  • Contributions to reliability-related open-source projects or technical communities.

Education:

Bachelor’s degree in Computer Science, Information Systems, Software, Electrical, or Electronics Engineering, or a comparable field of study, and/or equivalent work experience.

The hiring range for this position in Glendale, CA is $141,900.00 to $190,300.00 per year. The base pay actually offered will take into account internal equity and also may vary depending on the candidate’s geographic region, job-related knowledge, skills, and experience among other factors. A bonus and/or long-term incentive units may be provided as part of the compensation package, in addition to the full range of medical, financial, and/or other benefits, dependent on the level and position offered.

Job Posting Segment:

Enterprise Technology

Job Posting Primary Business:

Services and Platforms

Primary Job Posting Category:

Site/System Reliability Engineer

Employment Type:

Full time

Primary City, State, Region, Postal Code:

Glendale, CA, USA

Date Posted:

2025-12-12