Las Vegas Sands Corp.

Director – Incident Engineering & Reliability

Dallas, Texas | Full time

Job Description:

Position Overview

The primary responsibility of the Director – Incident Engineering & Reliability is to lead the engineering teams that provide deep technical troubleshooting, root cause analysis (RCA), and on-call response for major incidents. These teams provide the engineering capacity, escalation expertise, and continuous-improvement practices that accelerate recovery and improve platform resilience.

This leader owns the enterprise on-call engineering operating model, the depth and readiness of the responder talent pool, and the translation of incident learnings into reliability improvements.

All duties are to be performed in accordance with departmental and Las Vegas Sands Corp.’s policies, practices, and procedures. All Las Vegas Sands Corp. Team Members are expected to conduct and carry themselves in a professional manner at all times. Team Members are required to observe the company’s standards, work requirements and rules of conduct.

Essential Duties & Responsibilities

  • Engineering Response Leadership:

    • Lead engineering teams that respond to P0/P1 outages, service disruptions, and operational failures.

    • Ensure the right SMEs are mobilized and equipped to support Major Incident Managers and Service Operations.

    • Provide technical support to the incident commander without owning the bridge or convening stakeholders.

    • Own the enterprise on-call model, including rotations, runbooks, escalation paths, training, and tooling.

    • Reduce on-call burnout and improve coverage, skill depth, and response readiness.

    • Develop technical playbooks, failure mode checklists, recovery patterns, and operational runbooks.

    • Maintain a “bench strength” model that ensures deep subject-matter coverage across platforms.

    • Partner with SRE and platform owners to build reliability guardrails, design patterns, and test strategies (SLOs, chaos engineering, graceful degradation).

    • Drive post-incident handoffs to problem management and ensure technical RCAs feed into architectural or engineering work.

    • Identify automation opportunities for detection, triage, and recovery.

    • Work with domain teams to deliver blameless postmortems that translate into prioritized backlog items.

    • Build a pipeline of incident-capable engineers and full-stack responders.

    • Develop training, simulations, tabletop exercises, and technical onboarding for resilience roles.

  • Execution Partnership:

    • Serve as technical counterpart to the Director – Major Incident Command.

    • Support Major Incident Command with clear, explainable technical diagnoses, without taking command of the incident lifecycle.

  • Perform job duties in a safe manner.

  • Attend work as scheduled on a consistent and regular basis.

  • Perform other related duties as assigned.

Minimum Qualifications

  • At least 21 years of age.

  • Proof of authorization to work in the United States.

  • Bachelor’s degree in Information Technology, Computer Science, or a related field (preferred).

  • Must be able to obtain and maintain any certification or license, as required by law or policy. 

  • 10+ years of experience in engineering operations, SRE, platform engineering, or infrastructure domain leadership.

  • Demonstrated experience leading technical response teams or managing on-call rotations.

  • Strong depth across cloud, infrastructure, network, virtualization, database, identity, and monitoring domains.

  • Familiarity with ITIL, with a focus on integrating engineering response into incident management rather than owning the incident management process.

  • Experience with chaos engineering, observability stacks, and reliability engineering maturity programs.

  • Strong interpersonal skills with the ability to communicate effectively and interact appropriately with management, other Team Members and outside contacts of different backgrounds and levels of experience.

  • Leadership Competencies:

    • Calm, analytical, escalation-aware engineering leader.

    • Able to influence platform owners without positional authority.

    • Strong cross-domain technical translator, capable of coaching engineers in high-pressure situations.

    • Committed to blameless postmortems, a culture of automation, and service ownership.

  • Must be available to work varied shifts, including nights, weekends, and holidays, to ensure 24/7 coverage.

  • Must provide off-hours support on an infrequent, as-needed basis during critical incidents (shifts may run 24/7 depending on business needs).

  • Ability to travel domestically and internationally.

  • Team Members are required to be on site within the IT Command Center.