DoubleVerify

Sr. Incident Manager

United States, Full Time

About us:
DoubleVerify (DV) is the leader in digital performance solutions, improving the impression quality and audience impact of digital advertising. Built on best practices, DV solutions create value for media buyers and sellers by bringing transparency and accountability to the market, ensuring ad viewability, brand safety, fraud protection, accurate impression delivery, and audience quality across campaigns to drive performance. Since 2008, DV has helped hundreds of Fortune 500 companies get the most value from their media spend by delivering best-in-class solutions across the digital ecosystem that help build a better industry.

About the role:
The Senior Incident Manager serves as the strategic and operational leader for DoubleVerify's Major Incident Management program. This role owns the end-to-end incident lifecycle, from detection and technical mitigation through business impact assessment, stakeholder communication, and post-incident improvement. The position requires both deep technical acumen and strong business judgment to minimize customer impact, protect revenue, and ensure rapid, coordinated response across Engineering, Product, Commercial, and Executive teams. This is an individual contributor role focused on incident command and program excellence.

Key Responsibilities:

Incident Response & Coordination

  • Incident Command: Lead technical and corporate response to Sev1/Sev2/Sev3 incidents, serving as the single point of accountability during major disruptions

  • Stakeholder Orchestration: Mobilize cross-functional teams (Engineering, Product, Commercial, QA, TechOps) based on incident severity and business impact using established RACI frameworks

  • Real-Time Decision Making: Make rapid severity classification decisions, approve escalations, and coordinate technical remediation efforts to restore service

  • Communication Leadership: Drive timely, accurate updates to internal stakeholders and executives; coordinate external client/partner communications when business impact warrants

Process Ownership & Continuous Improvement

  • Program Management: Own and evolve the Major Incident Management (MIM) process, ensuring adherence to defined standards, SLAs, and escalation procedures across all product lines

  • Metrics & Analytics: Track, analyze, and report on incident trends, MTTR, recurring issues, and process effectiveness; present quarterly insights to leadership

  • Retrospective Leadership: Facilitate post-incident reviews within 48 hours of resolution, driving accountability for action items and lessons learned

  • Documentation & Training: Maintain runbooks, response playbooks, and communication templates; train incident managers and technical leads on MIM best practices

Business Impact Management

  • Impact Analysis Oversight: Partner with TechOps, Product, and Commercial teams to assess customer, revenue, and reputational impact

  • Risk Assessment: Translate technical incidents into business language for executive stakeholders; recommend actions based on exposure, financial impact, and client churn risk

  • Client Communication Strategy: Determine when external communication is required; collaborate with SSO, Product Marketing, and Legal to draft and approve client-facing messaging

  • Billing & Revenue Protection: Work with Commercial and Billing leads to quantify financial impact and coordinate credit/refund decisions when revenue is affected

Strategic Initiatives

  • Automation & Tooling: Drive adoption of AI-powered incident automation, including auto-triage, impact analysis tools, and intelligent alerting

  • SLO Framework Integration: Partner with SRE teams to align incident response with Service Level Objectives and error budget policies

  • Problem Management Partnership: Collaborate with QA and Problem Management teams to identify systemic issues and drive preventative fixes

  • Vendor & Partner Coordination: Manage incident communication protocols with key partners (e.g., Amazon, TikTok, YouTube) to ensure rapid escalation and resolution


Qualifications:

Experience

  • 7+ years in technical operations, site reliability engineering, DevOps, or incident management roles

  • 3+ years in a program management or incident command capacity, with direct experience leading technical incident response

  • Proven track record of managing Sev1/Sev2 incidents in high-availability, customer-facing SaaS or AdTech environments

  • Experience coordinating cross-functional response teams during business-critical outages

Technical Expertise

  • Deep understanding of distributed systems, cloud infrastructure (GCP, AWS), and modern application architectures

  • Proficiency with monitoring and observability tools (Nagios, Prometheus, Grafana, Datadog, PagerDuty)

  • Familiarity with SQL, log analysis, and data integrity validation techniques

  • Knowledge of ITIL, SRE principles, SLO/SLI frameworks, and incident response best practices

Business & Communication Skills

  • Executive communication: Ability to brief C-level stakeholders during incidents, translating technical issues into business impact and risk exposure

  • Crisis management: Calm under pressure, with strong decision-making skills in ambiguous, high-stakes situations

  • Stakeholder management: Experience working with Product, Commercial, Legal, and Marketing teams to coordinate client/partner communications

  • Documentation & presentation: Strong written and verbal communication skills; ability to create clear, concise executive summaries and retrospectives

Leadership & Collaboration

  • Cross-functional influence: Proven ability to drive alignment across Engineering, Product, and Business teams without direct reporting authority

  • Process optimization: Track record of improving incident response processes through automation, metrics, and continuous improvement

  • Cultural awareness: Ability to work effectively with globally distributed teams across multiple time zones

  • Executive presence: Ability to serve as a trusted advisor to VP and SVP-level stakeholders during critical incidents and to translate technical complexity into actionable business insights

Qualifications that are nice to have:

  • ITIL Foundation or Practitioner certification

  • Experience with AI-driven operations tools (Glean, ChatOps, automated runbook execution)

  • Background in AdTech, digital media, or data-intensive platforms

  • Familiarity with SOC 2, ISO 27001, or other compliance frameworks

  • Experience building and scaling incident management programs from the ground up