Company

Cox Automotive - USA

Job Family Group

Engineering / Product Development

Job Profile

Sr Software Engineer

Management Level

Individual Contributor

Flexible Work Option

Hybrid - Ability to work remotely part of the week

Travel %

Work Shift

Day

Compensation

Compensation includes a base salary of $99,000.00 - $165,000.00. The base salary may vary within the anticipated base pay range based on factors such as the ultimate location of the position and the selected candidate’s knowledge, skills, and abilities. Position may be eligible for additional compensation that may include an incentive program.

Job Description

The Site Reliability Engineer - Incident Response is a critical enterprise-level role responsible for accelerating incident resolution and enhancing the overall incident management process. This individual partners with engineering teams during active incidents to troubleshoot issues using monitoring and logging tools and, after each incident, delivers executive-level summaries that clearly communicate impact, root cause, and resolution. The SRE - Incident Response also plays a key role in analyzing incident response effectiveness and identifying opportunities for systemic improvements.

Core Competencies

Engineering/Tooling: Demonstrates the ability to design, build, and maintain engineering solutions and tools that enhance reliability, automate incident response, and reduce operational toil.
Incident Troubleshooting: Skilled in interpreting logs, metrics, and traces to assist in identifying root causes during live incidents.
Monitoring & Observability: Proficient in tools such as Datadog, Splunk, New Relic, or similar platforms.
AI-Centric Engineering: Effectively leverages artificial intelligence (AI) and machine learning (ML) tools to automate, optimize, and enhance daily engineering and incident response tasks.
Executive Communication: Ability to distill complex technical issues into concise, business-relevant summaries for senior leadership.
Analytical Rigor: Strong attention to detail in validating incident data and identifying trends or gaps in response.
DevOps & Architecture Knowledge: Understands full-stack systems, CI/CD pipelines, caching, scaling, and cloud-native infrastructure.
Metrics & Reporting: Capable of calculating and interpreting key metrics like MTTA (Mean Time to Acknowledge) and MTTR (Mean Time to Resolve).

Key Responsibilities of This Role

Outside of active on-call work, the role typically involves:

Post-Incident Review Development

Draft and deliver executive summaries after each incident.
Develop and coach teams on blameless postmortems.
Create templates, train facilitators, and help guide root cause analysis (e.g., 5 Whys, fishbone diagrams).
Maintain a central library of learnings and cross-cutting themes.

Incident Process Improvement

Actively support engineering teams during incidents by helping diagnose and resolve issues quickly.
Navigate and analyze data from observability platforms to make informed inferences about root causes.
Analyze the effectiveness of incident response to identify systemic reliability gaps.
Standardize incident response workflows (incident roles, comms, escalation paths).
Create or refine runbooks, incident command frameworks, and severity classification guides.

Metrics and Insights

Build dashboards around incident frequency, MTTR, MTTA, and recurrence rates.
Use incident data to drive reliability OKRs and engineering investments.
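
As an illustration of the metrics named above, here is a minimal sketch of how MTTA and MTTR might be calculated from incident timestamps. The record fields (opened, acknowledged, resolved), the sample data, and the choice to measure both metrics from the time an incident is opened are assumptions made for this example, not a description of Cox Automotive's tooling.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names and values are illustrative only.
incidents = [
    {"opened": "2024-01-01T10:00", "acknowledged": "2024-01-01T10:05", "resolved": "2024-01-01T11:00"},
    {"opened": "2024-01-02T14:00", "acknowledged": "2024-01-02T14:15", "resolved": "2024-01-02T14:40"},
]


def minutes_between(start: str, end: str) -> float:
    """Return the elapsed minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


def mtta(records: list) -> float:
    """Mean Time to Acknowledge: average minutes from opened to acknowledged."""
    return mean(minutes_between(r["opened"], r["acknowledged"]) for r in records)


def mttr(records: list) -> float:
    """Mean Time to Resolve: average minutes from opened to resolved (assumed definition)."""
    return mean(minutes_between(r["opened"], r["resolved"]) for r in records)
```

Teams differ on whether MTTR is measured from the time an incident is opened or from the time it is acknowledged, so the definition should be agreed on before the numbers are put on a dashboard.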

Tooling & AI Solutions

Partner with engineering teams to identify repetitive or high-impact tasks suitable for automation.
Develop, implement, and continuously improve custom scripts, bots, and AI-driven workflows for monitoring, alerting, and incident triage.
Evaluate and integrate emerging AI/ML technologies to optimize detection, root cause analysis, and reporting.
Ensure all tools and automations are secure, maintainable, and aligned with organizational standards and SRE best practices.
Document and socialize new tools and AI solutions, enabling adoption and knowledge sharing across teams.

Cross-Team Collaboration

Collaborate with Engineering Managers and Incident Commanders to gather and validate incident data.
Partner with product teams, infra, and leadership to socialize reliability best practices.
Act as a reliability “consultant” to squads that have impactful incidents.
Recommend enhancements to monitoring, alerting, and response processes to reduce future incident impact.

Drug Testing

To be employed in this role, you’ll need to clear a pre-employment drug test. Cox Automotive does not currently administer a pre-employment drug test for marijuana for this position. However, we are a drug-free workplace, so the possession, use or being under the influence of drugs illegal under federal or state law during work hours, on company property and/or in company vehicles is prohibited.

Benefits

The Company offers eligible employees the flexibility to take as much vacation with pay as they deem consistent with their duties, the company’s needs, and its obligations; seven paid holidays throughout the calendar year; and up to 160 hours of paid wellness annually for their own wellness or that of family members. Employees are also eligible for additional paid time off in the form of bereavement leave, time off to vote, jury duty leave, volunteer time off, military leave, and parental leave.

About Us

Through groundbreaking technology and a commitment to stellar experiences for drivers and dealers alike, Cox Automotive employees are transforming the way the world buys, owns, sells – or simply uses – cars. Cox Automotive employees get to work on iconic consumer brands like Autotrader and Kelley Blue Book and industry-leading dealer-facing companies like vAuto and Manheim, all while enjoying the people-centered atmosphere that is central to our life at Cox.

Benefits of working at Cox may include health care insurance (medical, dental, vision), retirement planning (401(k)), and paid days off (sick leave, parental leave, flexible vacation/wellness days, and/or PTO). For more details on what benefits you may be offered, visit our benefits page.

Cox is an Equal Employment Opportunity employer. All qualified applicants/employees will receive consideration for employment without regard to that individual’s age, race, color, religion or creed, national origin or ancestry, sex (including pregnancy), sexual orientation, gender, gender identity, physical or mental disability, veteran status, genetic information, ethnicity, citizenship, or any other characteristic protected by law. Cox provides reasonable accommodations when requested by a qualified applicant or employee with a disability, unless such accommodations would cause an undue hardship.

Sr. Site Reliability Engineer - Incident Reporting
