RELX

Senior Site Reliability Engineer I

California Full time

About the Business:

LexisNexis Risk Solutions is the essential partner in the assessment of risk. Within our Business Services vertical, we offer a multitude of solutions focused on helping businesses of all sizes drive higher revenue growth, maximize operational efficiencies, and improve customer experience. Our solutions help our customers solve difficult problems in the areas of Anti-Money Laundering/Counter Terrorist Financing, Identity Authentication & Verification, Fraud and Credit Risk mitigation, and Customer Data Management. You can learn more about LexisNexis Risk at https://risk.lexisnexis.com

Role Definition

This is a developed professional-level role for an SRE. Individuals are responsible for challenging reliability and toil-reduction projects. At this level, SREs have hands-on experience across most SRE practices. They have a good understanding of how to observe distributed systems and their dependencies, and how to automate recovery to protect service levels. SREs are on-call and assist others during incidents. They contribute to process improvements through experience and knowledge. Individuals in this role provide informal guidance to junior staff.

 

Scope and Key Responsibilities

  • Creates monitoring queries and establishes service level baselines

  • Supports senior engineers during incidents

  • Contributes to post-mortems and root cause analyses (RCAs)

  • Participates in disaster recovery tests

  • Implements automation and executes code in production environments

  • Contributes to SRE knowledge documentation

 

Functional Competencies/Technical Skills:

 

Design for Reliability

  • Can support architecture and senior engineers in the creation of infrastructure topology drawings and deployment workflows.

  • Can carry out the testing of availability, reliability, and recoverability in non-production environments.

  • Able to benchmark and document test performance results to support production readiness reviews.

  • Has advanced hands-on experience with DevOps practices including monitoring, virtual networks, cloud storage, containers and orchestration, CI/CD, configuration management, and securing cloud applications.

 

Disaster Recovery

  • Able to participate in on-call duty to assist in the recovery from Major Incidents (for production environments).

  • Can test system and component failover within and between geographic regions (for production environments).

  • Able to automate the recovery of systems and components using Infrastructure-as-Code and Configuration Management scripts.

 

Incident Management

  • Has the ability to create and/or present RCAs including the executive summary, timeline, detailed impact statement, follow-on actions, and residual risks.

  • Can lead scenario modeling exercises and the creation of workflows that are triggered by an SLO breach.

  • Able to participate in the on-call rotation and provide on-call support for other SREs.

  • Can write advanced automation scripts for incident response including failovers and rollbacks.

 

Reliability Culture

  • Can contribute to SRE knowledge base articles and training material.

  • Able to analyze toil by looking at ticket trends and can recommend focus areas for the team.

  • Can independently work on small toil elimination projects.

 

Observability

  • Has a deep technical understanding of observability techniques across the full stack and can bring clarity to complex incidents or performance issues.

  • Able to create templated observability dashboards and configuration using code so that others can implement quickly for their products.

  • Can influence the setting of appropriate SLOs and Error Budgets.

 

Platforms and Automation

  • Can work within or manage a cross-functional team in support of migrating applications to standard platforms.

  • Gives direction and consultancy to others when implementing new Paved Road features or Platforms.

  • Can analyze and make recommendations to improve the SDLC and CI/CD processes.

  • Able to create actionable reports on the operational health and lifecycle of platform and product components.

 

Technical Skills

  • Azure (AKS, etc.)

  • Terraform

  • GitHub

  • CI/CD

  • Java debugging

  • Helm charts

  • JFrog

  • ELK

  • Akeyless or Vault

  • Others as assigned

U.S. National Base Pay Range: $95,300 - $158,800. Geographic differentials may apply in some locations to better reflect local market rates. This job is eligible for an annual incentive bonus.

We know your well-being and happiness are key to a long and successful career. We are delighted to offer country-specific benefits. Click here to access benefits specific to your location.

We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form or by calling 1-855-833-5120.

Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here.

Please read our Candidate Privacy Policy.

We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.

USA Job Seekers:

EEO Know Your Rights.