Job Description:

The Site Reliability Engineering team provides leadership, direction, and accountability for platform engineering, system design, and end-to-end implementation to meet and exceed product and platform non-functional requirements, including quality, security, reliability, availability, and performance. SREs enable development teams to focus on releasing products with reliability and velocity.

The Sr II Site Reliability Engineer (SRE) on our dynamic SRE team focuses on driving the SRE charter by using software engineering to enable automation and efficiency in all aspects of platform change management and operations. The main responsibilities include, but are not limited to, optimizing the design and engineering of new systems and enhancements, including processes and day-to-day activities, to reliably support product rollout and operation in production. They will mentor other SRE staff in adopting and implementing the DevSecOps culture throughout the enterprise.

You will have opportunities to gain experience and knowledge in different aspects of DevOps challenges and implementation to enhance developer workflow and production stability. You will collaborate with other senior team members to evangelize and drive adoption of the SRE mindset and systems engineering practices in order to implement technology solutions that maximize performance and availability in our environment.

Responsibilities

Design and implement orchestration and tooling solutions so that repetitive administration tasks are performed efficiently and without defects

Design and implement monitoring and recovery tools to support site high availability (HA) and disaster recovery (DR)

Design and develop highly available infrastructure and platform components to meet the needs of our growing and evolving product lines

Design and implement security engineering best practices across all deployed platforms and environments

Triage alerts, diagnose and resolve critical issues, and manage the implementation of changes

Manage the coordination, documentation, and tracking of critical incidents and their root cause analyses, ensuring rapid and complete issue resolution and closing the loop with customers and other key stakeholders.

Collaborate with Delivery Engineers and DevExp Engineers to enhance and implement the continuous integration/continuous deployment (CI/CD) orchestration system, reducing friction in delivering software to production

Evangelize the DevSecOps culture and SRE mindset, and mentor others about reliability and best practices.

Identify opportunities, and work with other engineering disciplines to implement them, in the following areas:

Automation

Signal to noise reduction

Prevention of recurring issues, and other actions that reduce the time to mitigate service-impacting events and increase the productivity of cloud operations and development resources

Maintain a strong understanding of IaaS, PaaS, and SaaS offerings, and build and maintain a state-of-the-art, cloud-based environment for massive-scale data processing

Design and implement processes, technology, and automation for performance testing.

Ensure that implementations and solutions are fully documented and deployed with fully operationalized processes that support the solution lifecycle

Other tasks as assigned

Minimum Qualifications

8-10 years of experience in infrastructure, systems engineering, and/or software engineering

Demonstrable experience with testing methodologies, test automation frameworks, and tools for applications and/or anything-as-code (infrastructure, configuration, and development tooling such as documentation-as-code or diagram-as-code)

A systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.

Hands-on experience in designing, analyzing, scaling, and troubleshooting medium to large scale distributed systems.

Well-versed in SRE methodologies and passionate about solving operational problems through automation and software engineering.

Demonstrated ability to communicate effectively, both vertically and horizontally within the organization, in writing and verbally.

Hands-on experience supporting and implementing at least two architectural styles and their product lifecycle management, including but not limited to: Microservices, Domain-Driven, Event-Driven, Monolithic

Strong understanding of cloud-native architecture and microservices design and deployment patterns

Hands-on experience in designing and implementing application and/or platform performance, load and stress testing

Skills Desired

Advanced experience designing and supporting one of the three major public cloud providers; AWS is a plus, but experience with other public cloud providers will be considered

Full-stack software engineering experience with a solid foundation in at least 2-3 of the following frontend and/or backend technologies: ReactJS (or similar frameworks), Java, Python, SQL, and RDBMS or NoSQL databases.

Strong hands-on experience with at least one configuration management tool, such as Ansible, Salt, or Puppet, or a Kubernetes configuration tool such as Helm

Strong hands-on experience with performance testing tools such as LoadRunner, JMeter, BlazeMeter, Locust, or LoadNinja

Advanced experience with at least one infrastructure-as-code (IaC) tool, such as Terraform/OpenTofu or Pulumi

Advanced knowledge of at least one release tool (e.g., Jenkins or Jenkins X, Spinnaker, Harness, Azure DevOps, or a cloud-provider-specific equivalent)

Advanced level of knowledge of Kubernetes and Docker, including experience in Docker image optimization and managing the Docker image lifecycle

Strong experience with at least two of the following logging and monitoring tools: ELK stack, Prometheus, Grafana, Stackdriver, New Relic, Datadog, Dynatrace, Splunk, or the cloud-native logging and monitoring services of any of the three major providers

Advanced level of Linux/Unix/Windows OS experience.

Base Pay Range:

The base pay range noted represents the company’s good faith minimum and maximum range for this role at the time of posting. The actual compensation offered to a candidate will be dependent upon several factors, including but not limited to experience, qualifications and geographic location. Also, most employees are eligible for additional incentive pay.

$167,670.00 - $204,930.00

Your Benefits Start Day 1

Your wellbeing is important to Pacific Life, and we’re committed to providing you with flexible benefits that you can tailor to meet your needs. Whether you are focusing on your physical, financial, emotional, or social wellbeing, we’ve got you covered.

Prioritization of your health and wellbeing, including Medical, Dental, Vision, and a Wellbeing Reimbursement Account that can be used for yourself or your eligible dependents
Generous paid time off options including: Paid Time Off, Holiday Schedules, and Financial Planning Time Off
Paid Parental Leave as well as an Adoption Assistance Program
Competitive 401k savings plan with company match and an additional contribution regardless of participation

You Can Be Who You Are

We are committed to a culture of diversity and inclusion that embraces the authenticity of all employees, partners and communities. We support all employees to thrive and achieve their fullest potential.

What’s life like at Pacific Life? Visit Instagram.com/lifeatpacificlife

EEO Statement:

Pacific Life Insurance Company is an Equal Opportunity/Affirmative Action Employer, M/F/D/V. If you are a qualified individual with a disability or a disabled veteran, you have the right to request an accommodation if you are unable or limited in your ability to use or access our career center as a result of your disability. To request an accommodation, contact a Human Resources Representative at Pacific Life Insurance Company.

Sr. Site Reliability Engineer II
