Group 1001 is a consumer-centric, technology-driven family of insurance companies on a mission to deliver outstanding value and operational performance by combining financial strength and stability with deep insurance expertise and a can-do culture. Group 1001’s culture emphasizes the importance of collaboration, communication, core business focus, risk management, and striving for outcomes. This goal extends to how we hire and onboard our most valuable assets – our employees.
Why This Role Matters:
As a Cloud Platform Site Reliability Engineer (SRE) at Group 1001, you will be responsible for ensuring the reliability, availability, and performance of our systems and applications. You will also have the opportunity to refine and engineer our core application hosting patterns, working closely with our engineering, operations, and security teams. We are an AWS-primary organization with footprints in GCP and Azure. The ideal candidate is passionate about automation, continuous improvement, and reducing toil.
How You’ll Contribute:
Design, implement, and maintain highly available and scalable infrastructure solutions in the Cloud.
Implement and manage DevSecOps practices across the multi-cloud, multi-region project lifecycle, enhancing collaboration and efficiency.
Use monitoring and observability tools (preferably Grafana) for real-time system monitoring and troubleshooting.
Apply strong Git skills in trunk-based workflows with SemVer or CalVer release tagging.
Design and implement Infra CI/CD pipelines for automated geospatial software deployment and infrastructure management.
Conduct regular system audits to identify and address potential issues before they impact project delivery.
Provide technical guidance and mentorship to junior team members, fostering a culture of learning and growth.
Prevent incidents by setting up alerts on symptoms.
Coordinate with multiple teams, such as Data Platforms, NOC/SOC, and IT security.
Build effective monitoring systems with proactive and reactive alerts.
Build system health dashboards.
Build end-user monitoring dashboards.
Work with Delivery teams to provide insights into monitoring data.
Manage deployments and incidents.
What We’re Looking For:
Extensive experience with AWS
Git, GitLab, Infra CI/CD Pipelines
Terraform and/or Pulumi
Hands-on experience as an SRE
Experience automating operational actions with CI/CD pipelines, generating runbooks, and managing handoffs to L1 teams
Nice to Have:
Extensive experience with Lambda-based applications
Experience in Service Meshes
Experience with Policy-as-Code (Rego, OPA)
Experience with ZTNA environments
Benefits Highlights:
Employees who meet benefit eligibility guidelines and work 30 hours or more weekly can enroll in Group 1001’s benefits package. Employees (and their families) are eligible to participate in the Company’s comprehensive health, dental, and vision insurance plan options. Employees are also eligible for Basic and Supplemental Life Insurance and Short- and Long-Term Disability. All employees (regardless of hours worked) have immediate access to the Company’s Employee Assistance Program and wellness programs; no enrollment is required. Employees may also participate in the Company’s 401(k) plan, with matching contributions by the Company.
Group 1001 and its affiliated companies are strongly committed to providing a supportive work environment where employee differences are valued. Diversity is an essential ingredient in making Group 1001 a welcoming place to work and is fundamental in building a high-performance team. Diversity embodies all the differences that make us unique individuals. All employees share the responsibility for maintaining a workplace culture of dignity, respect, understanding, and appreciation of individual and group differences.