About the Team/Role
We are looking for a highly motivated, high-potential mid-level Site Reliability Engineer (SRE) to join our team, drive meaningful business impact, and grow as a reliability leader.
This is an exciting time to be part of the SRE evolution at WEX. Our complex systems and diverse product portfolio power a wide variety of customer businesses, generating rich operational and telemetry data across platforms and environments. As we scale, the need for resilient, observable, and efficient systems is greater than ever.
As a mid-level SRE, you’ll play a key role in shaping and building the next generation of WEX’s reliability practices, platforms, and tools. You’ll drive improvements across monitoring, alerting, performance optimization, incident response, and capacity management. You’ll also reduce operational toil through automation and proactive engineering, partnering closely with engineering and product teams to embed reliability into everything we build.
We take a modern approach to engineering: agile practices, a product-oriented mindset, and a strong focus on innovation, including the use of AI and automation to enhance observability and incident response.
You’ll tackle high-impact challenges as part of a high-performing team of experienced engineers and leaders who are ready to support your continued development.
If you're passionate about reliability, solving hard problems, and growing fast in a high-impact environment, this is a great opportunity for you!
How you’ll make an impact
Monitor and manage the health, availability, and performance of WEX’s Microsoft Azure cloud ecosystem.
Actively identify and reduce “toil” (manual, repetitive work) by developing and maintaining automation tools.
Participate in on-call rotations and respond to system alerts and incidents.
Collaborate with development teams to implement reliability-focused features.
Improve observability and logging to make issues faster to troubleshoot.
Follow IT security policies and compliance requirements.
Experience you’ll bring
2+ years of experience in system administration, DevOps, or SRE roles.
Proficiency in scripting and automation using languages and tools such as Python, Bash, Go, or Terraform.
Experience with monitoring and logging tools such as Prometheus, Grafana, the ELK stack, or Splunk.
Knowledge of containerization and orchestration (Docker, Kubernetes).
Understanding of CI/CD pipelines and version control systems.
Strong problem-solving skills and a willingness to learn.
Preferred Qualifications
Hands-on experience with the Microsoft Azure cloud platform.
Familiarity with infrastructure as code (Terraform, Ansible, CloudFormation).
Knowledge of incident response processes and SLAs.
Experience developing AI-based solutions.
Ability to troubleshoot and resolve performance bottlenecks.
Strong communication skills and ability to work across teams.
Experience in healthcare, insurance, or benefits technology.
Experience working with compliance frameworks such as HIPAA, SOC 2, or HITRUST.