Qover

Site Reliability Engineer (SRE)

Brussels | Full Time
Who we are:

We’re Qover, a leading insurtech scale-up that has raised $70 million.
With an international team of 140+ colleagues based in Brussels, we help companies orchestrate digital embedded insurance experiences.
 
Since Qover was founded in 2016, our co-founders have had a clear vision of the future of insurance: it must be simple, transparent and accessible across borders – a global safety net 🌍
 
We’re well on our way to building that safety net, with a platform that protects millions of users across 32 European countries. We work with top brands like Revolut, Mastercard, BMW, Canyon, Monzo and many others.

Visit our website for more about what we do! 


Your Role:

As a Site Reliability Engineer, you’ll be a key driver in ensuring our platform is highly available, scalable, secure, and performant. You will bridge software development and operations to build reliable systems, optimize our infrastructure, and champion automation across the engineering lifecycle.

What You’ll Do:

      Build and maintain: Manage scalable infrastructure and applications to support millions of users across 32 European countries.
      Ensure reliability: Guarantee high availability and performance of services through rigorous monitoring, alerting, and tuning.
      Automate everything: Collaborate with engineering teams to automate deployments, capacity scaling, and incident responses.
      Define standards: Define and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets.
      Solve complex problems: Troubleshoot production issues and lead root-cause analyses and post-mortems.
      Optimize developer experience: Design tools and processes that reduce toil and improve the developer experience.
      Enhance observability: Improve observability using metrics, logs, and distributed tracing.

 
Your Profile:
We’re looking for someone with an engineer’s mindset who loves to optimize systems.
      Cloud environments: Strong experience with cloud platforms, ideally GCP (we run on Google Cloud), or AWS/Azure.
      Container orchestration: Proficiency with Kubernetes and Docker.
      CI/CD automation: Experience building pipelines with tools like GitLab CI, GitHub Actions, or Jenkins.
      Infrastructure as Code (IaC): Mastery of Terraform or similar frameworks.
      Monitoring & observability: Familiarity with Prometheus, Grafana, Datadog, or the ELK stack.
      Programming/scripting skills: Strong scripting abilities in TypeScript (our main stack), Go, or Bash.
      Experience: At least 3 years in SRE or similar roles.
      Distributed Systems: Proven experience operating and troubleshooting distributed systems.
      Reliability Mindset: Solid understanding of reliability best practices (monitoring, alerting, incident
      Soft Skills:
      You show strong soft skills; cultural fit matters most to us.
      You are self-motivated, open-minded, curious, and flexible.
      You are structured yet pragmatic, paying close attention to details.
      You are ready to share your experience and mentor junior team members.
      You have a "can do" mentality and, above all, you are fun :).