AirAsia

Senior Software Engineer - SRE

KL Sentral - Redstation, Full-time


Job Description


Job Title: Senior Site Reliability Engineer
Location: Kuala Lumpur

About AirAsia MOVE
AirAsia MOVE is a leading ASEAN-focused budget online travel agency (OTA) and part of the Capital A Group. We deliver customer-centric travel solutions by combining innovation with operational excellence. Our goal is to create seamless, reliable, and delightful journeys for travelers across the region.

About the Role

We’re looking for a Senior Site Reliability Engineer to help scale and stabilize our cloud infrastructure and reliability practices as we grow across multiple lines of business.

You’ll lead key initiatives around:

  • Cloud architecture modernization

  • Multi-region reliability

  • Observability and incident response

  • Reducing toil through automation and self-service

This is a hands-on technical role, where you’ll work across platform, SRE, and application teams to build scalable systems that are resilient, cost-aware, and developer-friendly.


What You’ll Do

  • Design and implement secure, scalable infrastructure on Google Cloud Platform (GCP)

  • Lead efforts to build and evolve MOVE’s GCP Landing Zone, including Shared VPC, org structure, IAM, and policy guardrails

  • Build and improve multi-region architectures for high availability and disaster recovery

  • Drive infrastructure automation using Terraform, CI/CD, and GitOps practices

  • Improve observability across teams by standardizing monitoring, tracing, and alerting

  • Collaborate on incident response and postmortems to reduce MTTR and build resilience

  • Enforce tagging, FinOps controls, and security policies across GCP projects

  • Contribute to platform engineering initiatives and developer self-service tools




What We’re Looking For

  • 5+ years in SRE, DevOps, or cloud infrastructure roles

  • Solid experience with GCP (or a similar cloud provider), Terraform, and Kubernetes (GKE)

  • Strong hands-on experience in automation and multi-region architecture design

  • Experience in networking (VPCs, NAT, PSC), IAM, and cloud-native security

  • Proven ability to debug and support production systems under pressure

  • Familiarity with monitoring and tracing tools such as Cloud Monitoring, OpenTelemetry, and SigNoz

  • Exposure to AI or anomaly detection for alert tuning or reliability insights

  • Clear communicator who works well with developers, product, and other infra teams