Job Description
Job Title: Senior Site Reliability Engineer
Location: Kuala Lumpur
About AirAsia MOVE
AirAsia MOVE is a leading ASEAN-focused budget online travel agency (OTA) and part of the Capital A Group. We deliver customer-centric travel solutions by combining innovation with operational excellence. Our goal is to create seamless, reliable, and delightful journeys for travelers across the region.
We’re looking for a Senior Site Reliability Engineer to help scale and stabilize our cloud infrastructure and reliability practices as we grow across multiple lines of business.
You’ll lead key initiatives around:
Cloud architecture modernization
Multi-region reliability
Observability and incident response
Reducing toil through automation and self-service
This is a hands-on technical role in which you’ll work across platform, SRE, and application teams to build scalable systems that are resilient, cost-aware, and developer-friendly.
Key Responsibilities
Design and implement secure, scalable infrastructure on Google Cloud Platform (GCP)
Lead efforts to build and evolve MOVE’s GCP Landing Zone, including Shared VPC, org structure, IAM, and policy guardrails
Build and improve multi-region architectures for high availability and disaster recovery
Drive infrastructure automation using Terraform, CI/CD, and GitOps practices
Improve observability across teams by standardizing monitoring, tracing, and alerting
Collaborate on incident response and postmortems to reduce MTTR and build resilience
Enforce tagging, FinOps controls, and security policies across GCP projects
Contribute to platform engineering initiatives and developer self-service tools
Requirements
5+ years of experience in SRE, DevOps, or cloud infrastructure roles
Solid experience with GCP (or a similar cloud provider), Terraform, and Kubernetes (GKE)
Strong hands-on experience in automation and multi-region architecture design
Experience with networking (VPCs, NAT, Private Service Connect), IAM, and cloud-native security
Proven ability to debug and support production systems under pressure
Familiarity with monitoring and tracing tools such as Cloud Monitoring, OpenTelemetry, and SigNoz
Exposure to using AI or anomaly detection for alert tuning or reliability insights
Clear communicator who works well with developers, product teams, and other infrastructure teams