Workday

SRE / DevOps Engineer (GoLang)

Auckland, New Zealand (Full Time)

Your work days are brighter here.

We’re obsessed with making hard work pay off, for our people, our customers, and the world around us. As a Fortune 500 company and a leading AI platform for managing people, money, and agents, we’re shaping the future of work so teams can reach their potential and focus on what matters most. The minute you join, you’ll feel it. Not just in the products we build, but in how we show up for each other. Our culture is rooted in integrity, empathy, and shared enthusiasm. We’re in this together, tackling big challenges with bold ideas and genuine care. We look for curious minds and courageous collaborators who bring sun-drenched optimism and drive. Whether you're building smarter solutions, supporting customers, or creating a space where everyone belongs, you’ll do meaningful work with Workmates who’ve got your back. In return, we’ll give you the trust to take risks, the tools to grow, the skills to develop and the support of a company invested in you for the long haul. So, if you want to inspire a brighter work day for everyone, including yourself, you’ve found a match in Workday, and we hope to be a match for you too.

About the Team

The Cloud Platform SRE team in New Zealand is part of a global Cloud Platform organisation with counterparts in the US and Ireland. Our mission is to ensure the reliability and availability of the cloud platform that hosts Workday's engineering and customer environments. We are dedicated to improving platform reliability and observability and to delivering operational success at scale.

We are a team of versatile SREs who blend software engineering with operations, with a focus on reducing operational toil. Our tech stack is cloud-native, built on Kubernetes, Istio, OPA, GoLang, Prometheus, Grafana, and more. We plan automation and improvement work using scrum practices with two-week sprints, and we operate as an autonomous team with a follow-the-sun on-call model across three time zones (NZT, GMT, PT).

We are responsible for changing customer environments safely and keeping them reliable, using SLO-gated, multi-stage deployment automation. We partner with platform service teams to define and implement SRE standards, set benchmarks, and qualify services for production. Engineers from this team have shared their experiences at Cloud Native conferences, including KubeCon. If you're excited about contributing to the community and building infrastructure that powers a global platform, this is the team for you.

About the Role

As a Cloud Platform Site Reliability Engineer, you will join a growing Auckland-based team. This role is instrumental in helping the team scale its impact across the global Cloud Platform organisation. You will bring a passion for identifying and solving problems in distributed environments spanning configuration, Linux operating systems, and networking.

You will be hands-on with distributed cloud-native environments, including Kubernetes-based infrastructure across public clouds (AWS, GCP). You believe that automation is the key to operating large-scale systems, and you are driven to ensure customer success. The platform you support is built using Cloud Native Computing Foundation (CNCF) technologies, providing a secure foundation on which Workday service teams and platform development teams can build, test, and deploy continuously.

Responsibilities include:

  • Identify, diagnose, and resolve reliability and performance issues across distributed cloud environments, including Kubernetes clusters

  • Design and implement automation solutions that improve operational efficiency, reduce manual effort, and enable the team to operate at scale

  • Define and launch effective SLIs, and help ensure SLOs are met by building an extensible observability architecture, automating runbooks, and establishing new processes

  • Partner with platform service teams to craft and implement SRE standards for their respective services, defining benchmarks and automation to qualify services for production environments

  • Collaborate with global SRE counterparts in Pleasanton, Atlanta, and Dublin to share knowledge, align on best practices, and deliver consistent operational outcomes

  • Contribute to incident response, root cause analysis, and the development of detailed runbooks and processes that improve the team's ability to respond to and prevent future issues

  • Support and sustain the cloud platform infrastructure, ensuring high availability and reliability for engineering and customer environments

  • Participate in an on-call roster as part of a follow-the-sun model across three time zones, ensuring continuous platform coverage and rapid incident response around the clock

About You

  • 1 to 8 years of experience in site reliability engineering, DevOps, or a related cloud-native infrastructure role

  • Hands-on experience managing distributed systems in a public cloud environment (AWS, GCP, or Azure)

  • BS in Computer Science or a related field, or equivalent practical experience

Other Qualifications

  • Proficient in cloud-native computing, whether gained through an SRE role, a DevOps position, or dedicated career development, with a deep understanding of how to build and operate reliable infrastructure using CNCF technologies in cloud environments

  • Experienced with Kubernetes, ideally EKS and GKE, with the ability to manage, troubleshoot, and optimize containerised workloads at scale. Familiarity with related technologies such as Istio, OPA, Prometheus, and Grafana is a plus

  • Solid Linux/Unix background, with comfort navigating, configuring, and troubleshooting Linux-based operating systems in production environments

  • Proficient in at least one programming language such as GoLang, Python, or Ruby (Go preferred), with the ability to write automation, tooling, and scripts that improve platform reliability

  • Familiar with software development practices, including iterative delivery approaches, agile methodologies, CI/CD pipelines, and end-to-end testing, with the ability to contribute to code management and build pipeline processes

  • Excellent documentation and communication skills, with experience developing detailed runbooks, operational processes, and technical guides that enable the team and the broader organisation to learn and improve

  • Able to collaborate effectively as part of a globally distributed, multi-functional team while also working independently with minimal instruction. You are comfortable partnering with colleagues across diverse backgrounds and time zones

Pursuant to applicable Fair Chance law, Workday will consider for employment qualified applicants with arrest and conviction records.

Workday is an Equal Opportunity Employer including individuals with disabilities and protected veterans.

Are you being referred to one of our roles? If so, ask your connection at Workday about our Employee Referral process!