Job Title: Senior Consultant - AI DevOps Engineer – AI Platforms
Career Level: D2
Introduction to role
Are you ready to redefine the future of technology in healthcare? As a Senior Consultant - AI DevOps Engineer at AstraZeneca, you'll be at the forefront of embedding AI across our value chain, from discovering novel compounds to enhancing patient safety and optimizing commercial outcomes. Our Enterprise AI organization is committed to delivering Connected Intelligence, Reusable AI, and AI Platforms. In this role, you'll go beyond traditional support to build, operate, and continuously improve production-grade AI platforms and pipelines. You'll be the first responder for incidents and service requests, and you'll design and automate infrastructure, deployment workflows, observability, and governance guardrails. You'll collaborate with multidisciplinary teams of data scientists, ML engineers, and platform engineers to advance healthcare for millions of patients. Are you ready to make a difference?
Accountabilities
- Platform Engineering & Reliability: Build and operate AWS-based AI platform services (including Kubernetes, GPU workloads, storage, networking, service mesh). Own SLAs/SLOs, capacity planning, cost optimization, and performance tuning for AI workloads.
- MLOps & CI/CD: Design and implement end-to-end CI/CD for ML systems (data, models, services), including feature stores, model registries, artifact/version management, and automated model deployment to batch, streaming, and real-time endpoints.
- Automation & Infrastructure as Code: Create reproducible environments with Infrastructure as Code (e.g., Terraform, CloudFormation, CDK). Automate environment provisioning, cluster upgrades, dependency management, and blue/green or canary deployments.
- Observability & Incident Response: Implement logging, metrics, tracing, model/data drift monitoring, and alerting. Lead L1–L3 incident response, root cause analysis, and postmortems. Continuously improve runbooks and self-healing mechanisms.
- Security, Compliance & Governance: Partner with Cyber Security, Data Privacy, and internal governance to implement guardrails (identity, secrets, encryption, vulnerability management).
- Scalability & Performance: Optimize distributed training/inference across GPU, multicore SMP, and distributed clusters. Guide ML engineers on parallelization, resource quotas, and cost/performance trade-offs.
- Operational Excellence: Champion a production-first attitude and streamline the path from exploratory research to production through golden paths, templates, and platform enablement.
- Collaboration & Enablement: Work closely with Connected Intelligence, Reusable AI, and AI Platforms teams. Provide training, documentation, and developer experience improvements.
Essential Skills/Experience
- Education: B.Tech/M.Tech in Computer Science, Engineering, or a related quantitative field.
- Cloud Expertise: 5-7 years of hands-on experience with AWS (or equivalent cloud), including core services (compute, storage, networking), IAM, and cost management.
- AI Platform Experience: Experience provisioning and managing enterprise AI platforms (e.g., Databricks, Domino) at scale is a plus.
- Kubernetes at Scale: 5-7 years working with Kubernetes and containerized applications; 5+ years administering production clusters with understanding of operators, storage classes, GPU scheduling, and autoscaling.
- Programming: 3+ years building and delivering software in Python; strong skills in another language (e.g., Go, Java) are valued. Ability to write robust, testable, and observable services.
- Infrastructure as Code & Automation: 3+ years implementing Terraform/CloudFormation/CDK, GitOps workflows (e.g., Argo CD, Flux), and CI/CD systems (e.g., GitHub Actions, GitLab CI, Jenkins).
- MLOps Tooling: Experience with ML orchestration and model lifecycle tools (e.g., MLflow, Kubeflow). Familiarity with feature stores, model registries, A/B testing, and shadow deployments for ML.
- Observability: Proficiency with Prometheus/Grafana and ELK/OpenSearch, and experience with incident management.
- Security & Compliance: Experience implementing security controls (secrets management, KMS, encryption, RBAC) and aligning to internal security standards; GxP experience is a plus.
- Agile & ITIL: Comfortable working in Agile teams; experience in support environments or ITIL is beneficial, with a strong focus on automation over manual operations.
- DevOps Perspective: Demonstrated use of DevOps practices to enable automation strategies, improve developer experience, and reduce time-to-production.
- Soft Skills: Creative, collaborative, resilient, with excellent communication and the ability to translate complex technical topics for diverse stakeholders.
Desirable Skills/Experience
- Data & Streaming: Experience with Spark, Databricks, Kafka/Kinesis, and scalable data pipelines.
- GenAI/LLM Ops: Familiarity with LLM serving, prompt/response safety, retrieval-augmented generation (RAG), vector databases, and token-aware scaling.
- Cost Optimization: Rightsizing, spot/fleet strategies, and chargeback/showback practices.
When we put unexpected teams in the same room, we ignite bold thinking with the power to inspire life-changing medicines. In-person working gives us the platform we need to connect, work at pace and challenge perceptions. That's why we work, on average, a minimum of three days per week from the office. But that doesn't mean we're not flexible. We balance the expectation of being in the office while respecting individual flexibility. Join us in our unique and ambitious world.
At AstraZeneca, your work directly impacts patients by transforming our ability to develop life-changing medicines. We empower the business to perform at its peak by combining pioneering science with leading digital technology platforms. Join us at a crucial stage of our journey as we become a digital and data-led enterprise. If you have a passion for impacting lives through data analytics and AI technologies like machine learning, there's no better time to join us!
Ready to make an impact? Apply now to be part of our innovative team!
Date Posted
11-Nov-2025
AstraZeneca embraces diversity and equality of opportunity. We are committed to building an inclusive and diverse team representing all backgrounds, with as wide a range of perspectives as possible, and harnessing industry-leading skills. We believe that the more inclusive we are, the better our work will be. We welcome and consider applications to join our team from all qualified candidates, regardless of their characteristics. We comply with all applicable laws and regulations on non-discrimination in employment (and recruitment), as well as work authorization and employment eligibility verification requirements.