Job Title: Senior Data Engineer - Data Pipelines
Introduction to role:
Are you ready to architect FAIR data platforms that accelerate discovery and turn complex science into deployable insights? Do you want your engineering decisions to remove data friction, power analytics, and help deliver life-changing medicines faster?
In this role, you will design and operate the data foundations our scientists and analysts rely on to explore disease biology, generate evidence, and make bold decisions. You will work across high-performance computing and cloud environments to create secure, scalable pathways for data to move from experiments to models to actionable results.
You will join a collaborative, curious team that fuses data and technology with cutting-edge science. By building canonical models, trusted pipelines, and resilient infrastructure, you will help reduce time-to-insight, improve reproducibility, and enable the next wave of breakthroughs.
Accountabilities:
Data Platform Architecture: Design and implement robust, secure, and scalable data platforms and services that enable discovery, access, and reuse (FAIR) and remove barriers to scientific analysis.
Modeling and Warehousing: Define canonical data models and dimensional schemas; build lakehouse/warehouse layers that optimize storage and query performance to speed up evidence generation.
Data Integration: Create reliable ingestion frameworks for structured and unstructured data; standardize metadata, lineage, and cataloging to make data findable and trustworthy.
Governance and Quality: Establish and enforce standards for data quality, access control, retention, and compliance; implement monitoring and observability for proactive issue detection and continuous improvement.
Infrastructure Engineering: Operate solutions across Unix/Linux HPC and AWS cloud environments; engineer for reliability, cost efficiency, scalability, and sustainable performance.
Collaboration and Stakeholder Engagement: Translate scientific and business requirements into clear architectural designs; partner with CPSS stakeholders, R&D IT, and DS&AI to co-create solutions that deliver measurable value.
Engineering Excellence: Apply version control, CI/CD, automated testing, design patterns, and code review to ensure maintainability, resilience, and a high bar for software craftsmanship.
Enablement and Information Exchange: Produce documentation, reusable components, and mentorship that uplift data engineering practices across teams; mentor peers and champion platform adoption.
Essential Skills/Experience:
Data platform architecture: Design and implement robust, secure, and scalable data platforms and services that enable discovery, access, and reuse (FAIR).
Modeling and warehousing: Develop canonical data models, dimensional schemas, and lakehouse/warehouse layers; optimize storage and query performance.
Data integration: Build reliable ingestion frameworks for structured and unstructured data; standardize metadata, lineage, and cataloging.
Governance and quality: Establish standards for data quality, access control, retention, and compliance; implement monitoring and observability.
Infrastructure engineering: Operate solutions across Unix/Linux HPC and cloud environments (AWS preferred); ensure reliability, cost efficiency, and scalability.
Collaboration: Translate scientific and business requirements into architectural designs; partner with CPSS collaborators, R&D IT, and DS&AI to co-create solutions.
Engineering excellence: Apply version control, CI/CD, automated testing, design patterns, and code review to ensure maintainability and resilience.
Enablement: Produce documentation, reusable components, and guidance to uplift data engineering practices across teams.
Desirable Skills/Experience:
Hands-on expertise with Python or Scala and distributed data processing frameworks (Spark, PySpark); experience with SQL at scale.
Experience with modern lakehouse and warehouse technologies (Delta Lake, Apache Iceberg or Hudi, Redshift, Snowflake, Athena, BigQuery) and data modeling tools and practices (Dimensional, Data Vault).
Familiarity with orchestration and data workflow tools (Airflow, Argo, Dagster), event streaming (Kafka, Kinesis), and metadata/governance platforms (Collibra, Alation, AWS Glue).
Cloud engineering skills in AWS services relevant to data (S3, EMR, Glue, Lambda, Step Functions, ECS/EKS) and infrastructure-as-code (Terraform, CloudFormation).
Operating experience in Unix/Linux HPC environments, job schedulers (SLURM), containerization, and secure data access patterns for scientific workloads.
Observability and reliability practices (Prometheus, Grafana, CloudWatch), cost optimization, and performance tuning for large-scale analytics.
Strong communication skills to align diverse collaborators, translate domain concepts into technical designs, and drive adoption through documentation and enablement.
Relevant certifications or demonstrated leadership in data platform architecture, governance, or cloud engineering.
When we put unexpected teams in the same room, we unleash bold thinking with the power to inspire life-changing medicines. In-person working gives us the platform we need to connect, work at pace, and challenge perceptions. That's why we work, on average, a minimum of three days per week from the office. But that doesn't mean we're not flexible. We balance the expectation of being in the office while respecting individual flexibility. Join us in our unique and ambitious world.
Why AstraZeneca:
At AstraZeneca, you will engineer where impact is immediate and visible: your pipelines will shape evidence, accelerate decisions, and help bring new treatments to people sooner. We bring experts from different fields together to solve hard problems quickly, backed by modern platforms across HPC and public cloud so your work runs at scale. Leaders remove barriers, teams share knowledge openly, and we value kindness alongside ambition, giving you room to innovate while staying grounded in real patient outcomes.
Call to Action:
If you are ready to architect the data flows that move science into the clinic, send us your CV and tell us about the toughest pipeline you have built and scaled.
Date Posted: 24-Dec-2025
Closing Date: 05-Jan-2026

AstraZeneca embraces diversity and equality of opportunity. We are committed to building an inclusive and diverse team representing all backgrounds, with as wide a range of perspectives as possible, and harnessing industry-leading skills. We believe that the more inclusive we are, the better our work will be. We welcome and consider applications to join our team from all qualified candidates, regardless of their characteristics. We comply with all applicable laws and regulations on non-discrimination in employment (and recruitment), as well as work authorization and employment eligibility verification requirements.