Roche

Data Scientist - Pharma R&D

Pune | Full time

At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted, and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop, and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.

The Position

The Principal IT Data Scientist will contribute to complex data science projects end-to-end. This role is critical to accelerating clinical development timelines by transitioning manual, rule-based clinical data programming into an AI-augmented, multi-agentic workflow with Human-in-the-Loop (HITL) oversight.

Operating in a highly regulated GxP environment, you will build specialized AI agents that assist Analytical Data Scientists across the clinical data lifecycle. This requires deep expertise in advanced generative AI, multi-agent architectures (e.g., LangGraph), Graph-based Retrieval-Augmented Generation (GraphRAG), and LLMOps. You will balance hands-on technical work with leadership responsibilities and regularly help shape the team's technical roadmap to advance Roche's digital transformation and deliver patient benefits faster.

Job Responsibilities

  • Scope / Content Leadership: Innovates and leads cutting-edge GenAI initiatives from concept to production. Designs and builds specialized, stateful AI agents (e.g., Planner, Code Refiner, Summarization) that can reason, critique, and route tasks autonomously before presenting them to a human.

  • Accountability / Problem Solving: Tackles complex clinical programming challenges by developing bespoke multi-agent workflows. Examples include building proactive coding debuggers that parse verbose logs, classify errors, and automatically propose code/configuration fixes via GitLab Merge Requests.

  • Clinical Data Automation: Develops algorithms and LLM agents to automate the mapping of raw clinical trial data to highly structured CDISC formats (SDTM and ADaM) and to assist in generating R/SAS code for statistical analysis outputs (tables, listings, and graphs; TLGs).

  • Impact / Strategy: Leads high-impact projects that significantly influence clinical development timelines. Designs continuous learning loops and database schemas that capture implicit and explicit human feedback (HITL) to steadily improve retrieval accuracy and diagnostic confidence.

  • Complexity / Product Size: Builds and maintains production-ready, end-to-end pipelines. Creates automated AI evaluation pipelines and "Gold Datasets" to track metrics such as context recall, accuracy, and hallucination rates for deployed models.

  • Stakeholder Management: Works closely with senior leadership and cross-functional agile teams. Acts as a coach, mentor, or buddy to help junior data scientists navigate complex technical and regulatory guardrails.

Qualifications

Education / Experience

  • B.Sc. or M.Sc. in Data Science, Computer Science, Statistics, Mathematics, Physics, or a related quantitative field, with 5 to 7+ years of industrial experience leading complex data science/AI engineering projects end to end.

  • Demonstrated experience moving beyond simple LLM API calls into building complex, stateful multi-agent architectures and production-grade RAG systems.

  • Clinical Domain Adaptability: Ability to understand clinical programming concepts and operate within strict regulatory/GxP guardrails. Prior exposure to clinical data interoperability standards (CDISC, SDTM, ADaM) is strongly preferred.

Technical Skills

  • Programming & Frameworks: Deep expertise in Python (for AI/ML backend/agent logic) and R (heavy focus on clinical statistical programming, NextGen toolkits like Admiral). Familiarity with SAS is a plus.

  • AI / ML & Multi-Agent Frameworks: Hands-on capability with LangChain, LangGraph, AWS AgentCore, and Model Context Protocol (MCP) servers. Expertise in advanced RAG methodologies, including chunking strategies, embeddings, and GraphRAG.

  • LLMOps & Observability: Strong experience tracing, evaluating, and managing prompts in production using tools such as Portkey and LangSmith.

  • Cloud, DevOps & Software Engineering: Strong background in MLOps and scalable deployment on AWS (EKS). Experience interacting with REST APIs, GitLab integration (for automated branch/MR creation), Docker, and AI coding assistants (Positron IDE, GitHub Copilot).

  • Data Engineering: Ability to build ingestion and curation pipelines (using AWS Glue, MWAA/Airflow) to hydrate databases. Experience with AWS OpenSearch Serverless (Vector DB), AWS DocumentDB, Neptune (Graph DB), and Aurora PostgreSQL.

Additional Qualifications

  • Leadership: Strong leadership capabilities, with a proven ability to oversee multiple projects simultaneously and guide junior to mid-level AI engineers.

  • Strategic Vision: Strategic mindset to promote advanced data science methodologies as a key enterprise competency.

  • Communication: Effective data storytelling skills, including the use of visualisation tools, to present complex AI evaluation metrics, data, and results to non-technical stakeholders and clinical programming teams.

  • International, goal-oriented mindset with a can-do attitude.

  • Fluency in written and spoken English.

Who we are

A healthier future drives us to innovate. Together, more than 100’000 employees across the globe are dedicated to advancing science, ensuring everyone has access to healthcare today and for generations to come. Our efforts result in more than 26 million people treated with our medicines and over 30 billion tests conducted using our Diagnostics products. We empower each other to explore new possibilities, foster creativity, and keep our ambitions high, so we can deliver life-changing healthcare solutions that make a global impact.


Let’s build a healthier future, together.

Roche is an Equal Opportunity Employer.