At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We’re looking for people who are determined to make life better for people around the world.

Role Overview

Tech@Lilly is looking for a Data Scientist to join the technology team supporting Global Regulatory Affairs (GRA), Global Scientific Communications (GSC), and Global Statistical Sciences (GSS). You will apply data science, machine learning, and AI techniques to solve real business problems: building agents that automate manual regulatory workflows, optimizing clinical trial processes, and building predictive models that drive smarter decisions across the portfolio. This role sits at the intersection of data, domain expertise, and engineering. You won’t just build models in notebooks; you’ll partner with business stakeholders to understand the problem and with leadership to quantify the impact of your work.

What You’ll Be Doing

As a Data Scientist, you will work across the portfolio to identify opportunities where data science and AI can create measurable business value. You’ll analyze complex datasets — regulatory documents, clinical trial data, submission timelines, operational metrics — to uncover patterns, build predictive models, and develop AI-powered solutions. You’ll design and evaluate machine learning models, build NLP and generative AI applications for regulatory and scientific content, and collaborate closely with full stack engineers to move your work from prototype to production. You operate in a regulated, GxP environment where data integrity, reproducibility, and validation are not optional.

How You’ll Succeed

Partnering with business stakeholders across GRA, GSC, and GSS to understand their workflows, identify high-impact problems, and frame them as data science opportunities — translating business questions into analytical approaches.
Developing and deploying machine learning models — classification, regression, clustering, time-series forecasting — to solve problems such as submission timeline prediction, document classification, regulatory risk scoring, and resource optimization.
Building and evaluating NLP and generative AI solutions — leveraging LLMs, RAG architectures, text extraction, entity recognition, and document summarization to automate regulatory authoring, scientific literature analysis, and content generation workflows.
Designing and executing experiments to evaluate model performance — using rigorous statistical methods, A/B testing, and evaluation frameworks (including RAGAS for RAG systems) to ensure solutions meet quality and accuracy thresholds before deployment.
Designing and building AI agents and agentic workflows — creating multi-step, tool-using systems that can autonomously execute complex tasks such as regulatory document drafting, data extraction and transformation, and cross-system orchestration — moving beyond single-prompt interactions to production-grade agent architectures that operate reliably in a validated environment.
Collaborating with full stack engineers and platform teams to productionize models — building APIs, integrating into existing applications, deploying on AWS infrastructure (Lambda, EKS, SageMaker, Databricks), and monitoring model performance in production.
Communicating findings and recommendations to both technical and non-technical audiences — using data visualization, storytelling, and clear business-impact framing to ensure your work drives actual decisions.
Staying current with emerging techniques in machine learning, generative AI, and data science — evaluating new tools, frameworks, and approaches for applicability to the GRA/GSC/GSS portfolio and sharing knowledge with the broader team.

Basic Qualifications

Bachelor’s degree in Data Science, Statistics, Computer Science, Mathematics, or a related quantitative field
1 year of professional data science experience with Python, R, and core data science libraries

Additional Skills & Preferences

Experience with machine learning frameworks and model deployment patterns
Academic background in data science
Hands-on experience with NLP techniques and/or generative AI — LLM APIs (OpenAI, Anthropic), RAG architectures, vector databases, prompt engineering
Familiarity with cloud data platforms — AWS (SageMaker, Lambda, S3), Databricks, or similar
Knowledge of statistical methods — hypothesis testing, experimental design, Bayesian methods, regression analysis
Experience with SAS programming
Strong communication skills — ability to present technical findings to non-technical audiences and translate business questions into analytical frameworks
Collaborative mindset and experience working with cross-functional teams including engineers, product owners, and business partners

Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form (https://careers.lilly.com/us/en/workplace-accommodation) for further assistance. Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.

Lilly is proud to be an EEO Employer and does not discriminate on the basis of age, race, color, religion, gender identity, sex, gender expression, sexual orientation, genetic information, ancestry, national origin, protected veteran status, disability, or any other legally protected status.

Our employee resource groups (ERGs) offer strong support networks for their members and are open to all employees. Our current groups include: Africa, Middle East, Central Asia Network; Black Employees at Lilly; Chinese Culture Network; Japanese International Leadership Network (JILN); Lilly India Network; Organization of Latinx at Lilly (OLA); PRIDE (LGBTQ+ Allies); Veterans Leadership Network (VLN); Women’s Initiative for Leading at Lilly (WILL); and enAble (for people with disabilities). Learn more about all of our groups.

Actual compensation will depend on a candidate’s education, experience, skills, and geographic location. The anticipated wage for this position is $66,000 - $165,000.

Full-time equivalent employees also will be eligible for a company bonus (depending, in part, on company and individual performance). In addition, Lilly offers a comprehensive benefit program to eligible employees, including eligibility to participate in a company-sponsored 401(k); pension; vacation benefits; eligibility for medical, dental, vision and prescription drug benefits; flexible benefits (e.g., healthcare and/or dependent day care flexible spending accounts); life insurance and death benefits; certain time off and leave of absence benefits; and well-being benefits (e.g., employee assistance program, fitness benefits, and employee clubs and activities). Lilly reserves the right to amend, modify, or terminate its compensation and benefit programs in its sole discretion and Lilly’s compensation practices and guidelines will apply regarding the details of any promotion or transfer of Lilly employees.

#WeAreLilly