At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We’re looking for people who are determined to make life better for people around the world.
Data Steward Engineer – Pharma Manufacturing Analytics
Role: Data Steward Engineer
Level: P2
What You'll Be Doing
You will serve as the technical and functional data steward for our pharma manufacturing data analytics platform, built on Azure Databricks. You will work at the intersection of manufacturing science, data engineering, and data governance to ensure that data flowing from systems such as MES, LIMS, PI, ERP, QMS, and Batch Records is accurate, understood, clean, and fit for analytical use. You will partner closely with manufacturing scientists, quality engineers, data engineers, and business analysts to continuously improve data trust and usability across the analytics organization.
How You Will Succeed
You will develop and maintain deep subject matter expertise across pharma manufacturing data domains—including MES (Manufacturing Execution Systems), LIMS (Laboratory Information Management Systems), Stability studies, OSI PI/AVEVA, Process Monitoring, QA/QC systems, Certificate of Analysis (CoA), ERP (SAP), and batch records—enabling the team to correctly interpret, contextualize, and use these datasets.
You will conduct systematic data profiling, data quality assessments, and root cause analysis across Azure Databricks data assets, identifying and resolving data anomalies, gaps, duplicates, and inconsistencies at the source and in the data pipeline.
You will design and implement data cleansing, standardization, and transformation pipelines in Azure Databricks (PySpark / SQL / Delta Lake) to convert raw, heterogeneous manufacturing data into curated, analytics-ready datasets.
You will create and maintain data dictionaries, business glossaries, data lineage maps, and metadata documentation for all major manufacturing data domains, making data discoverable and understandable for scientists, engineers, and analysts.
By collaborating with source system owners and IT teams, you will trace data provenance, map data flows from source systems into the Databricks Lakehouse, and document transformation logic to support data traceability for GMP compliance.
You will define and enforce data quality rules and thresholds, build monitoring dashboards within the Azure Databricks environment, and establish alerting mechanisms to detect data quality degradation proactively.
You will act as the primary point of contact ("data translator") between data consumers (analytics, data science, reporting) and data producers (manufacturing operations, quality, lab systems), bridging the gap between technical and business language.
You will partner with Data Governance, Master Data Management (MDM), and Information Management functions to align manufacturing data stewardship practices with enterprise-wide data governance policies and GxP/regulatory requirements.
What You Should Bring
Deep knowledge of pharma manufacturing data systems and their data structures.
Strong proficiency in Azure Databricks, including Delta Lake, PySpark, Spark SQL, notebooks, and Unity Catalog for data governance.
Hands-on experience with data profiling and data quality tooling.
Familiarity with GMP data integrity principles (FDA 21 CFR Part 11, ALCOA+) and their implications for data lifecycle management in a regulated manufacturing environment.
Experience developing data lineage and metadata documentation using tools such as Microsoft Purview (formerly Azure Purview).
Strong SQL and Python skills with the ability to write complex queries, transformations, and automation scripts in a cloud-based big data environment.
Excellent communication skills, including the ability to translate complex data concepts and quality findings into actionable insights for both technical engineers and non-technical business stakeholders.
Basic Qualifications
Bachelor's degree in Computer Science, Information Systems, Chemical Engineering, Life Sciences, or a related field (or equivalent practical experience).
6+ years of experience in data engineering, data stewardship, or data management roles, with a significant portion in the pharmaceutical, biotech, or regulated manufacturing industry.
5+ years of hands-on experience with manufacturing data systems (MES, LIMS, ERP, Historian/PI, or equivalent).
3+ years of experience with Azure Databricks or a comparable cloud data lakehouse platform (for example, on AWS or GCP).
Demonstrated experience designing and executing data cleansing, data quality, and data governance programs at scale.
Proficiency in Python and SQL for data transformation and automation.
Experience working with and interpreting batch manufacturing records, laboratory data, and process control data in the context of analytical workflows.
Success Metrics and Additional Preferences
Improved data quality score across manufacturing datasets.
Reduced data reconciliation issues between source systems.
Clear data lineage and governance framework established.
Increased trust and adoption of analytics by QA/QC and Manufacturing teams.
CDMP (Certified Data Management Professional) or equivalent data governance certification.
Prior experience in a GxP-regulated data environment with exposure to CSV (Computer System Validation).
Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form (https://careers.lilly.com/us/en/workplace-accommodation) for further assistance. Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.
Lilly does not discriminate on the basis of age, race, color, religion, gender, sexual orientation, gender identity, gender expression, national origin, protected veteran status, disability or any other legally protected status.
#WeAreLilly