We are looking for a Senior Lead Data Engineer. In this role, you will architect the highly scalable, cloud-native data platforms that power our Real-World Data (RWD) and DRG (Decision Resources Group) analytics solutions. These are critical tools that help researchers, clinicians, scientists, and business leaders make faster, more confident decisions. You’ll help build the data engine behind products used to accelerate drug discovery, evaluate treatment effectiveness, model patient journeys, and bring life-saving innovations to market.
This is an opportunity to build data systems that not only drive next-generation AI but also create measurable impact in healthcare and life sciences globally.
If you’re passionate about data engineering and excited to work on platforms that enable next-generation AI, this role is for you.
About You – Experience, Education, Skills, and Accomplishments
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- Minimum 8 years of experience building scalable, production-grade data systems.
- Proven ability to design massively scalable distributed data processing pipelines.
- Strong background in database design, schema modelling, and performance tuning.
- Hands-on expertise building and optimizing complex ETL/ELT pipelines that power ML and analytics workloads.
- Ability to research and work independently, and to collaborate with remote teams across different time zones.
- Experience working with interactive-speed query engines such as StarRocks, ClickHouse, and Druid.
- Experience designing resilient, fault-tolerant, cloud-native data platforms with automated disaster recovery.
- Hands-on background in Agile delivery, CI/CD, and containerized workflows.
- Strong understanding of data versioning, lineage, reproducibility, and metadata management — critical for AI governance.
Technical Skills
- Big Data, PySpark, Databricks, Snowflake
- Interactive query engines like StarRocks/ClickHouse/Druid
- Exposure to open-source technologies such as DuckDB and Polars
- Transformation optimization: refining complex transformation logic, often the most resource-intensive part of a pipeline, using efficient code and techniques
- AWS Glue, AWS EMR, Delta Lake, Iceberg
- Parquet, RDBMS (PostgreSQL)
- Experience designing data flows that serve AI, GenAI, and algorithmic workloads
Languages
- Proficient in Python, SQL, and PySpark
- Bonus: experience building data prep scripts for ML model training
Cloud Technologies & Tools
- Strong experience with AWS: EMR, Glue, S3, EC2, RDS, Aurora PostgreSQL, Lambda
- Ability to evaluate and integrate AI-friendly tools (feature stores, vector databases, ML workflow orchestration, etc.)
It Would Be Great If You Also Have
- Exposure to GenAI technologies, LLM data pipelines, or vector embeddings
- Experience supporting data needs for ML, LLMs, or analytics teams
- Experience collaborating with distributed, high-velocity global teams
- Experience building end-to-end RAG pipelines, including advanced approaches such as Fusion RAG, and applying query transformation to improve retrieval
- Experience with Python frameworks such as LangChain and LlamaIndex for building GenAI applications
- Exposure to vector databases such as Chroma, Pinecone, Milvus, Weaviate, and LanceDB
What You Will Be Doing in This Role
AI-Ready Data Architecture & Technical Leadership
- Architect and deliver a future-proof data lake platform optimized for analytics, ML, and GenAI workloads.
- Design intelligent, automated, highly scalable data pipelines that support model training, inference, and continuous learning.
- Provide thought leadership on emerging AI-driven data patterns such as feature stores, vectorized pipelines, and streaming ingestion.
- Evaluate modern technologies (Delta Lake, Iceberg, Databricks ML, AWS AI services) to ensure the platform stays ahead of the curve.
- Own the end-to-end data lake solution design, ensuring scalability, reliability, and AI-readiness.
- Collaborate with colleagues and business stakeholders to define and execute technical strategy.
- Be an active stakeholder throughout the software development life cycle: oversee the software design, keep the project on its technical direction, and adjust the technical design to mitigate unexpected blockers.
Data Engineering & Platform Delivery
- Build high-performance, cloud-native ETL & ELT pipelines using AWS Glue, EMR, and Databricks.
- Ensure data quality, lineage, auditability, and governance to support trustworthy AI and analytics.
- Embed standards for data observability, automated quality checks, and ML-ready feature transformations.
- Help implement robust SLAs for AI data services, ensuring fast, deterministic, and reliable data flows.
- Act as a key contributor in architectural decisions, data modelling, workflow optimization, and platform enhancements.
Innovation, GenAI Integration & Customer Impact
- Drive R&D explorations across new AI/GenAI enablers such as automated data labelling, embeddings, or intelligent data preparation.
- Partner with Product and Technology leaders to translate business problems into AI-ready data solutions.
- Lead initiatives to make the data platform more “AI-native,” enabling advanced analytics, LLM-driven insights, and real-time intelligence.
- Continuously explore how emerging AI tools can reduce operational overhead and automate previously manual processes.
- Create technical documentation and knowledge assets to scale AI-ready engineering practices across the organization.
About the Team
You will join the RWD DRG Fusion team, a global engineering organization focused on powering the next generation of healthcare and life sciences insights. The team thrives on innovation, collaboration, diversity, and a strong sense of mission. You’ll work with product owners, scientists, data scientists, ML engineers, and architects shaping the future of our AI-driven products.
Hours of Work
- Full-time (IST)
- 40 hours per week
- Hybrid working environment
At Clarivate, we are committed to providing equal employment opportunities for all qualified persons with respect to hiring, compensation, promotion, training, and other terms, conditions, and privileges of employment. We comply with applicable laws and regulations governing non-discrimination in all locations.