Lilly

High Performance Compute Intern

Bengaluru, India | Full time

At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our 39,000 employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We're looking for people who are determined to make life better for people around the globe.

Competency Summary

We are seeking an HPC Performance Engineer Intern to join our team of scientists and engineers who are passionate about building the next generation of scientific machine learning (ML) frameworks. The intern will support and advance LillyPod, Lilly's internal GPU compute cluster for AI/ML workloads. LillyPod uses Run:ai as its orchestration layer and supports three main workload types: interactive workspaces for development, batch training jobs, and distributed training across multiple nodes.

Key Objectives/Deliverables

  • Manage and optimize GPU cluster resources using the Run:ai orchestration platform, including scheduling, quota management, and workload prioritization
  • Support users running workspace (interactive), training (batch), and distributed training workloads across multi-node GPU environments
  • Design and implement computationally performant features for large-scale, CUDA-backed ML training frameworks, using low-level acceleration and scaling strategies such as kernel design, GPU porting, data structure innovations, and distributed learning technologies
  • Optimize the computational performance of a wide range of business-critical ML models through an accelerated hardware and software stack, as well as through algorithmic improvements
  • Design and maintain shared and project-specific filesystems for large-scale data and model storage
  • Build and maintain container images via an internal registry, ensuring reproducibility and security of ML environments
  • Develop and maintain CLI tooling and automation for workload submission, monitoring, and lifecycle management
  • Implement and tune preemptible workload strategies to maximize cluster utilization

Minimum Position Requirements

  • Education Requirements: Master's or PhD degree
  • Strong Linux systems administration skills (networking, storage, process management)
  • Experience with Kubernetes and container orchestration in a GPU-accelerated environment
  • Deep knowledge of high-performance computing infrastructure, including clusters built on NVIDIA H200 and B200 GPUs
  • Hands-on experience with Run:ai or similar workload schedulers (SLURM, PBS, Volcano)
  • Proficiency building and managing Docker/OCI container images and private registries
  • Familiarity with distributed training frameworks (PyTorch DDP, DeepSpeed, FSDP, Horovod)
  • Understanding of shared filesystem architectures (NFS, Lustre, GPFS/Spectrum Scale, or similar)
  • Experience with CLI tool development (Python, Go, or Bash)
  • Comfort working with ML/DL practitioners and translating their compute needs into infrastructure solutions

Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form (https://careers.lilly.com/us/en/workplace-accommodation) for further assistance. Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.

Lilly does not discriminate on the basis of age, race, color, religion, gender, sexual orientation, gender identity, gender expression, national origin, protected veteran status, disability or any other legally protected status.

#WeAreLilly