Guardian Life

Lead Data Engineer

Chennai | Full time

Job Description:

You will 

  • Lead technical design and implementation of data engineering and MLOps solutions, ensuring best practices and high-quality deliverables. 

  • Mentor and guide junior engineers, conducting code reviews and technical sessions to foster team growth. 

  • Perform detailed analysis of raw data sources, applying business context, and collaborate with cross-functional teams to transform raw data into curated and certified data assets for ML and BI use cases. 

  • Create scalable and trusted data pipelines that generate curated data assets in centralized data lake and data warehouse ecosystems. 

  • Monitor and troubleshoot data pipeline performance, identifying and resolving bottlenecks and issues. 

  • Extract text data from a variety of sources (documents, logs, databases, web scraping) to support development of NLP/LLM solutions. 

  • Collaborate with data science and data engineering teams to build scalable and reproducible machine learning pipelines for training and inference. 

  • Lead development and maintenance of the end-to-end MLOps lifecycle to automate the development and delivery of machine learning solutions. 

  • Implement robust data drift and model monitoring frameworks across pipelines. 

  • Develop real-time data solutions by creating new API endpoints or streaming frameworks. 

  • Develop, test, and maintain robust tools, frameworks, and libraries that standardize and streamline the data & machine learning lifecycle. 

  • Leverage public and private APIs to extract data and invoke functionality as required for use cases. 

  • Collaborate with cross-functional partners across Data Science, Data Engineering, business units, and IT. 

  • Create and maintain effective documentation for projects and practices, ensuring transparency and effective team communication. 

  • Provide technical leadership and mentorship, driving continuous improvement in building reusable and scalable solutions. 

  • Contribute to the strategy for advanced data & ML engineering practices and lead execution of key technical-strategy initiatives. 

  • Stay up-to-date with the latest trends in modern data engineering, machine learning & AI. 

 

You have 

  • Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, or a related field, with 8+ years of relevant experience. 

  • 5+ years of experience working with Python, SQL, PySpark, and Bash scripting, with proficiency in the software development lifecycle and software engineering practices. 

  • 3+ years of experience developing and maintaining robust data pipelines for both structured and unstructured data, used by Data Scientists to build ML models. 

  • 3+ years of experience working with Cloud Data Warehousing (Redshift, Snowflake, Databricks SQL or equivalent) platforms and distributed frameworks like Spark. 

  • 2+ years of hands-on experience using the Databricks platform for data engineering and MLOps, including MLflow, Model Registry, Databricks Workflows, Job Clusters, the Databricks CLI, and Workspace. 

  • 2+ years of experience leading a team of engineers, with a track record of delivering robust, scalable data solutions of the highest quality. 

  • Solid understanding of machine learning lifecycle, data mining, and ETL techniques. 

  • Experience with machine learning frameworks (scikit-learn, xgboost, Keras, PyTorch) and operationalizing models in production. 

  • Strong understanding of REST APIs and experience using different types of APIs to extract data or invoke functionality. 

  • Familiarity with Python API development frameworks such as Flask or FastAPI, and with containerization and orchestration technologies such as Docker and Kubernetes. 

  • Hands-on experience building and maintaining tools and libraries used by multiple teams across the organization (e.g., Data Engineering utility libraries, DQ Libraries). 

  • Proficient in applying software engineering principles throughout the design and development process. 

  • Hands-on experience with CI/CD tools (e.g., Jenkins or equivalent), version control (GitHub, Bitbucket), and orchestration (Airflow, Prefect, or equivalent). 

  • Excellent communication skills and ability to work and collaborate with cross-functional teams across technology and business. 


Good to have 

  • Understanding of Large Language Models (LLMs) and the MLOps lifecycle for operationalizing LLMs. 

  • Familiarity with GPU compute for model training or inference. 

  • Familiarity with deep learning frameworks and deploying deep learning models for production use cases. 

Location:

This position can be based in any of the following locations:

Chennai

Current Guardian Colleagues: Please apply through the internal Jobs Hub in Workday.