You will
Lead technical design and implementation of data engineering and MLOps solutions, ensuring best practices and high-quality deliverables.
Mentor and guide junior engineers, conducting code reviews and technical sessions to foster team growth.
Perform detailed analysis of raw data sources, applying business context, and collaborate with cross-functional teams to transform raw data into curated and certified data assets for ML and BI use cases.
Create scalable, trusted data pipelines that generate curated data assets in centralized data lake and data warehouse ecosystems.
Monitor and troubleshoot data pipeline performance, identifying and resolving bottlenecks and issues.
Extract text data from a variety of sources (documents, logs, databases, web scraping) to support development of NLP/LLM solutions.
Collaborate with data science and data engineering teams to build scalable and reproducible machine learning pipelines for training and inference.
Lead development and maintenance of the end-to-end MLOps lifecycle to automate the development and delivery of machine learning solutions.
Implement robust data drift and model monitoring frameworks across pipelines.
Develop real-time data solutions by creating new API endpoints or streaming frameworks.
Develop, test, and maintain robust tools, frameworks, and libraries that standardize and streamline the data & machine learning lifecycle.
Leverage public and private APIs to extract data and invoke functionality as required by use cases.
Collaborate with cross-functional partners across Data Science, Data Engineering, business units, and IT.
Create and maintain effective documentation for projects and practices, ensuring transparency and effective team communication.
Provide technical leadership and mentorship, driving continuous improvement in building reusable, scalable solutions.
Contribute to the strategy for advanced data and ML engineering practices, and lead execution of key technical strategy initiatives.
Stay up to date with the latest trends in modern data engineering, machine learning, and AI.
You have
Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, or a related field, with 8+ years of experience.
5+ years of experience working with Python, SQL, PySpark, and Bash scripting, with proficiency in the software development lifecycle and software engineering practices.
3+ years of experience developing and maintaining robust data pipelines for both structured and unstructured data, consumed by Data Scientists to build ML models.
3+ years of experience working with Cloud Data Warehousing (Redshift, Snowflake, Databricks SQL or equivalent) platforms and distributed frameworks like Spark.
2+ years of hands-on experience using the Databricks platform for data engineering and MLOps, including MLflow, Model Registry, Databricks Workflows, Job Clusters, the Databricks CLI, and Workspace.
2+ years of experience leading a team of engineers, with a track record of delivering robust, scalable data solutions of the highest quality.
Solid understanding of machine learning lifecycle, data mining, and ETL techniques.
Experience with machine learning frameworks (scikit-learn, XGBoost, Keras, PyTorch) and operationalizing models in production.
Proficiency with REST APIs and experience using different types of APIs to extract data or invoke functionality.
Familiarity with Python API development frameworks such as Flask or FastAPI, and with containerization and orchestration technologies such as Docker and Kubernetes.
Hands-on experience building and maintaining tools and libraries used by multiple teams across the organization (e.g., Data Engineering utility libraries, DQ Libraries).
Proficiency in understanding and incorporating software engineering principles into the design and development process.
Hands-on experience with CI/CD tools (e.g., Jenkins or equivalent), version control (GitHub, Bitbucket), and orchestration (Airflow, Prefect, or equivalent).
Excellent communication skills and ability to work and collaborate with cross-functional teams across technology and business.
Good to have
Understanding of Large Language Models (LLMs) and the MLOps lifecycle for operationalizing them.
Familiarity with GPU compute for model training or inference.
Familiarity with deep learning frameworks and deploying deep learning models for production use cases.
This position can be based in any of the following locations:
Chennai

Current Guardian Colleagues: Please apply through the internal Jobs Hub in Workday.