Quantiphi

Architect - Machine Learning (MLOps Specialist)

Bengaluru, Karnataka, India · Full time

While technology is the heart of our business, a global and diverse culture is the heart of our success. We love our people, and we take pride in fostering a culture built on transparency, diversity, integrity, learning, and growth.


If working in an environment that encourages you to innovate and excel, in both your professional and personal life, interests you, you would enjoy your career with Quantiphi!

Role: Architect - Machine Learning

Experience: 7-14 Years

Location: Mumbai/Bangalore

Must-have skills & qualifications:

  • 8+ years working in ML/AI engineering or MLOps roles with strong architecture exposure.

  • Strong expertise in the AWS cloud-native ML stack, including EKS (primary), ECS, Lambda, API Gateway, and CI/CD (CodeBuild/CodePipeline or equivalent).

  • Hands-on experience with at least one major MLOps toolset, and awareness of alternatives: MLflow, Kubeflow, SageMaker Pipelines, Airflow, BentoML, KServe, Seldon.

  • Deep understanding of model lifecycle management (training → registry → deployment → monitoring).

  • Experience implementing or supporting LLMOps pipelines, including prompt versioning, evaluation metrics, and automation frameworks.

  • Deep understanding of ML lifecycle: data ingestion, feature engineering, training, evaluation, model packaging, CI/CD, drift detection, monitoring, and governance.

  • Strong experience with AWS SageMaker (Training, Processing, Batch Transform, Pipelines, Feature Store, Model Registry, Model Monitor).

  • Experience implementing ML CI/CD pipelines including automated training, testing, validation, model promotion, and endpoint deployment.

  • Ability to build dynamic and versioned pipelines using SageMaker Pipelines, Step Functions, or Kubeflow.

  • Strong SQL and data transformation experience using Snowflake, Databricks, Spark, or EMR.

  • Experience with feature engineering pipelines and Feature Store management (SageMaker or Feast).

  • Understanding of lineage tracking: training data snapshots, feature versions, code versioning, metadata tracking, and reproducibility.

  • Hands-on experience with Bedrock, OpenAI, Anthropic, or Llama models.

  • Experience with CloudWatch, SageMaker Model Monitor, Prometheus/Grafana, or Datadog.

  • Strong foundation in Python and cloud-native development patterns.

  • Solid understanding of security best practices, IAM, secrets management, and artifact governance.

Good-to-have skills:

  • Experience with vector databases, RAG pipelines, or multi-agent AI systems.

  • Exposure to DevOps and infrastructure-as-code (Terraform, Helm, CDK).

  • Hands-on understanding of model drift detection, A/B testing, canary rollouts, and blue-green deployments.

  • Familiarity with observability stacks (Prometheus, Grafana, CloudWatch, OpenTelemetry).

  • Knowledge of lakehouse architectures (Delta Lake, Iceberg, Hudi).

  • Ability to translate business goals into scalable AI/ML platform designs.

  • Strong communication and cross-team collaboration skills.

  • Ability to guide engineering teams through technical uncertainty and design choices.

Key Responsibilities:

  • Architect and implement the MLOps strategy for the EVOKE Phase-2 programme, ensuring alignment with the project proposal and delivery roadmap.

  • Design and own enterprise-grade ML/LLM pipelines covering model training, validation, deployment, versioning, monitoring, and CI/CD automation.

  • Build container-oriented ML platforms (EKS-first) while evaluating orchestration tools with comparable capabilities (Kubeflow, SageMaker, MLflow, Airflow, etc.).

  • Implement hybrid MLOps and LLMOps workflows, including prompt and version governance, evaluation frameworks, and monitoring for LLM-based systems.

  • Serve as a technical authority across multiple internal and customer projects, not limited to EVOKE, contributing architectural patterns, best practices, and reusable frameworks.

  • Enable observability, monitoring, drift detection, lineage tracking, and auditability across ML/LLM systems.

  • Collaborate with cross-functional teams — data engineering, platform, DevOps, and client stakeholders — to deliver production-ready ML solutions.

  • Ensure all solutions adhere to security, governance, and compliance expectations, particularly around cloud services, Kubernetes workloads, and MLOps tooling.

  • Conduct architecture reviews, troubleshoot complex ML system issues, and guide teams through implementation across cloud-native ML platforms.

  • Mentor engineers and provide guidance on modern MLOps tools, platform capabilities, and best practices.

If you like wild growth and working with happy, enthusiastic over-achievers, you'll enjoy your career with us!