Zoox

Machine Learning Engineer - Multi-Modality Foundation Model

Foster City, CA / Boston, MA - Full Time
The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence. As a Multi-Modality Foundation Model Engineer, you will focus on building highly efficient, production-ready multi-modality models. We are looking for experts with hands-on experience building multi-modality foundation models, whether in AV-centric modalities (Vision, LiDAR, Radar) or broader domains (Vision, Language, Text, Audio). You will design, train, and deploy these models, using Knowledge Distillation (KD) to transfer capabilities from large-scale proprietary teacher models to efficient student models capable of real-time, on-vehicle inference.