Toyota Research Institute

Robotics Machine Learning Engineer - Platforms for Vision-Language-Action Foundation Models

Los Altos, CA | Full Time
At Toyota Research Institute (TRI), we’re on a mission to improve the quality of human life. We’re developing new tools and capabilities to amplify the human experience. To lead this transformative shift in mobility, we’ve built a world-class team in Automated Driving, Energy & Materials, Human-Centered AI, Human-Interactive Driving, and Robotics.

We are looking for a machine learning engineer to build out our infrastructure and support researchers developing foundation models for robotics.

The Mission
We are working to create general-purpose robots capable of accomplishing a wide variety of dexterous tasks. To do this, our team is building general-purpose machine learning foundation models for dexterous robot manipulation. These models, which we call Large Behavior Models (LBMs), use generative AI techniques to produce robot actions from sensor data and human requests. To accomplish this, we are creating a large curriculum of embodied robot demonstration data and combining that data with a rich corpus of internet-scale text, image, and video data. We are also using high-quality simulation to augment real-world robot data with procedurally generated synthetic demonstrations.

The Team
The Robotics Machine Learning Team’s charter is to push the frontiers of research in robotics and machine learning, developing the future capabilities required for general-purpose robots that can operate in realistic environments such as homes and factories.

The Job
We have several research thrusts under our broad mission, and we are looking for a machine learning engineer to contribute to some of the following objectives:

Hardware Infrastructure: Develop our hardware platform, making sure the robots and software stack are state-of-the-art, operational, and continuously improved with new functionality. This includes the robot hardware (YAM, Franka, and custom), the sensors (monocular, stereo, depth, etc.), the robot/computer interface, the human/robot interface, data logging, and controls.

Inference & Deployment: Build APIs and systems for high-throughput inference and logging in simulation and on real robot platforms. Enable low-latency model serving and robust policy–environment communication.
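
For a flavor of what this work looks like, here is a minimal sketch of a fixed-rate action-serving loop with a latency budget check. Everything in it (the `PolicyServer` class, the observation fields, the 10 Hz rate) is illustrative, not a description of our actual stack:

```python
import time
from dataclasses import dataclass

import numpy as np


@dataclass
class Observation:
    """Hypothetical multimodal observation bundle from the robot."""
    rgb: np.ndarray            # (H, W, 3) camera frame
    proprioception: np.ndarray  # joint positions/velocities
    timestamp: float


class PolicyServer:
    """Serve actions at a fixed control rate, flagging inference overruns."""

    def __init__(self, policy, control_hz: float = 10.0):
        self.policy = policy        # any callable: Observation -> np.ndarray
        self.period = 1.0 / control_hz

    def step(self, obs: Observation) -> np.ndarray:
        start = time.perf_counter()
        action = self.policy(obs)
        latency = time.perf_counter() - start
        # Budget check: a 10 Hz loop leaves ~100 ms for inference plus I/O.
        if latency > self.period:
            print(f"warning: inference ({latency * 1000:.1f} ms) overran budget")
        return action
```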

Evaluation & Monitoring: Design metrics pipelines for quantitative and qualitative evaluation. Build tools for experiment tracking, logging, visualization, and leaderboard management using systems like Weights & Biases, MLflow, or ClearML.
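
For illustration, a minimal sketch of per-episode metric logging with Weights & Biases, one of the tools named above. The project, run, and metric names, and the `run_episode` helper, are placeholders:

```python
import random

import wandb


def run_episode():
    """Hypothetical stand-in for a simulated or real-robot rollout."""
    success = random.random() > 0.5
    episode_return = random.uniform(0.0, 1.0)
    return success, episode_return


# Illustrative only: project, run, and metric names are placeholders.
run = wandb.init(project="lbm-eval", name="policy-rollout-eval")

for episode in range(100):
    success, episode_return = run_episode()
    wandb.log({"episode": episode, "success": int(success), "return": episode_return})

run.finish()
```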

Data Infrastructure: Build scalable pipelines for heterogeneous multimodal data (images, text, video, touch, depth, proprioception). Work with data storage, versioning, streaming, and visualization systems optimized for throughput and accessibility.
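
A minimal sketch of the lazy, streaming access pattern such pipelines favor, assuming a hypothetical per-episode `.npz` layout with one array per modality:

```python
from pathlib import Path
from typing import Dict, Iterator

import numpy as np


def stream_episodes(root: Path) -> Iterator[Dict[str, np.ndarray]]:
    """Lazily stream demonstration episodes stored as per-episode .npz files.

    Hypothetical layout: root/<episode_id>.npz, with one array per modality
    (e.g. 'rgb', 'depth', 'proprio', 'touch'). Loading one episode at a time
    keeps memory flat regardless of corpus size.
    """
    for path in sorted(root.glob("*.npz")):
        with np.load(path) as archive:
            yield {key: archive[key] for key in archive.files}


# Usage: print a per-modality shape report for a (hypothetical) corpus.
# for episode in stream_episodes(Path("/data/lbm/demos")):
#     print({k: v.shape for k, v in episode.items()})
```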

The machine learning engineer who joins our team will be expected to write working code and interact frequently with researchers. They will run experiments with both simulated and real (physical) robots and participate in publishing the work at peer-reviewed venues. We’re looking for an engineer who is comfortable working with multiple robotic embodiments and stacks, as well as a growing, dynamic corpus of robot data.