Waabi

Software Engineer, Labelling, Data & Automation

Toronto, ON / San Francisco, CA / Pittsburgh, PA / Remote US & Canada
Full Time
Waabi, founded by AI visionary Raquel Urtasun, is the leader in Physical AI. With a world-class team, we're unlocking the next era of autonomous transportation with technology that's powering commercial autonomous trucks and robotaxis. Waabi is backed by and partners with world leaders in AI, automotive, logistics, and deep tech.

With offices in Toronto, San Francisco, Dallas, and Pittsburgh, Waabi is growing quickly and looking for diverse, innovative, and collaborative candidates who want to impact the world in a positive way. To learn more, visit www.waabi.ai

As a Software Engineer on our Labelling and Data Automation team, you will build pipelines, tools, and workflows to accelerate the development of our AI-first self-driving technology. The Labelling & Data Automation team plays a critical role in Waabi’s training pipeline. Our work is one of the first steps in the process of training machine learning models: we are responsible for building practical, scalable tools and automation for extracting interesting and relevant data, creating high-quality ground truth labels, and providing this data to our expert machine learning scientists and engineers.

You will:

- Design and implement tools, pipelines, and metrics to accelerate the development of our AI-first autonomy system and generative AI simulator. 

- Own the process, criteria, and tooling for efficiently finding interesting and relevant data across the petabytes of real-world data that Waabi has collected.

- Build high-reliability systems for extracting interesting data, labelling it with various vendors, and integrating it back into our system.

- Work with both internal and third-party stakeholders to define taxonomy, validation rules, and success criteria for our labelling projects.

- Design and manage the end-to-end deployment of data solutions to deliver high-quality labelled data for various ML teams to use in experiments and model improvement.

- Deploy open-set / embedding models to our production environment, enabling new search and curation modalities.

- Champion engineering excellence, ensuring high-quality, well-structured, and tested code.

- Contribute to project roadmap planning, prioritization, and delivery.

Qualifications: 

- 4+ years of industry experience.

- Bachelor's in computer science, engineering, machine learning, or a related technical discipline.

- Proficient in Python programming and strong software engineering fundamentals with real-world experience writing high-quality, well-structured, and well-tested code.

- A willingness and ability to learn new skills, technologies, and software libraries as required. 

- Strong experience with data pipelines for large-scale processing and analysis.

- Strong communication and organizational skills.

- Understanding of cloud job orchestration, monitoring, and instrumentation best practices.

- Open-minded and collaborative team player with the willingness to help others.

- Passionate about self-driving technologies, solving hard problems, and creating innovative solutions.

Bonus Points:

- Experience with ML pipelines, including dataset curation, labelling, training and evaluation. 

- Previous experience in self-driving technology or related fields.

- Familiarity with linear algebra (projections, transforms) and 3D geometry.

- Experience with MapReduce frameworks (Apache Hadoop/Spark) or orchestration frameworks (Apache Airflow/Apache Beam/Google Dataflow/AWS Step Functions).

- Experience with front-end development.

- Experience working with open-set / embedding models and deploying them in a production setting.

- Experience working with infrastructure as code (Terraform, CloudFormation, etc.).