You will:
- Design and implement tools, pipelines, and metrics to accelerate the development of our AI-first autonomy system and generative AI simulator.
- Own the process, criteria, and tooling for efficiently finding interesting and relevant data across the petabytes of real-world data that Waabi has collected.
- Build high-reliability systems for extracting and labelling the interesting data with various vendors and integrating it back into our system.
- Work with both internal and third-party stakeholders to define taxonomy, validation rules, and success criteria for our labelling projects.
- Design and manage the end-to-end deployment of data solutions to deliver high-quality labelled data for various ML teams to use in experiments and model improvement.
- Deploy open-set / embedding models to our production environment, empowering new search and curation modalities.
- Champion engineering excellence, ensuring high-quality, well-structured, and tested code.
- Contribute to project roadmap planning, prioritization, and delivery.
Qualifications:
- 4+ years of industry experience.
- Bachelor's in computer science, engineering, machine learning, or a related technical discipline.
- Proficiency in Python and strong software engineering fundamentals, with real-world experience writing high-quality, well-structured, and well-tested code.
- A willingness and ability to learn new skills, technologies, and software libraries as required.
- Strong experience with data pipelines for large-scale processing and analysis.
- Strong communication and organizational skills.
- Understanding of cloud job orchestration, monitoring, and instrumentation best practices.
- Open-minded and collaborative team player with the willingness to help others.
- Passionate about self-driving technologies, solving hard problems, and creating innovative solutions.
Bonus Points:
- Experience with ML pipelines, including dataset curation, labelling, training, and evaluation.
- Previous experience in self-driving technology or related fields.
- Familiarity with linear algebra (projections, transforms) and 3D geometry.
- Experience with MapReduce frameworks (Apache Hadoop/Spark) or orchestration frameworks (Apache Airflow/Apache Beam/Google Dataflow/AWS Step Functions).
- Experience with front-end development.
- Experience working with open-set / embedding models and deploying them in a production setting.
- Experience working with infrastructure as code (Terraform, CloudFormation, etc.).