Turing

Head of Data Quality - RL Gyms

San Francisco, California, United States · Full Time

About Turing

Based in San Francisco, California, Turing is the world’s leading research accelerator for frontier AI labs and a trusted partner for global enterprises looking to deploy advanced AI systems. Turing accelerates frontier research with high-quality data, specialized talent, and training pipelines that advance thinking, reasoning, coding, multimodality, and STEM. For enterprises, Turing builds proprietary intelligence systems that integrate AI into mission-critical workflows, unlock transformative outcomes, and drive lasting competitive advantage.

Recognized by Forbes, The Information, and Fast Company among the world’s top innovators, Turing’s leadership team includes AI technologists from Meta, Google, Microsoft, Apple, Amazon, McKinsey, Bain, Stanford, Caltech, and MIT. Learn more at www.turing.com.

Role Overview

Turing is looking for a Head of Data Quality, RL Environments to build and lead the quality function for all reinforcement learning (RL) environment and trajectory data used to train and evaluate models at frontier AI labs.

You will manage a team of Data Quality Leads who operate like researchers in a frontier AI lab—designing tasks, stress tests, and evaluation protocols for complex RL environments (simulated, real-world, and tool-based). Your role is to set the bar for what “high-quality RL environment data” means and ensure our environments, trajectories, rewards, and evaluations are robust, diverse, and aligned with cutting-edge GenAI and RL research.

You’ll bring together:

  • Deep understanding of RL environments, agents, and trajectories,
  • Prior experience with ML, RL, and GenAI systems, and
  • Strong organizational and people leadership

to create a research-grade quality organization for RL environments and agent interaction data.

Key Responsibilities

1. Own the RL Environment Data Quality Vision & Strategy

  • Define the end-to-end strategy for data quality across all RL environment–related projects:
    • Environment/task definitions
    • Reward functions and signals
    • Scenario generation and curriculum design
    • Agent behavior trajectories and evaluations
  • Translate GenAI and agent/RL trends (e.g., tool-using agents, multi-step reasoning, multi-agent systems, simulated worlds, web environments) into actionable environment and data requirements.
  • Set clear quality standards, rubrics, and KPIs for environments and trajectories, calibrated to frontier AI research expectations.

2. Lead & Develop Data Quality Leads

  • Hire, manage, and mentor Data Quality Leads overseeing RL environment projects and annotation streams (e.g., trajectory labeling, human evaluations of agent behavior, reward calibration).
  • Build a culture where leads think and act like researchers:
    • Formulating hypotheses about what environments and tasks are needed
    • Designing experiments and ablations
    • Iterating based on empirical evidence
  • Provide technical and methodological guidance on:
    • Task and environment spec design
    • Reward design and evaluation frameworks
    • Quality review protocols for trajectories, states, and behaviors
  • Establish performance expectations, feedback loops, and growth paths for quality leads and extended quality teams.

3. Design Research-Grade Evaluation & Quality Systems for RL Environments

  • Oversee the design of evaluation frameworks for RL agents and environments, covering:
    • Task success metrics and reward adequacy
    • Robustness and generalization tests
    • Safety, constraint satisfaction, and failure mode analysis
  • Ensure quality leads apply rigorous experimental design:
    • Proper sampling of scenarios and environment configurations
    • A/B testing of reward functions, curriculum strategies, and environment variants
    • Statistically sound comparisons between agent versions and evaluation suites
  • Implement continual monitoring systems:
    • Environment correctness and stability (e.g., no broken tasks, bugs, or unintended shortcuts)
    • Drift detection in distribution of tasks, states, and trajectories
    • Root cause analysis for degradation in performance or evaluation signal quality

4. Translate AI & RL Research Trends into Environment and Data Requirements

  • Stay current on frontier AI research in:
    • RL, RLHF/RLAIF, and preference-based training
    • Tool and web agents, multi-step decision-making, and planning
    • Simulated environments and benchmarks for agents
  • Convert evolving research directions into concrete workstreams for quality leads:
    • New environment types, tasks, and stress tests
    • New annotation schemas for human feedback and preferences
    • Adversarial / long-tail scenarios to probe failure modes
  • Partner closely with researchers and engineers to co-design:
    • Benchmarks and testbeds for agents
    • Environment variants and curricula that unlock new capabilities
    • Human evaluation and preference data pipelines to guide training and fine-tuning

5. Partner Across Operations, Product, and Customers

  • Collaborate with Data Operations / Labeling / Simulation teams to make quality processes:
    • Operationally scalable across annotators, vendors, and environment types
    • Efficient while preserving research-grade rigor
  • Work with Product, Research, and Sales teams to:
    • Understand agent use cases, safety constraints, and reliability expectations
    • Define SLAs and data quality KPIs for environments, reward schemes, and evaluations
    • Present methodology, metrics, and findings to internal and external stakeholders
  • Represent the RL environment quality function in client and partner conversations, including:
    • Explaining how environments and tasks are designed and validated
    • Demonstrating evaluation rigor and coverage
    • Integrating feedback into future environment and data strategies

6. Build Tools, Processes, and Documentation

  • Partner with engineering and platform teams to shape internal tools for:
    • Environment/task configuration and versioning
    • Trajectory capture, replay, and annotation
    • Evaluation dashboards and monitoring systems
  • Standardize playbooks and SOPs for:
    • Environment onboarding and validation before production use
    • Reward function design and review
    • Human evaluation and annotation workflows for agent behavior
  • Foster knowledge sharing via:
    • Internal talks, best-practice documents, and postmortems
    • Calibration sessions for annotators and evaluators
    • Cross-team forums on RL environment design and evaluation

Minimum Qualifications

Educational / Technical Background

  • Bachelor’s degree in Computer Science, Mathematics, Engineering, or a related field; or equivalent practical experience.
  • Strong technical background, including experience with:
    • Python as a primary language
    • RL or simulation frameworks (e.g., OpenAI Gym / Gymnasium–style APIs, custom simulators, or game engines)

Experience

  • 7+ years total experience in software engineering, ML/AI, RL, simulation, or related fields.
  • 3+ years managing technical teams (e.g., research, data science, RL / simulation, data quality, or engineering).
  • Hands-on experience with ML/AI systems, with a strong preference for:
    • RL, RLHF/RLAIF, or agent-like systems (tool-using, web, or embodied agents)
    • Environment or benchmark design, or large-scale agent evaluation
  • Prior exposure to data annotation / human feedback / human evaluation processes, including:
    • Designing rubrics and tasks for human raters
    • Working with preference data or trajectory labeling

Knowledge & Skills

  • High-level understanding of modern GenAI, RL, and agent trends, such as:
    • LLM-based agents interacting with tools or environments
    • Reward shaping, curriculum learning, and preference modeling
    • Safety, alignment, and robustness for agents in complex environments
  • Strong grasp of data and environment quality principles:
    • Environment correctness, coverage, and diversity
    • Reward design pitfalls and reward hacking detection
    • Human evaluation quality, calibration, and inter-rater reliability
  • Ability to read ML/RL/AI research papers and translate them into:
    • New environment or task requirements
    • Evaluation and benchmarking strategies
    • Concrete annotation and quality-control workflows
  • Excellent communication and leadership skills; comfortable setting direction and making tradeoff decisions in ambiguous, fast-changing domains.

Nice To Have

  • Graduate degree (MS/PhD) in Computer Science, Machine Learning, Robotics, or a related field.
  • Experience working in or closely with a research lab or frontier AI organization focused on RL, agents, or aligned systems.
  • Direct experience with:
    • Designing RL benchmarks, simulators, or environment suites
    • RLHF/RLAIF pipelines or large-scale human feedback collection
    • Multi-agent or multi-task environments
  • Familiarity with game engines or simulation platforms (e.g., Unity, Unreal, MuJoCo, Isaac, Habitat, or similar).
  • Background in statistics and experimental design, especially for:
    • Human feedback experiments
    • A/B testing of environment or reward variants
  • Experience in high-growth startup or similarly dynamic environments.

How We Work

  • High-ownership, low-drama execution.
  • Ship small, validate fast, scale what works.
  • Ruthless clarity on quality definitions and measurement.

Values:

  • We are client first: We put our clients at the center of everything we do, because their success is the ultimate measure of our value.
  • We work at start-up speed: We move fast, stay agile, and favor action, because momentum is the foundation of perfection.
  • We are AI forward: We help our clients build the future of AI and apply it in our own roles and workflows to amplify productivity.

Advantages of joining Turing:

  • Amazing work culture (super collaborative and supportive work environment; 5 days a week)
  • Awesome colleagues (surround yourself with top talent from Meta, Google, and LinkedIn, as well as people with deep startup experience)
  • Competitive compensation
  • Flexible working hours

Don’t meet every single requirement? Studies have shown that women and people of color are less likely to apply to jobs unless they meet every single qualification. Turing is proud to be an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender identity, sexual orientation, age, marital status, disability, protected veteran status, or any other legally protected characteristic. At Turing, we are dedicated to building a diverse, inclusive, and authentic workplace, so if you’re excited about this role but your past experience doesn’t align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right candidate for this or other roles.

For applicants from the European Union, please review Turing's GDPR notice here.