Ecolab

Lead Responsible AI (RAI) Engineer

IND - Karnataka - Bangalore - EDC | Full time

ROLE SUMMARY

As a Lead Responsible AI (RAI) Engineer, you will lead the design, implementation, and operationalization of responsible AI, safety, quality, evaluation, and governance practices across GenAI and agentic AI solutions. This role sits at the intersection of AI engineering, quality engineering, governance, and production operations, ensuring that AI systems are not only functional, but also safe, reliable, auditable, and fit for enterprise use.

You will work closely with AI engineers, architects, Team Leads, product owners, integration teams, and platform teams to embed responsible AI controls into the full product lifecycle. This role requires a strong understanding of LLM systems, RAG pipelines, agentic workflows, prompt safety, guardrails, testing strategies, observability, and enterprise risk considerations. It is not a purely policy or compliance role; it is a hands-on technical leadership role that ensures AI systems can be trusted in production.

The ideal candidate will combine technical depth in AI-enabled systems, practical experience with testing and quality frameworks, and the ability to guide teams toward safe, scalable, and supportable implementation patterns.

KEY RESPONSIBILITIES

  • Lead the design and implementation of responsible AI, safety, and quality engineering practices for GenAI and agentic AI products
  • Define and operationalize AI validation strategies covering functional correctness, factual reliability, hallucination risk, retrieval quality, prompt safety, agent behavior, tool use, and failure handling
  • Establish test approaches for LLM-powered applications, RAG systems, agentic workflows, multi-step reasoning, tool orchestration, and context-sensitive AI interactions
  • Design and implement evaluation frameworks for relevance, safety, groundedness, consistency, latency, token usage, and business outcome alignment (a minimal harness sketch follows this list)
  • Work with AI engineers and architects to embed guardrails, prompt controls, model usage boundaries, escalation paths, fallback strategies, and safety-oriented design patterns
  • Drive adoption of governance and auditability practices across AI solution development, including automated test script development, traceability, review checkpoints, risk controls, and evidence collection
  • Partner with engineering and platform teams to implement AI observability, logging, trace analysis, usage monitoring, cost awareness, and incident diagnostics
  • Review solution designs to ensure they account for responsible AI, data sensitivity, model behavior risks, compliance expectations, and operational resilience
  • Guide teams on how to test and validate agentic AI systems, including state management, tool-calling reliability, context integrity, multi-agent coordination, and autonomous decision boundaries
  • Contribute to the definition of engineering standards for prompt lifecycle management, evaluation automation, red-teaming, adversarial testing, and regression prevention
  • Collaborate with product, process, and engineering stakeholders to balance innovation speed with risk management, trust, and enterprise readiness
  • Mentor engineers and quality professionals on AI testing, safety validation, responsible AI engineering practices, and production monitoring approaches
  • Contribute reusable assets such as test harnesses, prompt evaluation templates, governance checklists, safety review frameworks, red-team patterns, and validation accelerators
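
For illustration, the sketch below shows the flavor of evaluation harness this role owns: a pytest-based groundedness regression check over pinned golden cases. Everything here is a hypothetical stand-in; generate_answer represents the real RAG pipeline, and the token-overlap score is a naive proxy for an LLM-judge or NLI-based grounding check.

    import pytest

    def generate_answer(question: str, context: str) -> str:
        # Placeholder for the real LLM/RAG call.
        return "Refunds are processed within 5 business days."

    def groundedness_score(answer: str, context: str) -> float:
        # Naive proxy: fraction of answer tokens that also appear in the
        # retrieved context. Real harnesses use an LLM judge or an
        # NLI-based grounding check instead.
        answer_tokens = set(answer.lower().split())
        context_tokens = set(context.lower().split())
        if not answer_tokens:
            return 0.0
        return len(answer_tokens & context_tokens) / len(answer_tokens)

    # Pinned golden cases: question, retrieved context, minimum groundedness.
    GOLDEN_CASES = [
        ("How long do refunds take?",
         "Refunds are processed within 5 business days of approval.",
         0.7),
    ]

    @pytest.mark.parametrize("question,context,threshold", GOLDEN_CASES)
    def test_groundedness_regression(question, context, threshold):
        answer = generate_answer(question, context)
        assert groundedness_score(answer, context) >= threshold

In practice such golden cases run in CI on every release candidate, so groundedness regressions block promotion rather than surfacing in production.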

REQUIRED QUALIFICATIONS

  • 8+ years of experience in software engineering, AI engineering, quality engineering, test engineering, AI governance, or related technical roles, including strong experience working with AI-enabled or ML-driven systems
  • Proven experience designing or implementing quality, safety, validation, governance, risk, compliance, and auditability practices for AI-enabled systems in enterprise or production environments
  • Strong understanding of GenAI and agentic AI systems, including LLM workflows, retrieval-augmented generation (RAG), prompt engineering, model/tool interaction, context-aware behavior, and the failure modes associated with autonomous or semi-autonomous AI systems
  • Practical knowledge of agentic AI patterns, including orchestration workflows, tool-calling behavior, context management, memory/state handling, and multi-step AI interactions
  • Experience defining and operationalizing test and evaluation strategies for AI systems, including functional testing, non-functional testing, hallucination analysis, retrieval validation, regression testing, groundedness evaluation, red-teaming, and scenario-based validation
  • Strong understanding of responsible AI concerns, including model safety, fairness, explainability, transparency, content controls, enterprise risk management, data sensitivity, human oversight, and governance-by-design
  • Hands-on familiarity with enterprise responsible AI and governance frameworks such as NIST AI RMF and related GenAI / AI risk management approaches, and the ability to translate such frameworks into practical engineering controls, review criteria, and release-readiness expectations
  • Practical experience with fairness and bias evaluation tooling such as Fairlearn, AIF360, or equivalent approaches used to assess fairness, bias, or model behavior in enterprise environments
  • Hands-on experience with LLM evaluation, tracing, and quality analysis tools such as LangSmith, TruLens, DeepEval, or equivalent frameworks used for prompt evaluation, trace analysis, groundedness checks, workflow quality review, and regression monitoring
  • Familiarity with runtime safeguard and guardrail tooling such as Guardrails AI, NeMo Guardrails, or equivalent approaches used to enforce policy, constrain unsafe behavior, validate outputs, and improve runtime reliability of GenAI or agentic AI systems (an illustrative guardrail sketch follows this list)
  • Practical experience with enterprise AI governance, monitoring, and deployment tooling such as Azure AI Content Safety, Azure Monitor, Application Insights, OpenTelemetry, MLflow, or equivalent platforms used for lifecycle tracking, observability, reproducibility, monitoring, and enterprise governance of AI systems (a tracing sketch also follows this list)
  • Familiarity with broader enterprise AI governance and model oversight platforms such as Vertex AI monitoring, watsonx.governance, or equivalent governance and monitoring ecosystems, especially where multiple model, policy, and auditability controls are required
  • Experience working with test automation, validation frameworks, CI/CD-driven quality controls, and engineering delivery tooling such as Azure DevOps, GitHub, Git, Python, pytest, API testing tools, automated release controls, and scenario-based validation approaches (or equivalent tools and frameworks)
  • Ability to work closely with AI engineers, architects, Team Leads, and product teams to convert responsible AI principles into implementable engineering controls, validation strategies, safety checks, and production monitoring practices
  • Strong SDLC ownership mindset, including testing strategy, release quality, production monitoring, governance checkpoints, incident learning, audit readiness, and continuous improvement
  • Strong communication and stakeholder alignment skills, including the ability to discuss risk, quality, safety, governance, and production-readiness topics with both technical and non-technical stakeholders
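
As a concrete (and deliberately simplified) illustration of the guardrail tooling referenced above, the sketch below wraps model output in a policy check with a safe fallback. The patterns and phrases are hypothetical; frameworks such as Guardrails AI or NeMo Guardrails express this declaratively and with far richer validators.

    import re

    # Hypothetical policy: block outputs containing e-mail addresses (a
    # stand-in for PII detection) or any phrase from a configurable blocklist.
    EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
    BLOCKED_PHRASES = ("internal use only",)
    FALLBACK_MESSAGE = "I can't share that. Please contact support for help."

    def guarded_response(raw_output: str) -> str:
        # Validate a model output against policy; return a safe fallback
        # (and, in production, log the violation for audit) when a check fails.
        lowered = raw_output.lower()
        if EMAIL_PATTERN.search(raw_output) or any(p in lowered for p in BLOCKED_PHRASES):
            return FALLBACK_MESSAGE
        return raw_output

    print(guarded_response("Contact jane.doe@example.com for the report."))
    # Prints the fallback message instead of leaking the address.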
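
Similarly, a minimal tracing sketch using the OpenTelemetry Python API mentioned above; the span and attribute names are illustrative choices rather than a fixed convention, and exporter configuration is deployment-specific and omitted.

    from opentelemetry import trace

    # With no SDK configured this is a no-op tracer, so the sketch runs
    # as-is; wiring an exporter (e.g. to Azure Monitor) is done at deployment.
    tracer = trace.get_tracer("genai.rag.pipeline")

    def answer_with_tracing(question: str) -> str:
        with tracer.start_as_current_span("llm.generate") as span:
            span.set_attribute("llm.request.model", "example-model")  # assumed name
            span.set_attribute("llm.prompt.length", len(question))
            answer = "stubbed answer"  # placeholder for the real model call
            span.set_attribute("llm.usage.total_tokens", 42)  # illustrative value
            return answer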

PREFERRED QUALIFICATIONS

  • Experience testing or governing agentic AI workflows involving multi-step reasoning, tool orchestration, memory/context management, or multi-agent coordination
  • Exposure to Model Context Protocol (MCP), agent-to-agent (A2A) interaction patterns, or equivalent context-sharing and distributed agent communication approaches
  • Experience working with RAG evaluation, retrieval quality measurement, grounding validation, chunking/retrieval strategy testing, or vector search quality assessment (a metrics sketch follows this list)
  • Familiarity with Azure AI Search, Pinecone, Weaviate, FAISS, or equivalent retrieval and vector-enabled platforms
  • Experience contributing reusable safety controls, evaluation accelerators, governance playbooks, review templates, or AI quality engineering frameworks
  • Familiarity with token usage analytics, cost controls, prompt safety checks, output filtering, and human-in-the-loop escalation models
  • Experience supporting technical direction for engineering teams through quality reviews, design reviews, validation sign-offs, and mentoring
  • Experience working in a build-own-operate product model, where long-term supportability, reliability, and controlled evolution of AI systems are critical
  • Ability to communicate technical risks and responsible AI decisions clearly to engineering leaders, product stakeholders, and governance teams
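
As a final illustration, the retrieval quality measurement mentioned in the list above often starts with simple labeled metrics such as recall@k and mean reciprocal rank (MRR). The sketch below computes both for a single query; the document IDs and relevance labels are hypothetical, and a real suite would aggregate over a labeled query set from the domain.

    def recall_at_k(retrieved_ids, relevant_ids, k):
        # Fraction of relevant documents found in the top-k retrieved results.
        if not relevant_ids:
            return 0.0
        top_k = set(retrieved_ids[:k])
        return len(top_k & set(relevant_ids)) / len(relevant_ids)

    def mrr(retrieved_ids, relevant_ids):
        # Reciprocal rank of the first relevant document (0.0 if none retrieved).
        for rank, doc_id in enumerate(retrieved_ids, start=1):
            if doc_id in relevant_ids:
                return 1.0 / rank
        return 0.0

    # Hypothetical labeled example: doc IDs returned by vector search vs.
    # the ground-truth relevant set for one query.
    retrieved = ["d7", "d2", "d9", "d4"]
    relevant = {"d2", "d4"}
    print(recall_at_k(retrieved, relevant, k=3))  # 0.5 (d2 found, d4 not in top 3)
    print(mrr(retrieved, relevant))               # 0.5 (first relevant at rank 2)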