Voice AI Engineering Principal
About the Role
Zendesk is seeking an innovative and visionary Voice AI Engineering Director to lead and accelerate our voice and conversational AI initiatives. In this pivotal role, you will spearhead the development and deployment of cutting-edge AI/ML technologies focused on Speech and Natural Language Processing (NLP), shaping the future of voice-enabled customer experiences at scale.
You will oversee researchers innovating across Automatic Speech Recognition (ASR), Text-to-Speech (TTS), Large Language Models (LLMs), and voice conversational systems, driving impactful solutions that power Zendesk’s intelligent voice products.
What You'll Do
Lead the research, design, and engineering of next-generation Voice AI solutions, including noise-robust multilingual ASR, neural TTS, and advanced QA dialog systems built by fine-tuning state-of-the-art pretrained models (e.g., BERT, GPT).
Build, mentor, and scale a high-performing AI/ML engineering team specialized in speech processing, NLP, and deep learning while fostering an innovative, research-driven culture.
Drive collaboration across research scientists, software engineers, and product teams to transform advanced AI models into robust, scalable production systems.
Oversee large-scale AI research and development projects, ensuring delivery of high-quality, real-world solutions optimized for diverse tasks and computing environments.
Architect and implement AI models using deep learning architectures such as DNNs, CNNs, RNNs, and Transformers across speech and NLP pipelines.
Champion best practices in software development, including CI/CD, code reviews, version control (Git), and refactoring to support efficient and maintainable codebases.
Stay ahead of the curve by continuously researching and applying the latest breakthroughs in AI/ML to enhance Zendesk’s voice capabilities.
Collaborate with stakeholders to define technical vision, roadmap, and strategy for voice AI products that deliver superior user experiences and business impact.
Who You Are
Passionate about the frontiers of AI/ML and driven to apply breakthrough technologies to real-world voice and language problems.
Proven expertise developing and applying speech and NLP models, with extensive hands-on experience using DL frameworks such as PyTorch, TensorFlow, Keras, and Hugging Face Transformers.
Deep knowledge of AI architectures including DNNs, CNNs, RNNs, and Transformers, and experience fine-tuning large pretrained models (e.g., BERT, GPT).
Skilled in programming languages and tools including Python, C++, Java, R, and Linux/shell scripting, with strong engineering discipline across the software development lifecycle.
Demonstrated leadership in building and guiding AI/ML teams through complex research and engineering challenges.
Experience deploying voice AI systems in production, including ASR, diarization, TTS, neural machine translation (NMT), and dialog systems, with a focus on noise robustness and multilingual capabilities.
Track record of managing large-scale research projects with real-world impact, combining fundamental research with prototyping and product delivery.
Background in developing AI-driven speech technologies for complex domains such as autonomous pilot systems or court reporting is a highly valued asset.
Hold an M.S. in Engineering, Computer Science, or a related field, with a strong foundation in machine learning, speech processing, and on-device AI for real-time and low-power applications.
–
Alternative for a more hands-on role matching the team size now and in the future:
The Agentic Tribe is revolutionizing the voice assistance landscape with Gen3, a cutting-edge AI Agent system that is pushing the boundaries of conversational AI. Gen3 is a goal-oriented, dynamic, and truly conversational system capable of complex reasoning, planning, and adapting to user needs in real-time spoken dialogue.
As a Staff AI Agent Engineer & Team Lead specializing in Voice AI, you will be the definitive technical authority and hands-on leader for the Voice AI Agent platform. You will be responsible for defining the architecture, setting the technical direction for the team, leading major cross-functional initiatives, and mentoring senior engineers. This role requires an individual who can balance deep technical work with strategic leadership, ensuring our Voice AI system is not only robust and low-latency but also scalable, safe, and aligned with the company's long-term product vision.
Technical Leadership & Architecture
Architectural Ownership: Define the technical vision and architect the next generation of our voice-first AI Agent platform, ensuring it meets stringent requirements for low latency, high availability, and scalability across millions of concurrent voice interactions.
Technical Roadmap: Own and drive complex, multi-quarter technical initiatives from concept to production, solving ambiguous or highly complex challenges that impact multiple engineering teams across the organization.
Core Systems Design: Lead the design and development of critical, real-time voice components, including the strategic selection and integration of best-in-class real-time Speech-to-Text (STT), Text-to-Speech (TTS), and Voice Activity Detection (VAD) services.
Define Standards: Establish and enforce engineering best practices, design patterns, and coding standards for Python-based voice agent development, focusing on robust state management, dynamic tool use, and sophisticated reasoning techniques (e.g., Chain-of-Thought, Tree-of-Thought).
Team Lead & Mentorship
Team Leadership: Provide technical leadership and guidance to a dedicated project team, including task delegation, daily technical direction, and ensuring high-quality, on-time project delivery.
Mentorship: Actively mentor Senior and mid-level engineers, fostering a culture of technical excellence, deep ownership, and continuous learning within the Voice AI team and the broader engineering organization.
Cross-Functional Strategy: Serve as the primary technical partner for Product Leadership, ML Science, and Infrastructure teams, aligning technical implementation plans with product strategy and influencing the long-term Voice AI roadmap.
Evaluation & Reliability
Evaluation Platform: Design, establish, and continuously improve the organizational platforms and methodologies for evaluating voice agent performance and behavior, setting key success metrics (e.g., WER, conversational naturalness, latency budget adherence), and driving iterative improvements across the Agentic Tribe.
Safety & Defense: Architect and implement advanced safety and reliability mechanisms, including robust prompt injection defenses, comprehensive LLM guardrails, sophisticated fallback strategies, and advanced error-handling to manage noisy audio input and speech recognition inaccuracies at scale.
Who You Are
10+ years of progressive experience in software engineering, with 4+ years focused on AI/ML applications, and 2+ years operating in a Staff, Principal, or equivalent technical leadership capacity.
Expertise in LLM-Oriented System Architecture: Proven ability to architect and lead the development of complex, multi-step, tool-using agents (e.g., built with LangChain, AutoGen, or custom orchestrators).
Mastery in Voice AI/Spoken Dialogue Systems: Extensive, hands-on experience building mission-critical, low-latency, streaming voice applications. This includes deep proficiency with:
Integrating and managing real-time STT/TTS models and APIs.
Advanced techniques for Voice Activity Detection (VAD) and noise suppression.
Architecting robust barge-in and interruption logic in real-time voice streams.
Platform & Deployment Expertise: Deep expertise in deploying complex, large-scale AI applications to cloud platforms (AWS, GCP, or Azure) using advanced infrastructure-as-code and CI/CD best practices. Proven experience optimizing LLM token budgets, latency, and cost through sophisticated model routing, caching (e.g., Redis), and quantization techniques.
Advanced ML & System Knowledge: Comprehensive understanding of foundational ML concepts, Retrieval-Augmented Generation (RAG) pipelines, vector databases, and advanced context management to ensure deterministic and accurate agent behavior in complex production environments.
Programming Mastery: Expert-level proficiency in Python and modern web and RPC frameworks (e.g., FastAPI, and gRPC for streaming services).
M.S. or Ph.D. in Computer Science, NLP, Machine Learning, or a related technical field.
Experience with real-time streaming technologies, such as WebRTC or gRPC.
A track record of technical presentation, publication, or open-source contribution in the field of conversational AI or generative agents.
Experience driving organizational adoption of new technologies and influencing company-wide architectural decisions.
Hybrid: In this role, our hybrid experience is designed at the team level to give you a rich onsite experience packed with connection, collaboration, learning, and celebration, while also giving you the flexibility to work remotely for part of the week. This role requires attending our local office for part of the week. The specific in-office schedule is to be determined by the hiring manager.
The intelligent heart of customer experience
Zendesk software was built to bring a sense of calm to the chaotic world of customer service. Today we power billions of conversations with brands you know and love.
Zendesk believes in offering our people a fulfilling and inclusive experience. Our hybrid way of working enables us to purposefully come together in person, at one of our many Zendesk offices around the world, to connect, collaborate, and learn, whilst also giving our people the flexibility to work remotely for part of the week.
As part of our commitment to fairness and transparency, we inform all applicants that artificial intelligence (AI) or automated decision systems may be used to screen or evaluate applications for this position, in accordance with Company guidelines and applicable law.
Zendesk is an equal opportunity employer, and we’re proud of our ongoing efforts to foster global diversity, equity, & inclusion in the workplace. Individuals seeking employment and employees at Zendesk are considered without regard to race, color, religion, national origin, age, sex, gender, gender identity, gender expression, sexual orientation, marital status, medical condition, ancestry, disability, military or veteran status, or any other characteristic protected by applicable law. We are an AA/EEO/Veterans/Disabled employer. If you are based in the United States and would like more information about your EEO rights under the law, please click here.
Zendesk endeavors to make reasonable accommodations for applicants with disabilities and disabled veterans pursuant to applicable federal and state law. If you are an individual with a disability and require a reasonable accommodation to submit this application, complete any pre-employment testing, or otherwise participate in the employee selection process, please send an e-mail to peopleandplaces@zendesk.com with your specific accommodation request.