Contextual AI

Member of Technical Staff (Research Engineer - LLM Systems & Performance)

Mountain View, CA · Full Time

About Contextual AI

We're revolutionizing how AI agents work by solving AI's most critical challenge: context. The right context at the right time unlocks the accuracy and production scale that enterprises require. Our enterprise AI development platform sits at the intersection of breakthrough AI research and practical developer needs. Our end-to-end platform lets AI developers accurately ingest and query documents from enterprise data sources and easily embed retrieval results into their business workflows.

Contextual AI was founded by the pioneers of Retrieval-Augmented Generation (RAG), the foundational technique behind the context layer, connecting foundation models to current and relevant information. Backed by the industry's most forward-thinking venture capitalists, we're not just participating in the enterprise AI revolution, we're defining it. Join us in building a future where AI doesn't just answer questions, it transforms businesses.

About the role

As a Member of Technical Staff (Research Engineer – LLM Systems & Performance), you will join a small, high-impact team that builds and optimizes LLM systems end-to-end, from Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) pipelines to high-throughput inference clusters in production. You will collaborate closely with researchers and engineers to develop advanced models and infrastructure for the context layer.

What you'll do

  • Implement and improve components of our SFT and RL training pipelines (e.g., Verl, SkyRL), including data loading, training loops, logging, and evaluation.

  • Contribute to LLM inference infrastructure (e.g., vLLM, SGLang), including batching, KV-cache management, scheduling, and serving optimizations.

  • Profile and optimize end-to-end performance (throughput, latency, compute/memory/bandwidth), using profiling tools such as NVIDIA Nsight to identify and fix bottlenecks.

  • Work with distributed training and inference setups using NCCL, NVLink, and data/tensor/pipeline/expert/context parallelism on multi-GPU clusters.

  • Help experiment with and productionize quantization (e.g., INT8, FP8, FP4, mixed-precision) for both training and inference.

  • Write and optimize GPU kernels in CUDA or Triton, and leverage techniques such as FlashAttention and Tensor Cores where appropriate.

  • Collaborate with researchers to take ideas from paper → prototype → scaled experiments → production.

  • Write clean, well-tested, and well-documented code that can be shared across multiple teams (Research, Platform, and Products).

What we're seeking

  • Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related technical field (or equivalent practical experience).

  • Strong programming skills in Python.

  • Experience with at least one major ML framework: PyTorch or JAX.

  • Solid understanding of GPU computing fundamentals (threads/warps/blocks, memory hierarchy, bandwidth vs compute, etc.).

  • Familiarity with distributed training or inference concepts (e.g., model parallelism, collective communication, disaggregated serving, KV caching).

  • Interest in performance engineering: profiling, kernel fusion, memory layout, and end-to-end system efficiency.

  • Ability to work in a fast-paced environment, communicate clearly, and collaborate closely with other engineers and researchers.

Location: Mountain View, CA.
Salary Range for California Based Applicants: $170,000 - $200,000 + equity + benefits (actual compensation will be determined based on experience, location, and other factors permitted by law).

Equal Opportunity

Contextual AI is an equal opportunity employer and complies with all applicable federal, state, and local fair employment practices laws. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law.