xAI

Member of Technical Staff, Frontiers of Deep Learning Scaling

Palo Alto, CA · Full Time

About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who enjoy challenging themselves and thrive on curiosity. We operate with a flat organizational structure: all employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills and to share knowledge with their teammates concisely and accurately.

About the Role

The pretraining team at xAI aims to answer one question: how do we scale up intelligence by scaling up compute effectively?

This question can be further broken down into two sub-questions:

  • What to scale up
  • How to scale up

What to scale up:

  • Next-token prediction was a meaningful training target while online data was plentiful relative to what models could absorb. We are entering a new phase in which model size grows faster than available data, so we need a new scaling paradigm.
  • At xAI, our compute grows much faster than at other companies. We believe that scaling up effective compute and useful data is the best path to next-level intelligence.
  • What counts as “effective compute” or “useful data”? That is the first question this role is expected to explore and answer. It could be rigorous data cleaning and scaling; discovering new knowledge via self-improvement; a new learning paradigm such as continual learning; unified models for understanding and generating text, code, images, and video; or new model architectures, attention mechanisms, or non-autoregressive models. Anything with the potential to become the next scaling paradigm is open to exploration.
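The data-versus-model-size tension above can be made concrete with a back-of-the-envelope sketch. Assuming the Chinchilla-style rule of thumb (a compute-optimal run uses roughly 20 training tokens per parameter, with training cost C ≈ 6·N·D FLOPs), the token requirement grows with the square root of compute, and at frontier budgets it approaches the size of all available high-quality text. The constants below are rough public estimates, not xAI numbers:

```python
# Illustrative sketch: why data becomes the binding constraint at scale.
# Assumes the approximate Chinchilla compute-optimal ratio D ~ 20 * N
# and the standard training-cost estimate C ~ 6 * N * D FLOPs.

TOKENS_PER_PARAM = 20  # compute-optimal tokens per parameter (approximate)

def compute_optimal(flops_budget):
    """Split a FLOP budget C into params N and tokens D, given C = 6*N*D."""
    # With D = 20*N:  C = 6 * N * (20*N) = 120 * N^2  =>  N = sqrt(C / 120)
    n_params = (flops_budget / (6 * TOKENS_PER_PARAM)) ** 0.5
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

for flops in (1e24, 1e26, 1e28):
    n, d = compute_optimal(flops)
    print(f"C={flops:.0e}: ~{n:.1e} params, ~{d:.1e} tokens")
```

At the largest budget in the sketch, the compute-optimal token count is on the order of 10^14, comparable to rough estimates of all usable web text, which is why simply scaling next-token prediction on scraped data stops being enough.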

How to scale up:

  • Remember that we are aiming at training runs of several hundred million GPU-hours; even a tiny training-stability issue can ruin the big run.
  • This role therefore also explores how to do large-scale, long-duration training. For example, most reasoning and post-training phases run for fewer than 1k steps; we want to scale them to 10k–100k steps while the model keeps improving as stably as in a typical successful pretraining run.
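One familiar flavor of the stability problem above can be sketched in a few lines. This is a generic illustration, not xAI's actual training stack: a common guard in long runs is to skip optimizer updates whose gradient norm spikes far above a running average, since a single bad step can derail a run that must stay healthy for 10k–100k steps:

```python
import math

# Minimal illustrative sketch (an assumption, not xAI's method): skip updates
# whose gradient norm is a large multiple of a running average of recent
# healthy steps, and always skip non-finite (NaN/inf) gradients.

class SpikeGuard:
    def __init__(self, factor=3.0, momentum=0.99):
        self.factor = factor      # spike threshold relative to the average
        self.momentum = momentum  # EMA decay for the running average
        self.avg = None           # running average of healthy grad norms

    def should_skip(self, grad_norm):
        if not math.isfinite(grad_norm):
            return True  # NaN/inf gradients are always skipped
        if self.avg is None:
            self.avg = grad_norm  # seed the average on the first step
            return False
        spike = grad_norm > self.factor * self.avg
        if not spike:
            # Only fold healthy steps into the average, so one spike
            # does not inflate the threshold for subsequent steps.
            self.avg = self.momentum * self.avg + (1 - self.momentum) * grad_norm
        return spike
```

In a real run this kind of guard is one small piece among many (loss-scale management, checkpoint rollback, data-order auditing), but it illustrates the category of problem: mechanisms that keep a months-long run improving rather than diverging.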

Tech Stack

  • Python
  • JAX
  • PyTorch
  • Rust

Focus

  • Bold and visionary ideas: remember that you are at the frontier of deep learning scaling, with more compute than anywhere else.
  • Hands-on work: all xAI members are engineers.
  • Full-stack ownership: from building evals and preparing new data, to implementing your boldest ideas, to evaluating results, then redesigning the next experiments and iterating.
  • Collaboration and communication that spark new ideas.
  • Solid execution and results analysis.
  • Fast iteration with consistent, gradual improvement.

Ideal Experience

  • Strong engineering skills and a passion for model-hardware co-design.
  • Expertise in ML and large-model scaling, with familiarity with all kinds of scaling laws.
  • Familiarity with distributed, multi-GPU neural network training, and experience optimizing ML training efficiency.

Location

The role is based in the Bay Area (San Francisco and Palo Alto). Candidates should be located in the Bay Area or be open to relocation.

Annual Salary Range

$180,000 - $440,000 USD

Benefits

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.

xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.