Member of Technical Staff, Frontiers of Deep Learning Scaling
Listed on 2026-01-06
About xAI
xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands‑on and to contribute directly to the company’s mission.
Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills and to share knowledge with their teammates concisely and accurately.
The Pretraining team at xAI aims to answer one question:
How do we scale up intelligence by scaling up compute effectively?
This question can be further broken down into two sub‑questions:
- What to scale up
- How to scale up
What to scale up:
- Next‑token prediction was a meaningful target while online data was abundant relative to model size. As we enter a new phase in which model size grows faster than available data, we need a new scaling paradigm.
- At xAI, our compute grows much faster than that of other companies. We believe scaling up effective compute / useful data is the best path to achieving next‑level intelligence.
- What is “effective compute” or “useful data”? This is the first question this role is expected to explore and answer. It could be rigorous data cleaning and scaling, discovering new knowledge via self‑improvement, a new learning paradigm such as continual learning, unified models for understanding and generating text / code / images / videos, or new model architectures / attention mechanisms / non‑autoregressive models. Anything with the potential to be the next scaling paradigm is open to exploration.
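The tension between data growth and model growth above can be made concrete with a back‑of‑envelope calculation. The sketch below uses Chinchilla‑style heuristics (training FLOPs C ≈ 6·N·D and compute‑optimal tokens D ≈ 20·N); the FLOP budgets and the web‑token estimate are illustrative assumptions, not xAI figures:

```python
def compute_optimal(compute_flops):
    """Given a FLOP budget C, return (params N, tokens D) under the
    heuristics C = 6*N*D with D = 20*N, i.e. N = sqrt(C / 120)."""
    n = (compute_flops / 120) ** 0.5
    d = 20 * n
    return n, d

# Rough, illustrative estimate of available curated web text (tokens).
WEB_TOKENS = 3e13

for c in (1e24, 1e26, 1e28):
    n, d = compute_optimal(c)
    status = "exceeds" if d > WEB_TOKENS else "fits within"
    print(f"C={c:.0e}: N={n:.2e} params, D={d:.2e} tokens ({status} web data)")
```

Under these (assumed) heuristics, the compute‑optimal token count eventually outgrows any plausible web‑text corpus, which is one way to motivate the search for new sources of useful data.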
How to scale up:
- Remember that we are aiming at several hundred million GPU‑hours of training; any tiny training‑stability issue can ruin the big run.
- So this role also needs to explore how to do large‑scale, long‑duration training. For example, most reasoning and post‑training phases