Member of Technical Staff, Evals
Job in San Francisco, San Francisco County, California, 94118, USA
Listed on 2026-06-04
Listing for: Magic AI Inc.
Full Time position
Job specializations:
* Software Development: Software Engineer, Data Scientist, Data Engineer
About the role
Evals builds the internal platform that teams across Magic use to evaluate the performance of internal and external models. The team supports pre-training, post-training, data, inference, and product, and sits on the critical path of many of the company's most important decisions.
As a Member of Technical Staff on Evals, you will build both the platform and the evaluations themselves. You'll develop infrastructure for large-scale evaluations, data ablations, and dataset quality analysis, while designing and validating the methodologies used to measure model performance.
Sweating the details matters on this team. Many benchmarks, papers, and open-source evaluation frameworks contain subtle bugs or flawed assumptions that lead to misleading conclusions. We care deeply about correctness, reproducibility, and measurement quality.
Evals are essential to the success of the company. By building trustworthy evaluation systems, you will help Magic make better research decisions, build better datasets, and ship better products.
What you'll work on
* Build and maintain the internal evals platform used across Magic
* Design, implement, and validate eval tasks for pre-training, post-training, reinforcement learning, inference, and product systems
* Develop infrastructure for running large-scale evaluations
* Build systems to measure dataset quality and identify opportunities to improve training data
* Improve evaluation correctness, reproducibility, and reliability
* Audit and improve upon public benchmarks, evaluation methodologies, and open-source implementations
* Partner with research, data, inference, and product teams to define metrics that accurately reflect model quality
* Build tooling and frameworks that enable teams across Magic to make decisions based on trustworthy measurements
What we're looking for
* Experience building production systems, internal platforms, or developer infrastructure
* Experience working with machine learning systems, evaluation frameworks, data infrastructure, or research tooling
* Track record of owning technical projects end-to-end
* Skepticism toward results that cannot be reproduced, validated, or explained
* Ability to reason critically about benchmarks, metrics, and experimental methodology
* Experience designing, implementing, or operating systems that run at scale
* Comfort navigating ambiguity and determining whether a measurement actually captures the behavior it claims to measure
* Excitement about helping researchers and engineers make better decisions through trustworthy measurements
Compensation, benefits, and perks (US)
* Annual salary range of $200K to $550K, depending on experience
* Equity is a significant part of total compensation, in addition to salary
* 401(k) plan with 6% salary matching
* Generous health, dental, and vision insurance for you and your dependents
* Unlimited paid time off
* Visa sponsorship and relocation support for candidates moving to San Francisco
* A small, fast-moving, highly collaborative team working on frontier AI systems
Magic strives to be the place where high-potential individuals can do their best work. We value quick learning and grit just as much as skill and experience.
Our culture
* Integrity. Words and actions should be aligned
* Hands-on. At Magic, everyone is building
* Teamwork. We move as one team, not N individuals
* Focus. Safely deploy AGI. Everything else is noise
* Quality. Magic should feel like magic