AI Research Engineer Job San Francisco area, California USA, Software Development

Job Title:

AI Research Engineer

About Xterra

Xterra is a Khosla Ventures-backed company building AI agents that reason about complex scientific problems. We’re not a wrapper around existing models; we’re training our own foundation models on large-scale proprietary datasets. This is a rare intersection of frontier AI and real-world scientific impact.

Xterra is still in stealth mode. Please reach out to us for a full picture.

The Role

Xterra is building the infrastructure behind a new generation of geospatial and geophysics intelligence systems — and we're looking for a Research Engineer to help build it. You'll work across the stack that supports our research: the agents that power our products, the data systems that feed them, the evaluation frameworks that tell us whether they're working, and the simulation environments where we test what's next.

This is a broad, high-ownership role for someone who thinks like a software engineer first and is energized by working at the edge of applied AI. You'll design and ship systems that other engineers and researchers depend on, and you'll have significant latitude in how those systems are built.

What You'll Work On

Our research engineers don't sit in a single lane. In any given month, you might be:

Building agent infrastructure — the runtimes, tool interfaces, memory systems, and orchestration layers that our agents are built on top of. As our agent capabilities grow, so does the surface area of what this infrastructure needs to support.
Developing evaluation frameworks — designing how we measure agent performance, building the harnesses that run evals at scale, and making sure our researchers can trust the signal they're getting. Evaluation is a first-class engineering problem here, not an afterthought.
Supporting simulation efforts — building and extending the simulated environments we use to train, test, and stress our systems.
Designing data systems: building pipelines and infrastructure that handle geospatial, geophysics, and sensor data. This includes using AI and agent-based approaches to automate ingestion, labeling, quality monitoring, and schema adaptation, so that adding a new data source is a configuration change, not a project.
Working as a team — working directly with both AI researchers and domain experts (geoscientists and others) to solve one of the hardest problems in earth science.

What We're Looking For

Strong software engineering fundamentals. When AI fails, you can dive in to debug and write software from scratch if necessary. You care about design, you think about interfaces before implementations, and you can navigate ambiguity without losing momentum.
Experience building agent systems. You've built, shipped, or meaningfully contributed to software agents — not just called an LLM API. You understand tool use, orchestration, context management, and the failure modes that come with non-deterministic systems.
Product sense and autonomy. You can take a vague problem, figure out what's actually worth building, and ship it. You don't need a PM to hand you a spec.
Comfort with AI-native development. You use modern AI tools fluently in your own workflow and have opinions about where they help and where they don't.
Range. You're as comfortable building data infrastructure as you are building evaluation tooling or agent runtimes — or you're excited to become comfortable with all of them.

Nice to Have

Background in ML infrastructure, cloud infrastructure management, and distributed systems
Experience with geospatial, geophysics, or time-series sensor data
Experience with pipeline orchestration tooling (Airflow, dbt, Prefect, etc.) and modern data stack components
Experience with multimodal VLMs, RL fine-tuning, and evaluation methodology

What Success Looks Like

A year in, the infrastructure you've built is load-bearing. Our researchers can run evals without thinking about the harness. Our agents run on runtimes you designed. New data sources come online in hours instead of weeks. You've made the rest of the team faster, and you're working on problems that didn't exist when you started.
