Founding Agent Harness Engineer

Job in Palo Alto, Santa Clara County, California, 94306, USA
Listing for: Model AI
Full Time position
Listed on 2026-06-24
Job specializations:
  • Software Development
    AI Engineer (Applied/Software), AI Reliability/Performance Engineer, Backend Developer, AI QA/Validation Engineer
Salary/Wage Range or Industry Benchmark: 125,000 - 150,000 USD yearly
Job Description & How to Apply Below

Founding Agent Systems Engineer

Location: Onsite in Palo Alto
Compensation: Competitive Salary + Equity

About Model AI

Model AI is building the infrastructure and application stack for the next generation of agentic AI systems.

We believe future AI applications will involve many agents working together: using tools, sharing context, evaluating their own behavior, modifying systems, and improving over time. On top of our Agent Cloud infrastructure, we are building an agentic system: an application layer that can continuously optimize models, codebases, workflows, and software systems.

In the near term, this means building agent systems that help codebases evolve with minimal human supervision: cleaner abstractions, stronger tests, lower technical debt, faster iteration, and better engineering velocity. Over time, we believe the same approach can extend to cloud optimization, database optimization, enterprise workflows, and more general applications.

About This Role

We are looking for a Founding Agent Systems Engineer to build the agent harnesses, evaluation systems, workflows, and feedback loops behind our product.

This role is ideal for someone who is strong in Python and systems engineering, understands LLM agents and evaluation, and enjoys turning research ideas into working products. You will work on the systems that allow agents to collaborate, inspect code, make changes, run tests, measure progress, and improve over time.

This is a deeply technical and product-oriented role at the intersection of LLM agents, evaluation, developer tools, workflow automation, and applied AI systems.

What You'll Do
  • Build agent harnesses for multi-agent workflows and ralph loops.
  • Design evaluation environments to measure agent performance, reliability, cost, latency, and quality.
  • Build testing systems that connect agent behavior, code changes, and evaluation results.
  • Create workflows where agents can inspect code, make changes, run tests, and determine whether performance improved.
  • Develop shared-memory, coordination, and task-management systems for teams of agents.
  • Build infrastructure for tracking task success, regressions, cost, latency, quality, and long‑term progress.
  • Work on applied coding‑agent workflows, including refactoring, testing, debugging, code review, and technical‑debt reduction.
  • Support customer demos and applied workflows that turn agent research into usable product experiences.
  • Collaborate closely with ML systems engineers to make agent workloads run efficiently on Agent Cloud.
  • Help turn research prototypes into reliable, measurable, production‑quality systems.
Qualifications
  • Strong Python engineering skills.
  • Experience building LLM agents, evaluation harnesses, developer tools, applied AI systems, or automation workflows.
  • Strong systems instincts and the ability to build reliable, debuggable tooling.
  • Experience with testing, benchmarking, CI, code analysis, or automated software workflows.
  • Ability to reason about agent behavior, failure modes, evaluation quality, and product usefulness.
  • Comfort working across agent logic, backend systems, infrastructure, and user‑facing product requirements.
  • Strong product intuition and the ability to build demos that can evolve into real products.
  • High ownership and the ability to operate effectively in an early‑stage startup environment.
Nice to Have
  • Coding agents, eval harnesses, SWE‑bench‑style environments, or tool‑use systems.
  • Multi‑agent or long‑running agent workflows.
  • LLM observability, tracing, evals, or RL workflows.
  • CI/CD, automated testing, static analysis, or developer tooling.
Cultural Fit
  • Hands‑on technical excellence and strong engineering judgment.
  • End‑to‑end ownership, from design to implementation to production outcomes.
  • Bias for action: ship quickly, learn from failures, and iterate.
  • High intensity during critical milestones, with a focus on real customer impact.
  • Ability to do deep, focused work and sustain execution.
  • Clear communication with teammates, customers, and stakeholders.
  • Comfort with ambiguity, rapid change, and wearing multiple hats.
  • Low ego, high integrity, high accountability, and strong collaboration.
  • Continuous learning and a belief that judgment, intelligence, and capability compound over time.
If you are excited to build the agent systems behind the next generation of AI applications, create fully‑automated software workflows, and turn ambitious research ideas into real‑world impact, Model AI is the place for you.