AI Engineer Job, Los Angeles area, California, USA, Software Development

A builder with a high technical bar whose leverage is judgment, not keystrokes — in the Office of the CEO.

Product.ai is the verified truth layer for shopping — the intelligence that tells you what's actually true about a product, including when not to buy. Profitable. Bootstrapped. No outside investors. No board. 20 people outbuilding companies 10× our size.

Strong people find us and keep finding us — they apply over months and years, because the field moves fast and the exact profile we need moves with it.

Why This Role Exists

Most of our software is now written by AI agents. So the job that matters is no longer typing the code — it's deciding what to build, designing the systems the agents run inside, and knowing how you'll prove the work is correct. You'll be a builder closer to a product engineer than a heads‑down coder: your leverage is judgment and taste, not typing speed.

Your first surface is the agent harness behind the Office of the CEO — the live automation that lets 20 people move like a company many times our size. A recruiting‑evaluation pipeline scoring 30+ candidates a day across open roles. A merchant‑discovery pipeline landing ~1,200 merchants a day. The content, data, and ops automation underneath it all. This year these crossed a threshold: agent runs that go 1‑4 hours unattended became a normal unit of work for us, and that fleet now deserves a dedicated owner.

But you won't stay boxed into one surface. This is a generalist builder's seat, working directly with the founder. The harness is where you start, not the ceiling of what you'll touch.

The System You'll Need to Model

A fleet of production automations whose failure mode is silent death. Pipelines here rarely fail loudly — they stop, and the cost accrues invisibly until someone notices days later. The real engineering problem is liveness: designing alarms and deterministic checks so that no automation in the fleet can die unnoticed.
Long‑lived agent runs that go 1‑4 hours unattended. They hold together not because someone watches them, but because they run on architectural law (the rules an agent run is bound by), a fuel budget, and verification built in. You design what governs a run, what it's allowed to spend, and what proves it worked.
Verification the agent cannot author. A generative model cannot reliably grade its own output, so a verifier that shares the generator's context will launder its own mistakes. The architecture is external truth anchors, regression corpora, and oracle‑separated checkers — a separate judge that never sees what the builder saw. This separation is the whole game, and it's also the company's thesis: verified truth a model can't fake.
Token spend judged by what it moved, not what it cost. Every run is instrumented for the outcome it produced, and budget gets redirected toward what's working while the run is still going. We're quality‑maximalist: the expensive thing is a redo cycle, never tokens.
A shared knowledge base the agents stand on: a brain of 8,600+ indexed documents your automations query to answer their own questions, governed by the same architectural law that governs your work. Your systems read from it, feed it, and are bound by it.
Architecture that moves weekly. We built this harness before unattended runs were even possible at scale, and the ground keeps shifting. You'll model where it's going and act without waiting for a brief.
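
To make the liveness and fuel-budget ideas above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of our actual harness: the names Heartbeat, find_silent, and FuelBudget are hypothetical, as are the automation names and thresholds in the usage lines. It shows a deterministic check that flags any automation that has gone quiet for longer than its allowance, and a hard cap that stops a run from spending past its budget.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    """Last known sign of life for one automation (hypothetical structure)."""
    name: str
    last_beat: float    # Unix timestamp of the most recent heartbeat
    max_silence: float  # seconds of silence allowed before alarming


def find_silent(heartbeats, now):
    """Return the sorted names of automations silent for longer than allowed.

    Deterministic by design: the caller supplies `now`, so the check can be
    tested and replayed without reading the system clock.
    """
    return sorted(
        hb.name for hb in heartbeats
        if now - hb.last_beat > hb.max_silence
    )


class FuelBudget:
    """Hard spending cap for a single run, in arbitrary units (e.g. tokens)."""

    def __init__(self, limit):
        self.limit = limit
        self.spent = 0

    def charge(self, amount):
        """Record spend and return what remains; raise instead of overspending."""
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if self.spent + amount > self.limit:
            raise RuntimeError("fuel budget exhausted")
        self.spent += amount
        return self.limit - self.spent


# Usage with made-up automations and thresholds (assumptions, not real config).
fleet = [
    Heartbeat("merchant_discovery", last_beat=1_000.0, max_silence=3_600.0),
    Heartbeat("recruiting_eval", last_beat=1_000.0, max_silence=600.0),
]
print(find_silent(fleet, now=2_000.0))  # ['recruiting_eval']
```

The point of passing `now` in explicitly is that a liveness check should itself be checkable: the same inputs always yield the same alarms. In a real system, the alarm path would also need to live outside the automation it watches, so that a dead pipeline cannot take its own monitor down with it.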

If reading that energizes you, keep going. If it feels overwhelming or underspecified, this isn't the right fit.

What You Will Own

The agent harness for the Office of the CEO. Recruiting evaluations, merchant discovery, ops automation — the run designs, the runtime they execute in, and the architectural law that governs them. When a new automation is needed, you decide how it runs, what governs it, and what proves it correct.
Verification that scales as the work compounds. Ad‑hoc human review collapses somewhere around 100‑150 artifacts a day, and we're heading straight through that ceiling. You build what replaces it: regression corpora, oracle‑separated checkers, sampling protocols, and escalation paths that put a human in the loop only where real judgment is…