Senior AI Engineer — Inference & Agent Systems
Listed on 2026-06-04
Software Development
AI Engineer, Machine Learning / ML Engineer
United States
Title: Applied AI Engineer — Inference & Agent Systems
Location: United States
What We're Building
Arcana is building AI agents that synthesize information across heterogeneous sources and deliver structured, reasoned answers in real time. The product only works if the agents are fast, reliable, and correct, not approximately correct.
The Work
Inference Optimization
- Drive time to first token (TTFT) below 400 ms for multi-step agent pipelines
- Streaming optimization: get the first token to the user while sub-agents are still running
- KV cache strategy, prompt compression, dynamic context window management
- Multi-provider routing: model selection by latency, cost, and task type across OpenAI, Anthropic, Gemini, and open-weight models
Agent Orchestration
- Design and implement Plan-Execute-Synthesize pipelines that run sub-agents in parallel DAGs, not sequential chains
- Build reliable orchestration on top of Temporal: retries, timeouts, partial failure recovery, idempotency
- Structured output enforcement: JSON schema validation, retry loops on malformed LLM output, graceful degradation
- Tool-call design: schemas that LLMs actually follow reliably across providers
Evaluation
- Own the eval framework end to end: ground truth datasets, automated scoring pipelines, regression detection on every PR
- LLM-as-judge pipelines for qualitative output assessment
- Latency regression testing: p50/p95/p99 tracked across every deployment
- Adversarial test case design: ambiguous queries, missing data, conflicting sources, malformed tool responses
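The structured output enforcement item in the list above (schema validation, retry loops on malformed output, graceful degradation) can be illustrated with a minimal Python sketch. It is not Arcana's implementation: `call_model` stands in for any provider client, and `ANSWER_SCHEMA`, `FALLBACK`, `validate`, `generate_structured`, and the re-prompt wording are all hypothetical. Validation here is a hand-rolled required-field and type check rather than full JSON Schema, to keep the example dependency-free.

```python
import copy
import json
from typing import Any, Callable, Optional

# Hypothetical output contract: required field name -> expected Python type.
ANSWER_SCHEMA: dict[str, type] = {"answer": str, "confidence": float, "sources": list}

# Hypothetical degraded answer returned when every attempt fails validation.
FALLBACK: dict[str, Any] = {"answer": "", "confidence": 0.0, "sources": [], "degraded": True}


def validate(payload: Any, schema: dict[str, type]) -> Optional[str]:
    """Return a human-readable error if payload violates the schema, else None."""
    if not isinstance(payload, dict):
        return "top-level value must be a JSON object"
    for key, expected in schema.items():
        if key not in payload:
            return f"missing required field '{key}'"
        if not isinstance(payload[key], expected):
            return f"field '{key}' must be of type {expected.__name__}"
    return None


def generate_structured(
    call_model: Callable[[str], str],
    prompt: str,
    schema: dict[str, type],
    fallback: dict[str, Any],
    max_attempts: int = 3,
) -> dict[str, Any]:
    """Call the model, validate its JSON, and re-prompt with the error on failure."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = call_model(current_prompt)
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError as exc:
            error = f"invalid JSON: {exc.msg}"
        else:
            error = validate(payload, schema)
            if error is None:
                return payload
        # Retry loop: tell the model exactly why its last output was rejected.
        current_prompt = (
            f"{prompt}\n\nYour previous output was rejected ({error}). "
            "Respond with a single JSON object only."
        )
    # Graceful degradation: a well-formed answer explicitly marked as degraded.
    return copy.deepcopy(fallback)
```

Separating the parse error from the schema error lets the re-prompt say which of the two went wrong; returning a flagged fallback instead of raising leaves the degradation decision to the synthesis step. A production version would more likely use a full JSON Schema validator.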
Infrastructure
- Model serving and cold start optimization
- Async worker architecture for parallel sub-agent execution
- Observability: trace every token, every tool call, every synthesis step
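The parallel-DAG and async-worker items above can be sketched with Python's asyncio, assuming each sub-agent is an async callable that receives its dependencies' outputs. This is an illustration under those assumptions, not the team's Temporal-based design: `run_dag`, `synthesize`, `SkippedError`, and the single per-node timeout are hypothetical, and retries and idempotency (which Temporal would handle) are left out. Each node starts as soon as all of its dependencies finish, independent branches run concurrently, and a failed or timed-out node only causes its downstream nodes to be skipped.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical sub-agent shape: an async callable that receives its
# dependencies' outputs (keyed by node name) and returns its own result.
SubAgent = Callable[[dict], Awaitable[Any]]


class SkippedError(Exception):
    """Recorded for a node that never ran because an upstream node failed."""


async def run_dag(
    agents: dict[str, SubAgent],
    deps: dict[str, list[str]],
    timeout_s: float,
) -> dict[str, Any]:
    """Run sub-agents as a DAG and return each node's result or exception.

    Assumes `deps` is acyclic and only names nodes present in `agents`.
    """
    tasks: dict[str, asyncio.Task] = {}

    async def run_node(name: str) -> Any:
        upstream_names = deps.get(name, [])
        # Wait for every dependency; collect failures instead of raising.
        upstream_results = await asyncio.gather(
            *(tasks[dep] for dep in upstream_names), return_exceptions=True
        )
        if any(isinstance(r, BaseException) for r in upstream_results):
            raise SkippedError(f"{name}: an upstream sub-agent failed")
        upstream = dict(zip(upstream_names, upstream_results))
        # Per-node latency budget: the sub-agent is cancelled if it overruns.
        return await asyncio.wait_for(agents[name](upstream), timeout=timeout_s)

    # All tasks are registered before the event loop runs any of them, so
    # every run_node call can look up its dependencies' tasks.
    for name in agents:
        tasks[name] = asyncio.create_task(run_node(name))
    outcomes = await asyncio.gather(*tasks.values(), return_exceptions=True)
    return dict(zip(tasks.keys(), outcomes))


def synthesize(outcomes: dict[str, Any]) -> dict[str, Any]:
    """Build a partial answer from successful nodes and list the missing ones."""
    ok = {k: v for k, v in outcomes.items() if not isinstance(v, BaseException)}
    missing = sorted(k for k, v in outcomes.items() if isinstance(v, BaseException))
    return {"results": ok, "missing": missing}
```

For example, with `deps = {"summary": ["search"], "compare": ["prices"]}`, a slow `prices` agent times out, `compare` is skipped, and `synthesize` still returns the `search` and `summary` results while listing the two missing nodes.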
What We're Looking For
You've built something that runs in production at a meaningful scale and you understand why it's fast (or why it isn't).
Strong signal:
- You've worked on inference pipelines where TTFT was the primary metric and you moved it meaningfully
- You've built multi-step agent systems and you know where they break, not from reading papers but from watching them fail in production
- You've written eval harnesses from scratch and you have opinions about what makes a ground truth dataset actually useful
- You've debugged LLM non-determinism in production and built systems resilient to it
- You've worked with streaming LLM responses and built infrastructure around partial output handling
Weaker signal (but not disqualifying):
- You've fine-tuned models but haven't shipped inference systems
- You've used LangChain/LlamaIndex but haven't built the layer underneath
- Strong ML research background without systems exposure
Stack familiarity (we care more about depth than match):
Go, Python, Temporal, Kafka, PostgreSQL, Docker
Why This Role
The problems here don't have blog posts about them yet. Parallel agent DAG execution under hard latency budgets, streaming synthesis across partial sub-agent results, eval harnesses for non-deterministic multi-step systems: these are genuinely unsolved at production quality. Small team. High ownership. Every engineer's decisions ship to production.
Who We Want to Hear From
- You've shipped inference systems at a real-time AI product (search, coding assistant, chat at scale)
- You've shipped inference systems at an agent platform (any domain)
- Or you've built eval harness infrastructure that a team of 10+ engineers actually trusted to catch regressions
Apply
Send to: careers
Include:
- One system you built where latency was the primary constraint: what you measured, what you changed, and what moved
- Link to anything public (code, writing, talks)
We respond to every application.