Staff Engineer, Agentic AI
San Francisco area, California, USA | Software Development

Cosmon is hiring a senior technical leader to own the core agent intelligence that turns mechanical engineers' intent into reliable, cost‑efficient multi‑step workflows across desktop engineering tools. The role sits at the intersection of applied agentic AI, user research, and product delivery, and will determine the product's real‑world value to enterprise customers.

What you'll do

Lead development of the core agent intelligence layer that executes multi‑step workflows across complex desktop engineering software.
Report to the CTO and serve as technical lead for a small team of AI engineers, a user researcher, and domain expert contractors.
Own the full product loop: define agent capabilities from user stories, build implementations, and benchmark against real workflows.
Raise agent task success rate by defining evaluation frameworks, establishing baselines, and iterating to improve completion metrics.
Set and enforce per‑task token budgets and track cost per completed workflow to ensure commercial viability.
Build rigorous, reproducible evaluation infrastructure grounded in validated user stories.
Lead user story mapping and validation through interviews and close collaboration with domain experts.
Translate validated user stories into testable evals and close the loop between research and benchmarking.
Own agent architecture decisions including tool‑calling, state management, error recovery, model routing, and context management.
Act as a player‑coach: write production code, review designs, unblock the team, and raise engineering standards.
Collaborate cross‑functionally with integrations, product, and customers during POCs to align agent behavior with real‑world usage.
Operate in an early‑stage, high‑impact environment (small team, Series A, Fortune 100 customers, direct line to the CTO).

What Cosmon is looking for

7+ years in software engineering, including at least 2 years building LLM‑based agents that act in the real world.
Deep experience designing LLM application architectures, including model selection, context‑window management, retrieval, and orchestration patterns.
Proven ability to build evaluation and benchmarking frameworks measuring task completion, cost efficiency, and failure modes.
Technical leadership experience setting direction for small teams (3–6 engineers) and performing meaningful code review.
Strong Python skills and familiarity with LLM tooling (function calling, tool APIs, observability/tracing, evaluation frameworks).
Experience with desktop automation or programmatic control of applications (COM or similar).
Domain experience in mechanical engineering, CAD/CAE, PLM, or adjacent industries.
Understanding of enterprise deployment constraints on locked‑down corporate workstations.
Track record contributing to public benchmarks, publications, or open‑source agentic AI projects.
