Software Engineer, Agentic AI – Nexus Job, Milpitas area, California, USA, Software Development

Position: Staff Software Engineer, Agentic AI — Nexus

Arlo is building Nexus, our next‑generation agentic chat experience embedded in the Arlo app. Nexus helps customers interact with their devices, troubleshoot issues, and get more out of their Arlo ecosystem through natural conversation — backed by a growing system of LLM‑powered agents, tools, and integrations. Engineers who build systems that reason about physical devices will find a lot of interesting problems here.

We’re hiring a Staff Software Engineer to join the team building Nexus and to take ownership of expanding what our agents can do. This is a deeply technical Staff‑level role with real autonomy – you’ll work across the agent stack from prompt and tool design through orchestration, evals, and production hardening, and set technical direction that other engineers follow.

Key Responsibilities

Design and ship new agent capabilities for Nexus – new tools, skills, integrations, and conversational flows that meaningfully expand what users can accomplish through chat.
Build and own production‑grade Python services (FastAPI, async patterns) that power Nexus’s agent runtime, tool execution, and orchestration logic.
Extend our orchestration layer (LangGraph / LangChain or equivalent) with new agent topologies, routing logic, and tool‑use patterns.
Design tool‑use and function‑calling interfaces – including MCP servers – that let Nexus safely interact with Arlo platform APIs, device telemetry, and partner systems.
Build the evals and observability that make agent behavior measurable: offline test suites, online quality metrics, trace tooling, regression detection, and dashboards engineers and PMs actually use.
Own the testing strategy for AI experiences – design and build the test harnesses, golden datasets, scenario suites, adversarial/red‑team tests, and CI gates that catch agent regressions before they reach users.
Define what “good” looks like for conversational quality, tool‑use correctness, and task completion.
Partner closely with product, design, and platform teams to turn user needs into shipped agent features – and bring engineering judgment to scoping, sequencing, and trade‑offs.
Set technical direction for agent development practices at Arlo: patterns, frameworks, code review standards, and the playbook other engineers follow when they build on Nexus.
Mentor mid‑level and senior engineers on LLM systems, prompt design, and production AI engineering.

Minimum Qualifications

8+ years of software engineering experience, with at least 1–2 years building production LLM‑powered systems – ideally agentic chat, copilots, or multi‑step agent workflows.
Strong production Python skills (FastAPI, asyncio, type hints, testing discipline) and experience building and operating Python services at meaningful scale.
Hands‑on experience with LLM orchestration frameworks like LangGraph, LangChain, LlamaIndex, or equivalent – with an opinion on when to use them vs. build your own.
Deep familiarity with tool‑use / function‑calling patterns; bonus if you have built or integrated MCP (Model Context Protocol) servers.
Experience designing multi‑agent or multi‑step workflows: planner/executor patterns, agent handoff, state management, error recovery, human‑in‑the‑loop.
A real point of view on evals and observability for LLM systems – you have built or advocated for feedback loops that keep agents from regressing in production.
Hands‑on experience testing AI/LLM experiences in production – building eval datasets, scoring rubrics (LLM‑as‑judge, human‑in‑the‑loop, deterministic checks), regression suites, and the discipline to know when each applies.
Track record of shipping at the Staff level – you have operated as a technical leader across teams, not just as an individual contributor.

Nice‑to‑Haves

Experience with RAG, vector databases, embedding pipelines, and retrieval quality tuning.
Familiarity with Anthropic’s Claude API, OpenAI’s Responses API, or comparable provider SDKs at the level of tool use, structured outputs, and streaming.
Experience instrumenting LLM systems with tools such as LangSmith, Langfuse, Arize, Braintrust, or homegrown tracing.
Experience with AI testing tooling (Braintrust, Langfuse,…