Software Engineer, Agentic AI — Nexus

Job in Milpitas, Santa Clara County, California, 95035, USA
Listing for: Arlo Technologies, Inc.
Full Time position
Listed on 2026-06-02
Job specializations:
  • Software Development
    AI Engineer, Cloud Engineer - Software, DevOps, Machine Learning/ ML Engineer
Salary/Wage Range or Industry Benchmark: USD 175,000 - 225,000 per year
Position: Staff Software Engineer, Agentic AI — Nexus

Staff Software Engineer – Nexus AI

Arlo is building Nexus, our next‑generation agentic chat experience embedded in the Arlo app. Nexus helps customers interact with their devices, troubleshoot issues, and get more out of their Arlo ecosystem through natural conversation — backed by a growing system of LLM‑powered agents, tools, and integrations. Engineers who build systems that reason about physical devices will find a lot of interesting problems here.

We’re hiring a Staff Software Engineer to join the team building Nexus and to take ownership of expanding what our agents can do. This is a deeply technical Staff‑level role with real autonomy – you’ll work across the agent stack from prompt and tool design through orchestration, evals, and production hardening, and set technical direction that other engineers follow.

Key Responsibilities
  • Design and ship new agent capabilities for Nexus – new tools, skills, integrations, and conversational flows that meaningfully expand what users can accomplish through chat.
  • Build and own production‑grade Python services (FastAPI, async patterns) that power Nexus’s agent runtime, tool execution, and orchestration logic.
  • Extend our orchestration layer (LangGraph / LangChain or equivalent) with new agent topologies, routing logic, and tool‑use patterns.
  • Design tool‑use and function‑calling interfaces – including MCP servers – that let Nexus safely interact with Arlo platform APIs, device telemetry, and partner systems.
  • Build the evals and observability that make agent behavior measurable: offline test suites, online quality metrics, trace tooling, regression detection, and dashboards engineers and PMs actually use.
  • Own the testing strategy for AI experiences – design and build the test harnesses, golden datasets, scenario suites, adversarial/red‑team tests, and CI gates that catch agent regressions before they reach users.
  • Define what “good” looks like for conversational quality, tool‑use correctness, and task completion.
  • Partner closely with product, design, and platform teams to turn user needs into shipped agent features – and bring engineering judgment to scoping, sequencing, and trade‑offs.
  • Set technical direction for agent development practices at Arlo: patterns, frameworks, code review standards, and the playbook other engineers follow when they build on Nexus.
  • Mentor mid and senior engineers on LLM systems, prompt design, and production AI engineering.
Minimum Qualifications
  • 8+ years of software engineering experience, with at least 1–2 years building production LLM‑powered systems – ideally agentic chat, copilots, or multi‑step agent workflows.
  • Strong production Python skills (FastAPI, asyncio, type hints, testing discipline) and experience building and operating Python services at meaningful scale.
  • Hands‑on experience with LLM orchestration frameworks like LangGraph, LangChain, LlamaIndex, or equivalent – with an opinion on when to use them vs. build your own.
  • Deep familiarity with tool‑use / function‑calling patterns; bonus if you have built or integrated MCP (Model Context Protocol) servers.
  • Experience designing multi‑agent or multi‑step workflows: planner/executor patterns, agent handoff, state management, error recovery, human‑in‑the‑loop.
  • A real point of view on evals and observability for LLM systems – you have built or advocated for feedback loops that keep agents from regressing in production.
  • Hands‑on experience testing AI/LLM experiences in production – building eval datasets, scoring rubrics (LLM‑as‑judge, human‑in‑the‑loop, deterministic checks), regression suites, and the discipline to know when each applies.
  • Track record of shipping at the Staff level – you have operated as a technical leader across teams, not just as an individual contributor.
Nice‑to‑Haves
  • Experience with RAG, vector databases, embedding pipelines, and retrieval quality tuning.
  • Familiarity with Anthropic’s Claude API, OpenAI’s Responses API, or comparable provider SDKs at the level of tool use, structured outputs, and streaming.
  • Experience instrumenting LLM systems with tools such as LangSmith, Langfuse, Arize, Braintrust, or homegrown tracing.
  • Experience with AI testing tooling (Braintrust, Langfuse,…