Agentic AI & LLM Software Development Engineer, Senior
Listed on 2026-06-28
Software Development
AI Engineer (Applied/Software), Backend Developer, Full Stack Developer, Software Engineer
Agentic AI & LLM Applications Software Development Engineer, Senior
To achieve an organization's mission, leaders need strong team members who can build the next generation of agentic AI to transform how clients accelerate research, make decisions, and ship products. That is why we need you, an experienced Software Development Engineer who can operate at a system-of-systems level to support clients in advancing AI-enabled systems within an R&D environment.
As part of our team, you'll serve as a Software Development Engineer to the Advanced Research Projects Agency for Health (ARPA-H). ARPA-H has a small team that is building the next generation of agentic AI to transform how the agency accelerates research, makes decisions, and ships products. The team will evolve ARPA-H's production AI assistant into an ecosystem of autonomous, multi-agent systems.
You'll serve as a Software Development Engineer at the application layer to design and build agentic workflows, build LLM integrations, support tool-calling systems, and develop AI-powered features that users interact with every day. Your focus will be on what runs on top of the platform: the agents, the orchestration, the prompts, the pipelines, and the product. Your attention to detail, flexibility, communication skills, understanding of the client's mission, and problem-solving will enable the mission's success.
What You'll Work On
- Support agentic AI systems and orchestration, LLM application development, features and products, observability and reliability, and engineering excellence
- Design and build core agentic workflows: multi-step reasoning, planning, memory, and tool use across single- and multi-agent systems
- Implement and evolve agent-to-agent (A2A) communication patterns at the application layer, enabling agents to collaborate and hand off tasks
- Build and maintain the tool-calling layer, including tool definitions, input and output schemas, error handling, retry logic, and result formatting
- Own the Model Context Protocol (MCP) client-side integration, including how agents discover, invoke, and compose tools exposed via MCP servers
- Design multi-agent workflows that are reliable, observable, and debuggable in production, not just in demos
- Own LLM orchestration at the application layer, including prompt construction, context management, model selection logic, and response parsing
- Build and maintain RAG features, including query formulation, result ranking, citation grounding, and hallucination mitigation
- Implement and iterate on prompt engineering patterns and system prompts that drive GRACE's quality and consistency across OpenAI GPT, Anthropic Claude, and Google Gemini
- Manage context window budgets and know when to truncate, summarize, or paginate, and build the logic that makes those decisions correctly
- Build evaluation pipelines for LLM quality, including grounding assessment, regression testing, safety checks, and A/B experimentation on prompt and model changes
- Stay sharp on token economics and write prompts and pipelines that are cost-efficient without sacrificing output quality
- Translate ambiguous product requirements into clear technical designs and ship them fast
- Build new product capabilities end-to-end, from backend application logic through to the API contract the frontend consumes
- Rapidly prototype new agentic features, run experiments, collect data, and iterate based on real user behavior
- Collaborate closely with product, UX, applied science, and operations
- Write tests, handle edge cases, and make sure features degrade gracefully when upstream dependencies fail
- Instrument agentic workflows with tracing, logging, and metrics so failures are diagnosable and regressions are caught before users report them
- Define and monitor application-level SLOs: tool-call success rates, response quality, and latency from the user's perspective
- Build fallback and guardrail logic for AI services, including what happens when a model returns something unsafe, off-topic, or structurally wrong
- Work closely with the infra engineer to understand system-level constraints and design application behavior that respects them
- Write production-quality code: readable, tested, reviewed, and…