Agentic AI & LLM Software Development Engineer, Senior Job Washington area, District of Columbia USA, Software Development

Position: Agentic AI & LLM Applications Software Development Engineer, Senior

To achieve an organization's mission, leaders need strong team members who can build the next generation of agentic AI to transform how clients accelerate research, make decisions, and ship products. That is why we need you, an experienced Software Development Engineer who can operate at a system-of-systems level to support clients in advancing AI-enabled systems within an R&D environment.

As part of our team, you'll serve as a Software Development Engineer to the Advanced Research Projects Agency for Health (ARPA-H). ARPA-H has a small team that is building the next generation of agentic AI to transform how the agency accelerates research, makes decisions, and ships products. The team will evolve ARPA-H's production AI assistant into an ecosystem of autonomous, multi-agent systems.

You'll serve as a Software Development Engineer at the application layer to design and build agentic workflows, build LLM integrations, support tool-calling systems, and develop AI-powered features that users interact with every day. Your focus will be on what runs on top of the platform: the agents, the orchestration, the prompts, the pipelines, and the product. Your attention to detail, flexibility, communication skills, understanding of the client's mission, and problem-solving will enable the mission's success.

What You'll Work On

Support agentic AI systems and orchestration, LLM application development, features and products, observability and reliability, and engineering excellence
Design and build core agentic workflows: multi-step reasoning, planning, memory, and tool-use across single and multi-agent systems
Implement and evolve A2A communication patterns at the application layer, enabling agents to collaborate and hand off tasks
Build and maintain the tool-calling layer, including tool definitions, input and output schemas, error handling, retry logic, and result formatting
Own the MCP client-side integration, including how agents discover, invoke, and compose tools exposed via MCP servers
Design multi-agent workflows that are reliable, observable, and debuggable in production, not just in demos
Own LLM orchestration at the application layer, including prompt construction, context management, model selection logic, and response parsing
Build and maintain RAG features, including query formulation, result ranking, citation grounding, and hallucination mitigation
Implement and iterate on prompt engineering patterns and system prompts that drive GRACE's quality and consistency across OpenAI GPT, Anthropic Claude, and Google Gemini
Manage context window budgets and know when to truncate, summarize, or paginate, and build the logic that makes those decisions correctly
Build evaluation pipelines for LLM quality, including grounding assessment, regression testing, safety checks, and A/B experimentation on prompt and model changes
Stay sharp on token economics and write prompts and pipelines that are cost-efficient without sacrificing output quality
Translate ambiguous product requirements into clear technical designs and ship them fast
Build new product capabilities end-to-end, from backend application logic through to the API contract the frontend consumes
Rapidly prototype new agentic features, run experiments, collect data, and iterate based on real user behavior
Collaborate closely with product, UX, applied science, and operations
Write tests, handle edge cases, and make sure features degrade gracefully when upstream dependencies fail
Instrument agentic workflows with tracing, logging, and metrics so failures are diagnosable and regressions are caught before users report them
Define and monitor application-level SLOs: tool call success rates, response quality, and latency from the user's perspective
Build fallback and guardrail logic for AI services, including what happens when a model returns something unsafe, off-topic, or structurally wrong
Work closely with the infra engineer to understand system-level constraints and design application behavior that respects them
Write production-quality code: readable, tested, reviewed, and…
