Agentic AI & LLM Software Development Engineer, Senior
Listed on 2026-06-27
Software Development
AI Engineer (Applied/Software), Backend Developer, Software Engineer, Full Stack Developer
Agentic AI & LLM Applications Software Development Engineer, Senior
The Opportunity:
To achieve an organization's mission, leaders need strong team members who can build the next generation of agentic AI to transform how clients accelerate research, make decisions, and ship products. That is why we need you, an experienced Software Development Engineer who can operate at a system-of-systems level to support clients in advancing AI-enabled systems within an R&D environment.
As part of our team, you'll serve as a Software Development Engineer to the Advanced Research Projects Agency for Health (ARPA-H). ARPA-H has a small team that is building the next generation of agentic AI to transform how the agency accelerates research, makes decisions, and ships products. This team will evolve ARPA-H's production AI assistant into an ecosystem of autonomous, multi-agent systems.
You'll serve as a Software Development Engineer at the application layer to design and build agentic workflows, build LLM integrations, support tool-calling systems, and develop AI-powered features that users interact with every day. Your focus will be on what runs on top of the platform: the agents, the orchestration, the prompts, the pipelines, and the product. Your attention to detail, flexibility, communication skills, understanding of the client's mission, and problem-solving will enable the mission's success.
What You'll Work On
- Support agentic AI systems and orchestration, LLM application development, features and products, observability and reliability, and engineering excellence
- Design and build core agentic workflows: multi-step reasoning, planning, memory, and tool-use across single and multi-agent systems
- Implement and evolve A2A communication patterns at the application layer, enabling agents to collaborate and hand off tasks
- Build and maintain the tool-calling layer, including tool definitions, input and output schemas, error handling, retry logic, and result formatting
- Own the MCP client-side integration, including how agents discover, invoke, and compose tools exposed via MCP servers
- Design multi-agent workflows that are reliable, observable, and debuggable in production, not just in demos
- Own LLM orchestration at the application layer, including prompt construction, context management, model selection logic, and response parsing
- Build and maintain RAG features, including query formulation, result ranking, citation grounding, and hallucination mitigation
- Implement and iterate on prompt engineering patterns and system prompts that drive GRACE's quality and consistency across OpenAI GPT, Anthropic Claude, and Google Gemini
- Manage context window budgets and know when to truncate, summarize, or paginate, and build the logic that makes those decisions correctly
- Build evaluation pipelines for LLM quality, including grounding assessment, regression testing, safety checks, and A/B experimentation on prompt and model changes
- Stay sharp on token economics and write prompts and pipelines that are cost-efficient without sacrificing output quality
- Translate ambiguous product requirements into clear technical designs and ship them fast
- Build new product capabilities end-to-end, from backend application logic through to the API contract the frontend consumes
- Rapidly prototype new agentic features, run experiments, collect data, and iterate based on real user behavior
- Collaborate closely with product, UX, applied science, and operations
- Write tests, handle edge cases, and make sure features degrade gracefully when upstream dependencies fail
- Instrument agentic workflows with tracing, logging, and metrics so failures are diagnosable and regressions are caught before users report them
- Define and monitor application-level SLOs: tool call success rates, response quality, and latency from the user's perspective
- Build fallback and guardrail logic for AI services, including what happens when a model returns something unsafe, off-topic, or structurally wrong
- Work closely with the infra engineer to understand system-level constraints and design application behavior that respects them
- Write production-quality code: readable, tested, reviewed, and documented
- Communicate technical decisions clearly to both engineers and non-engineers; no one should have to guess what you decided or why
- Participate actively in design reviews, and push back when something is over-engineered or under-specified
- Ensure strong privacy, security, and compliance in all application logic and data handling
Join us. The world can't wait.
You have:
- 7+ years of experience with software engineering, including building and operating production systems
- Experience in high-velocity environments where you owned and shipped complex products end-to-end
- Experience with at least 2 backend languages, including Python
- Experience building and operating systems on major cloud platforms, such as AWS, GCP, or Azure
- Experience with containerization and working within CI/CD pipelines
- Knowledge of modern…