Senior AI Agent & Evaluations Engineer
Listed on 2026-07-01
Software Development
AI Engineer (Applied/Software), AI Reliability/Performance Engineer, AI QA/Validation Engineer
Join Vacatia and help build the future of AI-powered vacation ownership.
Location:
Portland, OR (hybrid, three days per week in office). Remote work considered for exceptional candidates.
Vacatia is building the future of vacation ownership. We operate in a fragmented, operationally complex industry where AI has the potential to fundamentally transform how decisions are made, how customers are supported, and how businesses scale.
We're developing AI agents that sit at the center of critical business workflows—helping owners, supporting operations, surfacing insights, and automating decisions that historically required significant human effort. These agents interact with real customers and influence real business outcomes, making reliability, safety, and performance essential.
We're looking for a hands-on Senior AI Agent & Evals Engineer to own the intelligence layer behind these systems. You'll be responsible for designing agent behavior, building evaluation frameworks, creating guardrails, and continuously improving agent performance as our AI footprint expands across the organization.
If you're passionate about prompt engineering, agent reliability, and creating measurable AI systems that solve meaningful business problems, we'd love to meet you.
Why You'll Love Working at Vacatia
- Build the Future of Applied AI
- Work on Problems That Matter
- Own the Intelligence Layer
- Measure What Matters
- Partner Across the Business
- Join a Small Team with Outsized Impact
Your Impact
- Design, refine, and optimize prompts, tool definitions, routing logic, and decision-making behavior across Vacatia's AI agent ecosystem
- Build and maintain evaluation frameworks, golden datasets, grading systems, and regression testing pipelines that measure agent quality and reliability
- Develop guardrails and safe-failure mechanisms that ensure agents operate responsibly in customer-facing and financially sensitive workflows
- Monitor production performance, investigate failures, identify edge cases, and continuously improve agent outcomes through data-driven iteration
- Partner with business stakeholders to translate policies, operational requirements, and domain expertise into measurable agent behavior
- Collaborate with engineering teams to define context requirements, tool contracts, and integration specifications that support agent success
- Create scalable frameworks and reusable patterns for deploying AI agents across new business workflows and use cases
- Establish best practices for prompt engineering, evaluation methodologies, observability, and agent operations
What You Bring
- Proven experience shipping and owning production AI agents or LLM-powered systems beyond proof-of-concept environments
- Deep expertise in prompt engineering, including system prompts, tool usage, context management, output constraints, and agent behavior design
- Hands-on experience building evaluation frameworks using golden datasets, scoring rubrics, LLM-as-judge methodologies, and regression testing
- Strong familiarity with modern AI development tools such as Claude Code, Codex, or similar coding agents
- Experience with agent observability and evaluation platforms such as LangSmith, Langfuse, Arize, Galileo, or comparable solutions
- Ability to distinguish prompt issues from data, tooling, model, or evaluation failures and systematically improve agent performance
- Strong written and verbal communication skills with the ability to work effectively across engineering and business teams
- Demonstrated ownership mindset with a passion for building reliable, measurable, and continuously improving AI systems
Strongly Preferred
- Experience building agents that process communication-based workflows including emails, support tickets, chat interactions, or transcripts
- Experience with multiple agent frameworks and a practical understanding of their tradeoffs
- Familiarity with the evolving LLM landscape and model selection strategies
- Experience designing and implementing end-to-end evaluation pipelines and agent operations workflows
- Production experience with online evaluation systems and automated scoring of live traffic
Nice to Have
- Experience integrating AI systems with Salesforce, Amazon Connect, or other customer engagement platforms
- Background in customer-facing industries where accuracy, compliance, and communication quality are critical
- Contributions to open-source projects, technical writing, or public thought leadership in AI, prompt engineering, or agent development
Join us at the forefront of applied AI innovation. If you're excited about building intelligent systems that solve complex business problems, improving agent behavior through rigorous evaluation, and helping shape the future of vacation ownership, we'd love to hear from you.
At Vacatia, you'll have the opportunity to build AI solutions that matter, work alongside talented teammates, and create technology that drives real business impact.