Senior Software Engineer II - Applied AI and Evaluations (Remote Eligible)

Remote / Online - Candidates ideally in Everett, Snohomish County, Washington, 98201, USA
Listing for: Smartsheet Inc
Remote/Work from Home position
Listed on 2026-05-30
Job specializations:
  • Software Development
    Software Engineer, AI Engineer
Salary/Wage Range or Industry Benchmark: 125,000 - 150,000 USD per year

-REMOTE, USA-

For over 20 years, Smartsheet has helped people and teams achieve, well, anything. From seamless work management to smart, scalable solutions, we’ve always worked with flow. We’re building tools that empower teams to automate the manual, uncover insights, and scale smarter. But more than that, we’re creating space: space to think big, take action, and unlock the kind of work that truly matters.

Because when challenge meets purpose, and passion turns into progress, that’s magic at work, and it’s what we show up for every day.

Smartsheet is building the next generation of AI‑powered work management through Smart Assist, our intelligent agent platform. As we scale from early demos to production‑grade agents, quality is the critical frontier, and we are looking for an Agent Quality Engineer to own it.

This is not a QA role. It's a deeply technical, high‑autonomy position at the intersection of LLM evaluation, prompt and context engineering, and retrieval‑augmented generation. You will diagnose why our agents fail, design the systems that catch regressions, and drive measurable improvements across our orchestrator and subagent fleet.

You will work closely with our Agent Engineering and AI Platform teams, embedded in a team that has already shipped evaluation infrastructure on Databricks/MLflow and is building toward a mature Agent Development Lifecycle (ADLC).

You Will:
  • Own agent quality end‑to‑end: diagnosis, improvement, and validation across Smart Assist's orchestrator and subagents
  • Identify failure modes across quality dimensions (factual accuracy, completeness, tone, actionability, and latency) and prioritize what to fix
  • Drive quality improvements through prompt engineering, context engineering, and RAG retrieval tuning
  • Extend and mature our evaluation framework: scorers, golden datasets, regression gates, and online evaluation for production traffic
  • Close the feedback loop: ensure that every change has a measurable, attributable quality signal
  • Collaborate with our Agent Architecture lead to distinguish quality problems that require prompt/context solutions from those that require structural fixes
  • Establish repeatable methodology that scales beyond any single agent or subagent
You Have:

Required:

  • 8+ years of software engineering experience, with at least 2 years working directly with LLMs in production
  • Deep, hands‑on experience with prompt engineering and context engineering: you understand how model behavior changes with framing, structure, and input design
  • Strong working knowledge of RAG architectures: chunking strategies, embedding models, retrieval evaluation, and failure diagnosis
  • Experience building or extending LLM evaluation frameworks: you have designed scorers, worked with golden datasets, and thought carefully about what good looks like
  • Fluency in agent system design: you don't need to own the architecture, but you can engage as a peer on architectural tradeoffs that affect quality
  • Strong Python skills; comfortable working in data‑heavy environments (Databricks, Delta tables, or equivalent)
  • Ability to communicate complex quality findings (written and verbal) to both technical and non‑technical stakeholders: you can explain what’s broken, why it matters, and what needs to happen next without losing the room
  • Strong cross‑functional judgment: you know when to escalate, when to resolve independently, and how to build credibility across engineering, product, and AI platform teams
  • A bias for clarity in ambiguous situations: when failure modes are murky and trade‑offs are real, you bring structure and a clear point of view rather than waiting for consensus
  • Legally eligible to work in the U.S. on an ongoing basis
  • BS or MS in Computer Science, a related field, or equivalent industry experience

Strong Plus:

  • Experience with MLflow or similar experiment tracking platforms
  • Familiarity with CI‑integrated evaluation pipelines
  • Experience with multi‑agent orchestration frameworks
  • Prior work in an Applied AI or LLMOps function within a product company
What Success Looks Like:

In your first 6 months, you will have:

  • Delivered measurable, validated…
Position Requirements
10+ Years work experience