
Applied AI / Evaluation Engineer

Job in Portland, Multnomah County, Oregon, 97204, USA
Listing for: NAVEX
Full-time position
Listed on 2026-05-03
Job specializations:
  • IT/Tech
    AI Engineer (Applied/Software), Data Scientist, Machine Learning / ML Engineer
Salary/Wage Range or Industry Benchmark: USD 80,000 - 100,000 per year
Job Description
Position: Applied AI / Evaluation Engineer

At NAVEX, we’re transforming the world—making it safer, more ethical, and ensuring every voice is heard. That’s real impact.

Our high-performance culture is driven by our values. We move with speed, passion and purpose — as one team. We are bold in our ideas, accountable in our actions, and committed to doing the right things right.

You will join our Artificial Intelligence and Machine Learning team, which shares a passion for designing quality solutions, embracing new technologies, and delivering powerful products within our integrated risk and compliance management platform. These products help our customers protect their reputation and bottom line. We are changing the way people experience life at work!

As an Applied AI / Evaluation Engineer, you will own the quality, measurement, and behavioral assurance of the NAVEX AI Product System. You will build and operate evaluation harnesses, quality gating mechanisms, and human-in-the-loop tooling that ensure AI behavior is safe, consistent, and improving over time. In an agentic context, you will create the evaluation and regression testing systems that reduce drift and make agent behavior predictable—integrating continuous evaluation into CI/CD and production monitoring.

You will be the guardian of AI quality, ensuring that no AI capability reaches production without passing rigorous evaluation. If you want to ensure enterprise agentic AI systems are trustworthy and measurably excellent, this role is for you.

You’ll thrive in this hybrid role surrounded by an engaged, collaborative team deeply committed to your success. Join us and help shape what’s next!

What you’ll get:
  • Meaningful Purpose. Your work helps organizations operate with integrity and protect their people—at a scale few companies can match.
  • High-Performance Environment. We move with urgency, set ambitious goals, and expect excellence. You’ll be trusted with real ownership and supported to do the best work of your career.
  • Candid, Supportive Culture. We communicate openly, challenge ideas—not people—and value teammates who embrace bold thinking and continuous improvement.
  • Growth That Matters. You can count on authentic feedback, strong accountability, and leaders invested in your success so you can achieve real growth.
  • Rewards for Results. We provide clear, competitive compensation designed to recognize measurable outcomes and real impact.
What you’ll do:
  • Design, build, and operate the AI evaluation and regression harness that gates all AI releases—developing scenario suites, golden traces, and automated quality gates to reduce drift and make behavior predictable
  • Define and maintain evaluation dimensions including groundedness, accuracy, relevance, safety, and policy adherence
  • Build and curate versioned reference datasets (golden sets) covering common usage patterns and known failure modes
  • Implement LLM-as-judge evaluation pipelines and rationale validation frameworks
  • Develop and operate human-in-the-loop (HITL) tooling and signal capture systems
  • Build drift detection and regression tracking capabilities to monitor AI behavioral stability over time
  • Design quality gates that enforce measurable thresholds before AI capabilities are promoted to production
  • Instrument agent observability—including end-to-end tracing for agent runs (tool-call success rates, failure analysis, latency and cost monitoring)—and use observability to debug and continuously improve
  • Normalize and associate human review signals with AI interactions for continuous improvement
  • Collaborate with data scientists and platform engineers to instrument telemetry across AI system components
  • Produce evaluation reports and quality metrics that support governance, compliance, and leadership review
What you’ll bring:
  • Bachelor’s or Master’s degree in Computer Science, Data Science, Statistics, or a related STEM field
  • 5+ years’ experience in ML engineering, AI evaluation, or applied AI quality assurance
  • Strong experience building evaluation harnesses, regression testing frameworks, and quality gating pipelines for LLM-based systems
  • Experience with LLM-as-judge methodologies and automated evaluation techniques
  • Evaluation-first mindset—experience implementing continuous…