
Agent Evaluation Specialist

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Prox
Full Time position
Listed on 2026-06-18
Job specializations:
  • IT/Tech
    Data Annotation/AI Labeling, Technical Writer, Technical Support, AI Evaluation
Salary/Wage Range or Industry Benchmark: 60,000 - 80,000 USD yearly
Job Description & How to Apply Below

A big part of Prox is AI agents that process complex technical documents into structured knowledge. The agents are right most of the time. When they're wrong, we need you to catch it.

You’ll work inside a review platform we built. Each task shows you the source material, what the agent produced, and the steps it took to get there. You compare them and grade the agent's work.

What you do per task
  • Read the source and the agent's output side by side. Verify the content was captured accurately.
  • Review what the agent did: what it created, changed, or left out.
  • Score a short rubric covering accuracy, coverage, organization, and rule adherence. Full rubric provided at onboarding.
  • Write detailed feedback on any mistakes you find.
  • Submit. Move to the next task.
Conditions
  • Subject matter shifts over time. You don't need prior knowledge of the subjects. You need to be able to compare two documents carefully and spot where they disagree.
  • Rate is fixed for the engagement. If it changes, it goes up, and we tell you before your next task.
  • All work product is owned by Prox (work-for-hire).
  • Standard NDA at offer stage.
Requirements
  • Can read dense technical content for hours without losing focus
  • Consistent: your scoring on Monday matches your scoring on Friday
  • Clear, specific feedback: "section 4 dropped the key requirement from page 17", not "this is confusing"
  • Reliable on committed hours
Preferred
  • Prior work as an AI trainer, tutor, or evaluator (Outlier, Data Annotation, xAI, Surge, Mercor, Invisible, Toloka, etc.)
  • Technical writing, editing, QA, translation, paralegal, or research-assistant background
  • Markdown familiarity
The challenge below is the interview.

We don't do resume screens or vibe calls. Everyone who applies takes the same ~30-minute challenge at … We read every submission.

If your submission is sharp, you start on paid tasks the same week after a short interview.
