Engineering Manager, Evaluation Platform

Job in Austin, Travis County, Texas, 78716, USA
Listing for: ChatGPT Jobs
Part Time position
Listed on 2026-06-12
Job specializations:
  • IT/Tech
    AI Engineer (Applied/Software), AI Evaluation
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD per year
Job Description

Location: Austin, TX

Hybrid (2 days per week on-site in the Austin office)

Company: Procore (Construction Intelligence organization)

Reports to: Sr Director, Procore AI Engineering

Job Summary

Build the infrastructure and tooling to measure, benchmark, and improve the quality of AI agents (Search Agent, RFI Create Agent, Invoice Agent, etc.). Own the end-to-end evaluation lifecycle: defining quality metrics, building evaluation frameworks, and delivering interfaces that surface actionable insights.

What You'll Do
  • Lead and grow a team of engineers focused on evaluation infrastructure, quality measurement, and developer tooling for AI agents.
  • Define the technical vision and roadmap for the Evaluation Platform (offline and online evaluations).
  • Partner with AI/ML, Product, and Agent teams to define quality metrics (relevance, accuracy, latency, safety, user satisfaction, token usage) and build automated pipelines.
  • Design and deliver user-facing evaluation tools for assessing agent output quality, comparing model versions, and identifying regressions.
  • Build frameworks for human-in-the-loop evaluation (annotation workflows, rating interfaces, inter-rater reliability).
  • Establish CI/CD quality gates for agent version releases.
  • Drive engineering excellence (code quality, system reliability, test coverage, on-call health, technical debt management).
  • Recruit, mentor, and develop engineers, fostering a culture of ownership and rigorous experimentation.

What We're Looking For
  • 5+ years managing engineering teams or serving as a technical lead, with 7+ years total in software engineering.
  • Experience building evaluation, quality measurement, or observability platforms for LLM-based or agentic systems (RAG pipelines, multi-step agents, tool-use agents).
  • Strong understanding of evaluation methodologies (precision/recall, LLM-as-judge, human annotation, A/B testing, statistical significance).
  • Proven ability to translate ambiguous problem spaces into clear technical strategies and executable roadmaps.
  • Hands-on technical depth in backend systems, data pipelines, or distributed infrastructure (Python, Go, or similar).
  • Familiarity with evaluation frameworks such as RAGAS, DeepEval, Langfuse, or custom eval harnesses.
  • Background in search relevance (NDCG, MRR) or information retrieval quality systems.
  • Experience with construction-tech, procurement, or enterprise B2B SaaS domains (preferred).

Compensation & Benefits

Base Pay Range: $ - $ USD Annual

Eligible for Equity Compensation and/or Bonus Incentive Compensation. Actual compensation based on job-related skills, experience, education/training, and location.

For Los Angeles County (unincorporated) Candidates:
Procore will consider for employment all qualified applicants, including those with arrest or conviction records, in accordance with applicable laws.
