Engineering Manager, Evaluation Platform
Listed on 2026-06-12
IT/Tech
AI Engineer (Applied/Software), AI Evaluation
Job Description
Location: Austin, TX
Hybrid (on-site 2 days per week in the Austin office)
Company: Procore (Construction Intelligence organization)
Reports to: Sr Director, Procore AI Engineering
Job Summary
Build the infrastructure and tooling to measure, benchmark, and improve the quality of AI agents (Search Agent, RFI Create Agent, Invoice Agent, etc.). Own the end-to-end evaluation lifecycle: defining quality metrics, building evaluation frameworks, and delivering interfaces that surface actionable insights.
What You'll Do
- Lead and grow a team of engineers focused on evaluation infrastructure, quality measurement, and developer tooling for AI agents.
- Define the technical vision and roadmap for the Evaluation Platform, covering both offline and online evaluations.
- Partner with AI/ML, Product, and Agent teams to define quality metrics (relevance, accuracy, latency, safety, user satisfaction, token usage) and build automated pipelines.
- Design and deliver user-facing evaluation tools for assessing agent output quality, comparing model versions, and identifying regressions.
- Build frameworks for human-in-the-loop evaluation (annotation workflows, rating interfaces, inter-rater reliability).
- Establish CI/CD quality gates for agent version releases.
- Drive engineering excellence (code quality, system reliability, test coverage, on-call health, technical debt management).
- Recruit, mentor, and develop engineers, fostering a culture of ownership and rigorous experimentation.
Qualifications
- 5+ years managing engineering teams or serving as a technical lead, with 7+ years total in software engineering.
- Experience building evaluation, quality measurement, or observability platforms for LLM-based or agentic systems (RAG pipelines, multi-step agents, tool-use agents).
- Strong understanding of evaluation methodologies (precision/recall, LLM-as-judge, human annotation, A/B testing, statistical significance).
- Proven ability to translate ambiguous problem spaces into clear technical strategies and executable roadmaps.
- Hands‑on technical depth in backend systems, data pipelines, or distributed infrastructure (Python, Go, or similar).
- Familiarity with evaluation frameworks such as RAGAS, DeepEval, Langfuse, or custom eval harnesses.
- Background in search relevance (NDCG, MRR) or information retrieval quality systems.
- Experience with construction-tech, procurement, or enterprise B2B SaaS domains (preferred).
Base Pay Range: $ - $ USD Annual
Eligible for Equity Compensation and/or Bonus Incentive Compensation. Actual compensation is based on job-related skills, experience, education/training, and location.
For Los Angeles County (unincorporated) Candidates:
Procore will consider for employment all qualified applicants, including those with arrest or conviction records, in accordance with applicable laws.