Machine Learning Researcher - RL and Agentic

Job in New City, Rockland County, New York, 10956, USA
Listing for: Protege
Full Time position
Listed on 2026-06-18
Job specializations:
  • IT/Tech
    AI Evaluation, Data Scientist, Data Annotation/AI Labeling, Machine Learning/ML Engineer
Salary/Wage Range or Industry Benchmark: 125000 - 150000 USD Yearly
Job Description & How to Apply Below

Company Overview:

We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. The process today is time-intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI — and in tech.

We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

About Data Lab

Data Lab exists because truly useful data is rare — and the frontier of AI development only moves forward when high-quality data makes it possible.

We believe data is one of the most underdeveloped layers of the AI stack. Our work focuses on building and evaluating high-value datasets grounded in real-world workflows and economically meaningful tasks.

We work across multiple domains to create safe, high-fidelity datasets that preserve the structure and context needed to train advanced AI systems.

Our research spans data quality, evaluation design, privacy-preserving transformation, workflow reconstruction, and task-grounded AI training data.

At Data Lab, applied research is tightly connected to real-world deployment. Researchers work directly with large-scale datasets, production systems, and frontier AI training problems.

Role Overview

Data is the foundation of AI performance, and we believe model quality starts with data quality. As AI systems become more agentic, a critical challenge is understanding which real-world datasets, tasks, and environments actually lead to better model behavior.

We’re seeking a Machine Learning Researcher focused on RL and agentic systems to help define, design, and evaluate the datasets, tasks, environments, and benchmarks used to assess advanced AI systems. In this role, you’ll work closely with research and engineering teams to translate real-world workflows into high-value datasets and evaluation assets: structured tasks, interactive environments, benchmark suites, and quality scorecards that help us understand how models perform in realistic settings.

You’ll help define what “high-quality agentic data” means in practice, using statistical, computational, and ML-driven methods to evaluate dataset quality, task design, environment fidelity, and downstream model performance. You’ll work on the core problems of benchmarking real-world data, measuring how well models perform on that data, and designing RL-style or agentic environments that capture the structure of meaningful work.

This is an ideal role for someone with a strong machine learning background who is excited by reinforcement learning, agentic systems, evaluation, and the role of data in shaping model behavior. You should be excited by the opportunity to build the datasets and benchmarks that help define what high-quality real-world data looks like for frontier AI systems.

What You’ll Do

Design and build datasets, tasks, and environments

Design and build datasets, tasks, environments, and evaluation assets for benchmarking agentic systems and multi-step model behavior.

Translate real-world workflows into structured tasks, interaction traces, trajectories, stateful environments, and verifiable outcomes that can be used to evaluate advanced AI systems.

Develop frameworks for evaluating real-world data quality

Develop frameworks that assess diversity, realism, coverage, fidelity, informativeness, and downstream usefulness of datasets for agentic systems.

Build quality scorecards and evaluation methods that make dataset strengths, weaknesses, and failure modes legible across teams.

Benchmark model behavior in RL and agentic settings

Evaluate planning, tool use, robustness, recovery from failure, task completion, and generalization behavior in RL-style or agentic environments.

Connect model failures back to concrete dataset, environment, or…
