Remote Human Baseliner ML Research Tasks - AI Trainer
Concord, Cabarrus County, North Carolina, 28025, USA
Listed on 2026-07-03
Software Development
Machine Learning/ML Engineer, AI Engineer (Applied/Software)
Overview
We are hiring experienced machine learning engineers and researchers to serve as human baseliners for evaluations of open-ended machine learning research tasks. These evaluations measure how well AI agents perform on realistic AI R&D problems. To interpret agent performance, we also need strong human reference points: skilled practitioners attempting the same tasks under the same time and compute constraints. As a baseliner, you will complete self-contained ML research tasks in a sandboxed environment, working independently with your preferred tools and workflow.
Your performance will be used as a benchmark against which frontier-model agents are evaluated.
- Attempt open-ended machine learning research tasks under a fixed time and compute budget (work trial)
- Work independently in a sandboxed Linux environment with internet access
- Use your preferred tooling, including IDEs and AI coding assistants such as Cursor, Claude Code, and ChatGPT
- Record your full working session via screen recording
- Complete a short pre-task and post-task questionnaire
- Submit your final work product, screen recording, and completed questionnaires. After this, you will be hired for a longer commitment
Minimum 20 hours per week if selected
More availability is strongly preferred
Requirements
Candidates must meet all of the following:
- 3+ years of machine learning experience. Time spent in a PhD program counts toward this requirement; undergraduate and master’s experience does not count
- Attended a top-100 university or worked at FAANG or a comparable company
- Experience with at least one major ML framework such as PyTorch, JAX, or TensorFlow
- Deep, hands‑on expertise in at least one of the focus areas below:
- Pretraining under tight data and compute budgets
- PPO, reward shaping, custom `gym` / `gymnasium` environments, and throughput tuning
- Full fine‑tuning, LoRA, QLoRA, DPO, RLHF, RLAIF, and distillation
- Large‑scale corpus filtering, deduplication, subsampling, and benchmark contamination avoidance
- Architecture design under strict parameter‑count or size constraints
- Modifying pretrained architectures, including attention patterns, pooling heads, or training objectives
- Contrastive training for embedding or retrieval models
- Generative vision or video modeling
- Multilingual or low‑resource language experience
- Image or video data pipelines at scale
- Experience balancing competing model objectives such as safety and capability
- Prior work as an ML evaluator, red‑teamer, or baseliner
Candidates must have strong practical experience in at least one of the following:
- Pretraining: training transformer language models from scratch
- Reinforcement learning: training agents in custom or existing environments
- Post‑training: fine‑tuning and aligning LLMs
- Dataset curation: building and cleaning large text corpora for LLM training
- Model architecture: designing and modifying neural network architectures
- One baseline attempt per contractor per task: each task may be attempted only once by a given contractor
- All work is confidential and covered by NDA
- Compute and environment are provided; no personal GPU is required