Staff Research Engineer, Pre-training Data

Remote / Online - Candidates ideally in Coos Bay, Coos County, Oregon, 97458, USA
Listing for: Reddit
Apprenticeship/Internship, Remote/Work from Home position
Listed on 2026-02-03
Job specializations:
  • Software Development
    Machine Learning/ ML Engineer, Data Scientist, Data Engineer
Salary/Wage Range or Industry Benchmark: USD 322,000 per year
Job Description & How to Apply Below
Position: Staff Research Engineer, Pre-training Data

Overview

Employer Industry: Social Media and Online Communities

Why consider this job opportunity
  • Salary up to $322,000
  • Opportunity for equity in the form of restricted stock units
  • Comprehensive healthcare benefits and income replacement programs
  • Flexible vacation and paid volunteer time off
  • Generous paid parental leave and family planning support
  • Remote work flexibility within the United States
Responsibilities
  • Architect and implement high-throughput, deterministic data sampling systems for distributed training clusters
  • Design and execute dynamic curriculum learning strategies for improved model stability and reasoning
  • Engineer logic for serializing Reddit’s complex conversational trees into optimal training contexts
  • Formulate and validate statistical hypotheses regarding data mixtures to minimize bias and maximize token quality
  • Mentor senior engineers and researchers on system design and performance optimization
Qualifications
  • 8+ years of software engineering experience focused on machine learning infrastructure or LLM pre-training
  • Expert proficiency in Python and distributed data processing frameworks (e.g., Ray Data, Spark)
  • Experience handling unstructured and semi-structured data at scale, including text, code, images, and audio/video
  • Strong mathematical foundation in probability, statistics, and importance sampling theory
  • Deep understanding of pre-training dynamics and data quality impacts on model performance
Preferred Qualifications
  • Experience with JAX or PyTorch internals related to distributed data loading
  • Experience with multimodal datasets (image/video + text) and vision-language preprocessing
  • Proficiency in Rust or C++ for performance-critical data path optimization
  • Published research or significant practical experience in active learning or automated data selection


"We prioritize candidate privacy and champion equal-opportunity employment. Central to our mission is our partnership with companies that share this commitment. We aim to foster a fair, transparent, and secure hiring environment for all. If you encounter any employer not adhering to these principles, please bring it to our attention immediately. We are not the EOR (Employer of Record) for this position.

Our role in this specific opportunity is to connect outstanding candidates with a top-tier employer."
