Research Engineer, Pre-training Data
Remote / Online - Candidates ideally in
Coos Bay, Coos County, Oregon, 97458, USA
Listed on 2026-02-03
Listing for:
Reddit
Apprenticeship/Internship, Remote/Work from Home position
Job specializations:
- Software Development
- Machine Learning/ML Engineer, Data Scientist, Data Engineer
Job Description & How to Apply Below
Overview
Employer Industry: Social Media and Online Communities
Why consider this job opportunity:
- Salary up to $322,000
- Opportunity for equity in the form of restricted stock units
- Comprehensive healthcare benefits and income replacement programs
- Flexible vacation and paid volunteer time off
- Generous paid parental leave and family planning support
- Remote work flexibility within the United States
Responsibilities:
- Architect and implement high-throughput, deterministic data sampling systems for distributed training clusters
- Design and execute dynamic curriculum learning strategies for improved model stability and reasoning
- Engineer logic for serializing Reddit’s complex conversational trees into optimal training contexts
- Formulate and validate statistical hypotheses regarding data mixtures to minimize bias and maximize token quality
- Mentor senior engineers and researchers on system design and performance optimization
Qualifications:
- 8+ years of software engineering experience focused on machine learning infrastructure or LLM pre-training
- Expert proficiency in Python and distributed data processing frameworks (e.g., Ray Data, Spark)
- Experience handling unstructured and semi-structured data at scale, including text, code, images, and audio/video
- Strong mathematical foundation in probability, statistics, and importance sampling theory
- Deep understanding of pre-training dynamics and data quality impacts on model performance
Preferred qualifications:
- Experience with JAX or PyTorch internals related to distributed data loading
- Experience with multimodal datasets (image/video + text) and vision-language preprocessing
- Proficiency in Rust or C++ for performance-critical data path optimization
- Published research or significant practical experience in active learning or automated data selection
"We prioritize candidate privacy and champion equal-opportunity employment. Central to our mission is our partnership with companies that share this commitment. We aim to foster a fair, transparent, and secure hiring environment for all. If you encounter any employer not adhering to these principles, please bring it to our attention immediately. We are not the EOR (Employer of Record) for this position. Our role in this specific opportunity is to connect outstanding candidates with a top-tier employer."