Senior Research Engineer, Training Data Infrastructure in Foundation Models
Job in Cupertino, Santa Clara County, California, 95014, USA
Listed on 2026-06-01
Listing for: Apple Inc.
Apprenticeship/Internship position
Job specializations:
- Engineering: Data Engineer, AI Engineer
Job Description
If you're drawn to hard problems where the research and the product are inseparable, this is the team.
This position operates at the convergence of Software Engineering and Machine Learning Research. Unlike traditional backend roles, this position requires you to design systems where the outcome is the statistical distribution and quality of data itself. You will work alongside Research Scientists to transform theoretical observations into concrete, scalable engineering solutions. Your core focus will be the architecture of our Data Acquisition, Processing, and Repository Management systems for Large Model training.
You will lead technical efforts to enable active, quality-driven data curation, including filtering, deduplication, synthetic data generation, and data mixing, ensuring our models are trained on the highest-quality information available.
Research Collaboration:
Experience working within or closely with ML research organizations (e.g., as a Research Engineer), with the ability to translate research results into engineering implementations.
Domain Knowledge:
Familiarity with the lifecycle of modern LLM training, including end-to-end workflows and the underlying system architecture.
Complex Data Types:
Experience processing complex data modalities beyond plain text, such as source code repositories, images, video, and audio.
Education:
Bachelor's degree in Computer Science, Electrical Engineering, or Mathematics.
Technical Expertise:
4+ years of software engineering experience with a specific focus on Data Infrastructure, Distributed Systems, or AI/ML Engineering.
Language Proficiency:
Expert fluency in Python and strong competence in systems languages such as C++.
Cloud Architecture:
Extensive experience architecting scalable data systems on major public cloud platforms (e.g., GCP), using tools such as Apache Beam and GCS.
Performance Engineering:
Deep experience profiling and optimizing high-throughput data systems. Demonstrated ability to debug distributed bottlenecks (e.g., stragglers, I/O saturation), optimize data formats, and provide efficient data storage solutions.
Position Requirements
10+ years of work experience