Staff/Senior Data Engineer: AI Training Data (2-4 Month Contract)
Location:
Remote
Role Type:
Contract (2-4 Months)
Time Commitment: 40 hrs/week (Full-time availability required)
Compensation:
Hyper-competitive hourly rate (matching Tier-1 Staff engineering bands)
Experience:
6-12+ years
About Bespoke Labs
Bespoke Labs is a premier, VC-backed AI research lab with an exceptionally talent-dense team of IIT and Ivy League alumni. We don’t just build tooling around AI; we build the massive-scale data systems and reasoning architectures that directly power next-generation models. Our research shapes the frontier of AI: we’ve published breakthroughs like GEPA, driven foundational datasets like Open Thoughts, and shipped state-of-the-art models including Bespoke-MiniCheck and Bespoke-MiniChart.
Role Overview
We are looking for a top-tier Senior/Staff Data Engineer for a high-impact, 2-4 month sprint. You will leverage your deep expertise in enterprise-grade data platforms to architect and build the complex curation systems required for advanced AI model training.
This is not a traditional ETL pipeline role. We need a heavy-hitter who has already operated production data platforms at scale inside large, complex organizations (FAANG, Fortune 100). You will use the mental models, architectural intuition, and coding skills you've developed over your career to generate, transform, and evaluate the data that trains the next generation of AI.
What You Will Do (The Contract)
- Architect AI-Scale Systems:
Design the overarching data architecture and processing topology needed to programmatically curate and shape datasets at TB/PB scale, ensuring low latency and high consistency.
- Hands-On Development:
Write production-grade code (Python/Scala, Spark) to build custom ingestion logic, highly efficient transformation scripts, and performant data validation checks.
- Complex Data Logic:
Implement advanced filtering, deduplication, and quality-scoring algorithms at scale, ensuring the resulting data objects are optimized for LLM/ML consumption.
- Quality & Performance Tuning:
Rigorously test, benchmark, and optimize processing workloads (CPU/memory tuning, partitioning strategies in Spark/Iceberg) to meet aggressive throughput targets.
- Domain Subject Matter Expert:
Act as the ultimate technical authority on distributed systems, data processing, and cloud structures to ensure the training data factory meets enterprise-grade accuracy.
What You Bring to the Table (Your Past Experience)
To be successful in this contract, you must have a track record of:
- End-to-End Ownership:
Designing and owning enterprise data platforms (batch + streaming).
- High-Throughput Processing:
Building and operating Kafka-first streaming pipelines.
- Lakehouse Architecture:
Utilizing Apache Iceberg, Delta Lake, or Hudi for analytics and ML at scale.
- Reliability Engineering:
Ensuring data reliability through SLAs, monitoring, backfills, and recovery.
- Scale:
Processing billions of events and managing TB–PB scale data systems.
Required Qualifications (Non-Negotiable)
- Experience:
6+ years of Data Engineering experience.
- Seniority:
Demonstrated Senior/Staff-level ownership of production data platforms.
- Pedigree:
Background at Tier-1 enterprises (FAANG, large SaaS, Fortune 100).
- Technical Stack:
Deep fluency in Python/Scala, Spark, Kafka, Airflow, and major cloud warehouses (Snowflake, BigQuery, Redshift).
Position Requirements
10+ years of work experience