Data Engineer

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: SupportFinity™
Full Time position
Listed on 2026-03-15
Job specializations:
  • IT/Tech
    Data Engineer, Machine Learning/ML Engineer
Salary/Wage Range or Industry Benchmark: 60,000 - 80,000 USD yearly
Job Description & How to Apply Below

Overview

About Subquadratic

Subquadratic is a foundational AI infrastructure company focused on speech and language. We build speech-to-text, text-to-speech, and speech-to-speech models designed for real-time, production environments. Our technology removes performance and cost bottlenecks that limit how voice and conversational AI scale in the real world.

The Role

We are hiring a Data Engineer to build the data infrastructure that powers Subquadratic's multi-modal AI research. You will design and scale data pipelines for pretraining, mid-training, and post-training at trillion-token scale, process diverse data sources across language and speech domains, and generate high-quality synthetic data for model training.

This is a high-impact role where your work directly determines training quality and efficiency. If you're passionate about building data systems that power cutting-edge AI research, this role is for you.

What You'll Do
  • Build and scale data pipelines for pretraining, mid-training, and post-training at trillion+ token scale across language and speech domains
  • Process and curate large-scale datasets including cleaning, deduplication, quality filtering, and optimization for distributed training
  • Generate synthetic data for model training and evaluation across diverse tasks and domains
  • Design efficient data loading systems achieving high throughput across multi-node training clusters
  • Build data versioning and reproducibility systems to track dataset compositions and enable reproducible experiments
  • Collaborate with ML engineers and researchers to optimize pipelines and improve data quality
Minimum Qualifications
  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience
  • 3+ years of experience building large-scale data pipelines for machine learning or data-intensive applications
  • Strong programming skills in Python and experience with data processing frameworks (Spark, Dask, Ray, or similar)
  • Experience with data quality techniques including deduplication, filtering, and validation at scale
  • Proven ability to optimize data pipelines for performance and throughput in distributed systems
  • Experience working with large datasets (100GB-10TB+) and understanding of storage systems and data formats
Preferred Qualifications
  • Experience building data pipelines for LLM pretraining or large-scale ML training
  • Hands-on experience with synthetic data generation for language or speech models
  • Experience with text processing at scale: tokenization, deduplication (MinHash, LSH), and quality assessment
  • Familiarity with audio/speech data processing and dataset curation
  • Knowledge of data contamination detection and dataset versioning best practices
  • Experience optimizing data loaders for PyTorch or TensorFlow at scale
  • Understanding of distributed storage systems (S3, GCS, HDFS) and data streaming patterns
Compensation & Benefits
  • Competitive base salary
  • Performance-based bonus aligned with research and model milestones
  • Equity participation
  • Comprehensive health, dental, and vision coverage
  • Flexible paid time off
EEO & Compliance

Subquadratic is proud to be an equal-opportunity employer. We are committed to building a diverse and inclusive culture that celebrates authenticity to win as one. We do not discriminate on the basis of race, religion, color, national origin, gender, gender identity, sexual orientation, age, marital status, disability, protected veteran status, citizenship or immigration status, or any other legally protected characteristics.

Subquadratic uses E-Verify to confirm employment eligibility in compliance with federal law. For more information, please visit the E-Verify website.

Please note:

We do not accept unsolicited resumes from recruiters or employment agencies and will not be responsible for any fees related to unsolicited resumes.

About the company

Subquadratic

