Principal ML Data Platform Engineer Job | California, Missouri, USA | IT/Tech

Location: California

Senior/Principal Backend Engineer, ML Data Platform (0→1 Build)

This company is an emerging AI and data-infrastructure startup building a next-generation platform for processing large volumes of sensitive, unstructured data. They are hiring a Senior/Principal Backend Engineer to build their machine learning data infrastructure from the ground up.

This is a highly autonomous, high-ownership role for an engineer who thrives in ambiguity and can independently architect systems, build pipelines, and design ML experiment frameworks without depending on an existing data science team. The engineering culture requires in-person collaboration in San Francisco four days per week.

THE ROLE

You will be responsible for creating the entire ML data and experimentation platform, including systems for model evaluation, versioning, data ingestion, and large‑scale processing. The work spans backend engineering, ML evaluation frameworks, and data‑pipeline architecture.

KEY RESPONSIBILITIES

Build end‑to‑end evaluation pipelines for NLP and classification models
Design frameworks for experiment tracking, rapid model iteration, and A/B testing
Architect data flows across databases, cloud storage, and distributed compute environments
Create reproducible ML pipelines that function in both cloud and on‑prem setups
Build tooling for ingesting and processing diverse unstructured data, including text, transcripts, and PDFs
Establish foundational MLOps practices and model‑performance benchmarking
Own the full pipeline from raw data ingestion through dataset generation

CORE CHALLENGES

Standing up ML infrastructure from scratch
Developing evaluation systems for NER and classification models
Bridging structured databases with large data‑lake environments
Optimizing distributed compute jobs across Spark, Databricks, and on‑prem clusters
Scaling pipelines to very large data volumes
Operating without a staffed data science function

WHAT THEY’RE LOOKING FOR

5+ years of backend engineering experience with deep exposure to data pipelines
Significant Spark experience (preferably PySpark) and familiarity with hybrid cloud and on‑prem environments
Ability to design ML experiments and evaluate model performance
Strong Python skills and comfort with ML toolkits
Experience with PostgreSQL, S3/Parquet, and distributed batch processing
Understanding of NER/NLP and prior ML‑infrastructure experience
Bonus: exposure to audio or document‑processing pipelines

TECHNICAL ENVIRONMENT

Spark (cloud + on‑prem)
PostgreSQL
S3‑based data lakes
Batch processing workflows
NLP and classification model evaluation at scale

SUCCESS LOOKS LIKE

Production‑ready evaluation pipelines within 90 days
Reliable experiment‑tracking system that accelerates model‑performance iteration
Scalable data infrastructure capable of supporting high‑volume workloads
Faster model‑improvement cycles through effective sampling and evaluation design

WHY THIS ROLE IS SPECIAL

You are the founding owner of ML infrastructure
Massive ownership across architecture, systems, and experimentation
Direct, measurable impact on model quality and platform capabilities
Opportunity to define best practices, standards, and systems from day one
