Founding Engineer – Full Stack ML DevTools & Systems
Listed on 2026-05-31
Location: San Francisco, CA
Type: Full‑Time
Base Compensation: $150,000 – $250,000
Equity: Competitive Series A Equity Package
This is a founding‑level engineering role within a Series A AI infrastructure company building core developer tools and platform primitives for post‑training, evaluation, and reinforcement learning workflows.
The platform enables ML engineers and researchers to:
Create structured training data
Evaluate model performance reliably and reproducibly at scale
This is a high‑ownership role at the center of the product. You will operate across the Python SDK, backend systems, infrastructure, and developer experience—partnering directly with frontier labs, enterprise AI teams, and AI‑native startups.
This is not a narrow feature role. You will shape foundational platform architecture and developer workflows that power advanced model training systems.
Core Responsibilities
Design and implement backend systems supporting post‑training workflows, dataset primitives, run tracking, and artifact management
Build reliable execution and orchestration systems with strong isolation and reproducibility
Improve observability, debugging capabilities, and performance across job execution and distributed data pipelines
Contribute to containerized infrastructure and Kubernetes‑based deployment patterns
Own and evolve the Python SDK with clean APIs, strong documentation, intuitive defaults, and extensibility
Design developer‑friendly abstractions for reinforcement learning, evaluation loops, and training workflows
Develop evaluation‑native workflows connecting capability measurement, data creation, training, and re‑evaluation loops
Improve CLI tools, developer interfaces, and local‑to‑cloud workflows
Work across compute, networking, storage, and IAM configurations
Design systems that are scalable, reproducible, and secure
Collaborate on distributed systems design and execution infrastructure
Partner directly with ML engineers and researchers to translate real‑world workflows into platform improvements
Incorporate structured customer feedback into roadmap decisions
Operate at the intersection of research needs and production reliability
Requirements
Strong production experience in Python
Comfort operating across the stack, including APIs, backend systems, data systems, and frontend integration
Deep understanding of Docker and Linux environments
Strong product instincts with a bias toward shipping
Demonstrated end‑to‑end ownership of production systems