×
Register Here to Apply for Jobs or Post Jobs. X

Principal Engineer, Model Development Platform

Job in Sunnyvale, Santa Clara County, California, 94087, USA
Listing for: EngineersOfAI
Full Time position
Listed on 2026-06-23
Job specializations:
  • IT/Tech
Salary/Wage Range or Industry Benchmark: 150000 - 200000 USD Yearly USD 150000.00 200000.00 YEAR
Job Description & How to Apply Below

Responsibilities

  • System architecture & reliability - Design and evolve the platform's overall architecture for reliability, observability, and scalability. Set performance, latency, and availability targets, and drive the engineering standards to meet them.
  • Cross-domain technical leadership - Unify the platform across disciplines, from front-end UIs and distributed training to Spark data pipelines and optimization-based experiment scheduling, ensuring systems interoperate cleanly.
  • Hands-on problem solving - Dive into the hardest challenges across subteams, lead architectural reviews, and propose pragmatic solutions that balance innovation with operational simplicity.
  • Experimentation & scheduling systems - Build systems that optimize how models are tested in simulation and on-road, using techniques like linear programming and heuristic optimization to balance hardware, safety, and research priorities while improving throughput and turnaround.
  • Data & compute infrastructure - Architect pipelines that ingest, transform, and enrich petabytes of fleet sensor data, and drive efficient compute use across GPU, CPU, cloud, and edge for both prototyping and large-scale training.
  • Strategic collaboration - Partner with Product, Research, and Operations to align architecture with user needs and co-own the platform's long-term roadmap.
About You

Essential

  • Technical Leadership at Scale – 10+ years of experience designing and building large-scale distributed systems, ML/AI infrastructure, full stack web application, or developer platforms, including at least 3 years as a staff or principal-level engineer.
  • Architectural Depth & Breadth – Proven ability to design systems spanning web platforms, ML pipelines, and large-scale compute orchestration (e.g., Spark, Ray,
    Kubernetes
    , Airflow, MLflow).
  • Reliability and performance – Experience driving platform reliability improvements, defining SLAs/SLOs, and building self-healing and observable systems that operate at “four nines” availability or better.
  • Hands-On Systems Design
#J-18808-Ljbffr
To View & Apply for jobs on this site that accept applications from your location or country, tap the button below to make a Search.
(If this job is in fact in your jurisdiction, then you may be using a Proxy or VPN to access this site, and to progress further, you should change your connectivity to another mobile device or PC).
 
 
 
Search for further Jobs Here:
(Try combinations for better Results! Or enter less keywords for broader Results)
Location
Increase/decrease your Search Radius (miles)
0
200
Filters
Education Level
Experience Level (years)
Posted in last:
Salary