Machine Learning, Platform Engineer

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Gravity Engineering Services Pvt Ltd.
Full Time position
Listed on 2026-06-17
Job specializations:
  • Software Development
    Backend Developer, Software Engineer, DevOps
Salary/Wage Range or Industry Benchmark: 125,000 - 150,000 USD yearly
Job Description & How to Apply Below

About the Role

Our team focuses on enabling custom models and dedicated inference on Together. We are responsible for building a container platform, optimizing autoscaling, minimizing cold starts, achieving the best end-to-end model performance, and providing a best-in-class developer experience with great tooling. We often focus on video or audio generation across the stack: CUDA kernels, PyTorch optimization, inference engines, container orchestration, queueing theory, etc. An ideal candidate will be great at profiling/optimization but know the word Kubernetes, or be intimately familiar with multi-cluster scheduling and have some sense of ML bottlenecks.

Responsibilities
  • New hires may work on multi-cluster orchestration, portfolio optimization, predictive autoscaling, control planes, model bring-up, model optimization, APIs for managing deployments, inference worker SDKs, and CLI tools.
  • Analyze and improve the robustness and scalability of existing distributed systems, APIs, databases, and infrastructure
  • Partner with product teams to understand functional requirements and deliver solutions that meet business needs
  • Write clear, well-tested, and maintainable software and IaC for both new and existing systems
  • Conduct design and code reviews, create developer documentation, and develop testing strategies for robustness and fault tolerance

Requirements
  • 5+ years of demonstrated experience in building large-scale, fault-tolerant distributed systems
  • Experience running serverless inference platforms, doing model bring-up on short notice, being on call, or running a cloud provider is a very big plus
  • Good taste and ability to thoughtfully discuss how what you’ve built has failed over time
  • Experience designing, analyzing and improving efficiency, scalability, and stability of various system resources
  • Excellent understanding of low-level operating system concepts, including concurrency, networking, storage, performance, and scale
  • Expert-level programmer in one or more of Python, Golang, Rust, C++, or Haskell
  • Proficiency in writing and maintaining Infrastructure as Code (IaC) using tools like Terraform
  • Experience with Kubernetes internals or other container orchestration systems
  • Sound judgement for when to use and when to not use LLMs for code
  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience
  • Experience in writing-heavy roles or companies is a plus