
AI/ML Infra Engineer - Hosting

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Hamilton Barnes Associates Limited
Full Time position
Listed on 2026-06-25
Job specializations:
  • IT/Tech
    SRE/Site Reliability, IT Infrastructure, Systems Engineer
Salary/Wage Range or Industry Benchmark: USD 250,000 per year
Job Description & How to Apply Below

Ready to take the next step in your career?

Join a rapidly growing AI cloud infrastructure provider building high-performance compute platforms for large-scale AI training and inference workloads. With expanding GPU infrastructure across Europe and the United States, the organisation enables AI teams to access scalable compute environments without traditional infrastructure limitations.

As a Senior ML Infrastructure Engineer, the successful candidate will help build and scale Kubernetes-based machine learning platforms supporting large-scale training and inference systems. The role focuses on workload orchestration, GPU scheduling, inference optimisation, and distributed systems reliability, working alongside highly technical teams at the intersection of machine learning, cloud infrastructure, and high-performance computing.

If you would like to learn more about this opportunity, feel free to reach out and apply today!

Responsibilities
  • Build and scale internal ML infrastructure platforms focused on AI training and inference workloads
  • Develop systems for workload orchestration, job scheduling, and reliable execution across Kubernetes environments
  • Improve and maintain inference infrastructure, including model packaging, deployment, and serving optimisation
  • Collaborate with infrastructure and platform teams to maximise GPU utilisation, hardware performance, and operational reliability
  • Design scalable systems and reusable platform capabilities that improve developer experience and operational efficiency
  • Support CI/CD, GitOps, and infrastructure automation workflows across ML platform environments
  • Troubleshoot GPU performance, distributed systems behaviour, networking, and storage bottlenecks
  • Contribute to platform architecture discussions and long-term infrastructure scalability initiatives
Skills/Must Have
  • Strong ML engineering background with hands-on experience supporting both training and inference infrastructure
  • Experience in infrastructure engineering, platform engineering, or software engineering roles
  • Strong programming skills in Python (Go experience is a plus)
  • Deep experience with Kubernetes, including operators, CRDs, workload orchestration, and GPU scheduling
  • Comfortable operating in Linux environments and debugging GPU-related issues, including CUDA, drivers, networking, and file systems
  • Strong systems thinking and ability to design scalable, reliable, distributed infrastructure
  • Experience with CI/CD pipelines, GitOps workflows, and infrastructure automation
Desirable Skills
  • Familiarity with orchestration and scheduling platforms such as Kueue, Flyte, Ray, or Slurm
  • Experience with PyTorch or JAX environments
  • Hands-on experience deploying inference workloads using vLLM, SGLang, TensorRT-LLM, or Triton
  • Knowledge of GPU networking and performance optimisation, including InfiniBand, NVLink, and NCCL
  • Experience working within HPC or large-scale distributed systems environments
Benefits
  • Stock options
Salary
  • $250,000 base salary