Site Reliability Engineer - Inference
Listed on 2026-06-18
IT/Tech
AI Engineer (Applied/Software), Systems Engineer, Machine Learning/ML Engineer
Join to apply for the Site Reliability Engineer - Inference role at Jobright.ai
Jobright is an AI-powered career platform that helps job seekers discover the top opportunities in the US. We are NOT a staffing agency. Jobright does not hire directly for these positions. We connect you with verified openings from employers you can trust.
Job Summary:
Lambda is the #1 GPU Cloud for ML/AI teams, providing tools for building, testing, and deploying AI products. The Site Reliability Engineer - Inference will work on developing a large-scale platform for running AI models and building a high-throughput, low-latency API for distributed systems.
Responsibilities:
• Work on our Inference service, helping us to develop our large-scale platform for running new, cutting-edge models across tens of thousands of GPUs
• Help build a high-throughput, low-latency API and routing system running at geographically-distributed scale
• Shape a highly reliable distributed system, with a focus on reducing operational overhead and on deep observability and capacity management
• Work with the team and our internal ML researchers to adopt and improve new inference engines, models, and architectures across a variety of media (such as text, image, video, and audio)
• Tackle global networking challenges to deliver the lowest possible latency to our users across all of Lambda’s available capacity
• Help push Lambda forward into the state of the art, and be part of a team that is operating right at the edge of new developments in the industry.
Qualifications:
Required:
• 8 or more years of experience as a software reliability engineer or software engineer working on large-scale, internet-facing production services
• Highly skilled at writing Go and Python
• Experience with bare-metal system installation and administration
• Experience deploying applications and operators on Kubernetes
• Product-focused, balancing operational needs and keeping overheads down with the need to ship features at a rapid pace
• Proven track record of working in an environment with rapid deployment and the ability to stay on top of shifting priorities as the industry rapidly develops
• Willingness to take ownership of projects and help drive them forwards through design, implementation, launch, and maintenance.
Preferred:
• Experience working with machine learning models
• Experience operating large-scale, geographically distributed systems
• Experience developing Kubernetes operators and components
Company:
Lambda provides infrastructure, cloud services, and software for the training and inferencing of AI models. Founded in 2012 and headquartered in San Jose, California, USA, the company has 201-500 employees and is currently late stage. Lambda has a track record of offering H-1B sponsorship.
Seniority level: Mid-Senior level
Employment type: Full-time
Industries: Software Development
Benefits (inferred from the description for this job):
• Medical insurance
• Vision insurance
• 401(k)