Cloud Inference Engineer
Job in San Francisco, San Francisco County, California, 94199, USA
Listed on 2026-05-29
Listing for: Modular
Full Time position
Job specializations:
- Software Development
- Software Engineer, Cloud Engineer - Software, DevOps, AI Engineer
Job Description & How to Apply Below
Requirements
- We're seeking engineers who are passionate about pushing the boundaries of distributed inference systems and enjoy working at the intersection of large-scale systems and machine learning
- We evaluate candidates on the breadth and depth of their experience in backend engineering, AI inference, and distributed systems development
- 5+ years of experience in backend engineering
- Experience with Kubernetes and operating your own services
- Ability to create durable, reusable software tools and libraries that are leveraged across teams and functions
- Experience with machine learning technologies and use cases
- Creativity and curiosity for solving complex problems, a team-oriented attitude that enables you to work well with others, and alignment with our culture
- Strong identification with our core company cultural values
- (Desirable) Experience with high-performance computing / networking
- (Desirable) Experience working on high-scale ML inference infrastructure (traditional AI or GenAI)
- (Desirable) Familiarity with Go (Golang)
About the Team
In the Cloud Inference team, we are focused on building end-to-end distributed LLM inference deployments that are fully vertically integrated with the MAX stack. Our goal is to make inference both the fastest and the most scalable, while also building the easiest platform for enterprises and developers alike to deploy and scale models. If this sounds exciting, we invite you to join our world-leading AI infrastructure team and help drive our industry forward!
Responsibilities
- Build and ship an LLM-focused inference platform using best-in-class inference techniques (disaggregated inference, multi-node deployment of large models, high-performance networking, distributed KV-cache management, high-throughput batch processing, etc.)
- Push the envelope for operational excellence with request-to-kernel observability, multi-cloud deployments, clever autoscaling, cold-start optimizations, and more
- Collaborate with our kernels and GenAI teams to achieve SOTA application performance by combining SOTA kernel and serving optimizations with SOTA cluster optimizations
- Build Helm charts, Kubernetes operators, and more to create simple, effective, maintainable deployments
To View & Apply for jobs on this site that accept applications from your location or country, tap the button below to make a Search.
(If this job is in fact in your jurisdiction, then you may be using a Proxy or VPN to access this site, and to progress further, you should change your connectivity to another mobile device or PC).