Cloud Inference Engineer Job, San Francisco area, California, USA, Software Development

Requirements

We're seeking engineers who are passionate about pushing the boundaries of distributed inference systems and enjoy working at the intersection of large-scale systems and machine learning
We are looking for candidates based on their breadth and depth of experience in backend engineering, AI inference, and distributed systems development
5+ years of experience working in backend engineering
Experience with Kubernetes and operating your own services
Ability to create durable, reusable software tools and libraries that are leveraged across teams and functions
Experience in machine learning technologies and use cases
Creativity and curiosity for solving complex problems, a team-oriented attitude that enables you to work well with others, and alignment with our culture
Strongly identifies with our core company cultural values
(Desirable) Experience with high-performance computing and networking
(Desirable) Experience working on high-scale ML inference infrastructure (traditional AI or genAI)
(Desirable) Familiarity with Go (Golang)

What the job involves

In the Cloud Inference team, we are focused on building end-to-end distributed LLM inference deployments that are fully vertically integrated with the MAX stack
Our goal is to make inference the fastest and most scalable it can be while also building the easiest platform for deploying and scaling models, for enterprises and developers alike
If this sounds exciting, we invite you to join our world-leading AI infrastructure team and help drive our industry forward!
Build & ship an LLM-focused inference platform using best-in-class inference techniques (disaggregated inference, multi-node deployment of large models, high-performance networking, distributed KV-cache management, high-throughput batch processing, etc.)
Push the envelope for operational excellence with request-to-kernel observability, multi-cloud deployments, clever autoscaling, cold-start optimizations, and more
Collaborate with our kernels and genAI teams to achieve SOTA application performance by integrating SOTA kernel & serving optimizations with SOTA cluster optimizations
Build Helm charts, Kubernetes operators, and more to create simple, effective, maintainable deployments