
Staff LLMOps Engineer

Job in Redwood City, San Mateo County, California, 94061, USA
Listing for: Cognichip
Full Time position
Listed on 2026-02-16
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning / ML Engineer, Cloud Computing, Data Engineer
Salary/Wage Range or Industry Benchmark: USD 80,000–100,000 yearly
Job Description & How to Apply Below
Position: Staff LLMOps Engineer

Overview

At Cognichip, we are building a next-generation enterprise product suite that empowers semiconductor design engineers to achieve a 10x productivity boost with proprietary AI/ML models and modern cloud technologies.

We are seeking a Staff LLMOps Engineer to architect, deploy, and optimize our large language model (LLM) infrastructure on the cloud. This role focuses on taking trained models to production, scaling them efficiently across GPU clusters, and driving innovations in inference optimization. You will work closely with AI scientists, DevOps, and platform teams to ensure low-latency, high-throughput model serving for our enterprise SaaS product.

Core Responsibilities
  • Design and implement production-ready LLM deployment pipelines on AWS and Kubernetes/EKS.
  • Build and scale LLM inference infrastructure (multi-GPU, multi-node) for high availability, low latency, and cost efficiency.
  • Optimize inference performance using vLLM, SGLang, or similar frameworks.
  • Implement advanced serving techniques: continuous batching, speculative decoding, KV-cache management, paged attention, and distributed scheduling.
  • Collaborate with AI researchers to operationalize model training outputs into production-grade services.
  • Establish monitoring and observability for LLM serving: latency, throughput, GPU utilization, failure recovery.
  • Drive automation of infrastructure provisioning, scaling, and updates using IaC (Terraform) and CI/CD pipelines.
  • Partner with security and compliance teams to ensure secure multi-tenant model hosting aligned with enterprise-grade requirements.
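To make the serving techniques above concrete, here is an illustrative, framework-agnostic sketch of continuous batching, the scheduling pattern used by engines such as vLLM and SGLang: on every decode step the scheduler admits waiting requests into free batch slots and retires finished requests immediately, instead of waiting for an entire static batch to drain. The `decode_step` stub stands in for a real model forward pass; all names here are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    tokens: list = field(default_factory=list)

    @property
    def done(self) -> bool:
        return len(self.tokens) >= self.max_new_tokens

def decode_step(batch):
    # Placeholder for one forward pass over the batch; a real server
    # would run the LLM here and append one sampled token per request.
    for req in batch:
        req.tokens.append(f"tok{len(req.tokens)}")

def continuous_batching(waiting: deque, max_batch: int):
    """Admit new requests every step instead of waiting for the batch to drain."""
    running, finished = [], []
    while waiting or running:
        # Admit waiting requests into free slots -- the key difference
        # from static batching, which blocks until the batch empties.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        decode_step(running)
        # Retire finished requests immediately so their slots (and, in a
        # real engine, their KV-cache pages) are reusable next step.
        still_running = []
        for req in running:
            (finished if req.done else still_running).append(req)
        running = still_running
    return finished

requests = deque(Request(f"p{i}", max_new_tokens=i + 1) for i in range(5))
done = continuous_batching(requests, max_batch=2)
print([len(r.tokens) for r in done])  # → [1, 2, 3, 4, 5]
```

In production the same idea is paired with paged KV-cache management, so freeing a slot also releases its cache blocks for incoming requests.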
Required Qualifications
  • 5+ years of experience in DevOps/AI infrastructure, with 2+ years focused on LLMOps (production deployment and optimization).
  • Proven track record of deploying and scaling LLMs in production environments.
  • Hands-on experience with GPU-accelerated inference and distributed AI serving.
  • Strong understanding of cloud-native architectures and secure enterprise SaaS deployment.
What We Offer
  • Opportunity to own and scale LLM infrastructure at a disruptive AI startup.
  • Competitive compensation package, including equity participation.
  • A team of high-caliber collaborators at the intersection of AI, cloud, and semiconductor design.
  • A culture of innovation, precision, and impact, where your work directly shapes the future of engineering.