Production Support Engineering LMTS Job | San Francisco area, California, USA | Software Development

Description
Opportunity & Product

Join an agile team with deep startup roots. We operate as a high‑velocity ‘startup‑within‑Salesforce,’ following our recent acquisition. You’ll be managed by the same founders and engineers who built the original company, giving you the autonomy of a small team backed by the global scale and trust of Salesforce.

We have successfully moved past the "0 to 1" phase. We have a product that works, customers who love it, and the backing of Salesforce. Now, we are entering the "1 to 100" phase: scaling our architecture to handle global demand, hardening our systems for enterprise‑grade resilience, and integrating deeply with the Agentforce ecosystem. This is your chance to help lead that transition.

What You’ll Do

As a Production Support Engineer (LMTS), you will be a senior technical lead within our embedded reliability team. You won’t be building the foundation alone; you’ll work alongside a group of engineers and product owners to make the Agentforce for Supply Chain platform the most reliable AI‑powered engine in the industry.

This is a role for an engineer who loves the "scaling" problem. You will focus on production excellence, performance tuning, and infrastructure automation. Because you are embedded in the product organization, you’ll have a seat at the table during design reviews, ensuring that as we add new agentic capabilities, they are built to scale from day one.

Responsibilities

Scaling & Reliability:
Own the reliability roadmap for major product areas, transitioning our systems from startup‑speed architectures to highly available, global‑scale enterprise solutions.
Collaborative Leadership:
Partner with PMTS‑level engineers to refine our infrastructure strategy, contributing senior‑level perspectives on system design, capacity planning, and bottleneck identification.
Infrastructure as Code:
Maintain and evolve our automated environments, focusing on making our "infrastructure‑as‑plugins" model more robust and developer‑friendly.
AI Operations (AIOps):
Support the scaling of our AI/ML infrastructure, ensuring our models have the GPU resources and data pipelines required to deliver real‑time supply chain insights.
Production Excellence:
Lead the "1 to 100" hardening of our observability stack. You won’t just respond to incidents; you’ll build the tooling that prevents them and the telemetry that explains them.
Performance Engineering:
Deep‑dive into SQL optimization, API latency, and cross‑service communication to ensure our data‑intensive supply chain platform remains performant under heavy load.
AI‑First Workflow:
Lean into the future of engineering by using AI tools (Claude Code, etc.) to automate routine operational tasks and accelerate infrastructure delivery.
Contribute to building and maintaining the shared system context, an explicit repository of system designs, constraints, and standards that enables AI to operate accurately and reliably.
Critically evaluate code (human‑ or AI‑generated) for correctness, quality, security, and performance.

Required Qualifications

5+ years of experience in SRE, Production Engineering, or Backend Engineering with a heavy focus on operations and scale.
Proven Scaling Experience:
You have previously helped take a product through a high‑growth phase (the "1 to 100" journey), dealing with the technical debt and architectural shifts that come with it.
Technical Breadth:
Strong proficiency in Kubernetes, Terraform/OpenTofu, and AWS/GCP/Azure.
Coding Mastery:
Ability to write and review production‑level code in Golang, TypeScript, or Python; you view automation as a software engineering problem.
Systems Expert:
Deep understanding of distributed systems, including how to debug complex interactions between microservices, databases, and AI agents.
Low‑Ego Collaboration:
Experience working within a senior team of Principal engineers, with the ability to both lead specific initiatives and support the broader group’s technical vision.
A demonstrated, genuine AI‑first approach to engineering: using AI to move faster, build fluency across the stack, and contribute well beyond your core specialty.
Experience using AI tools (e.g., Claude Code, Git…