Senior AI/ML Engineer
Job in
San Mateo, San Mateo County, California, 94409, USA
Listed on 2026-01-01
Listing for:
TrueFoundry
Full Time position
Job specializations:
- Software Development
- AI Engineer
Job Description & How to Apply Below
About the Company:
TrueFoundry is an enterprise-grade AI/ML platform that accelerates the development, deployment, and scaling of GenAI and ML applications with security, cost efficiency, and cross-cloud flexibility. We’re now scaling our Enterprise Outcomes motion, a strategic arm focused on delivering domain-specific solutions that drive business transformation and shape our product roadmap. We’re hiring a senior leader to build and lead the engineering arm of this motion.
About the Role:
You’ll design and own core components that enable enterprise customers to run production agentic AI safely and efficiently on TrueFoundry. This includes building robust orchestration for multi-step agents (graph/stateful workflows), model/routing logic, observability and policy enforcement (cost, data residency, rate limiting), and integrating upstream tooling like LangGraph, LangChain, vector stores, and specialized LLM runtimes.
Responsibilities:
• Architect and implement scalable agent orchestration patterns (graph-based executors, state management, multi-agent coordination) for production workloads.
• Own critical integrations: model adapters, LLM gateway hooks, vector DBs, tools & external APIs, and the platform’s LLMOps flows.
• Build and improve tracing, benchmarking, and observability for LLMs and agents: token/cost accounting, p95 latency, throughput, and correctness checks.
• Drive design for safety/guardrails: moderation hooks, human-in-the-loop checkpoints, replayable audit trails and policy enforcement.
• Mentor junior engineers, run design reviews, and improve engineering practices (testing, CI/CD, chaos testing for agents).
• Work directly with strategic customers to prototype complex agentic solutions and translate them into product features.
Required Skills:
• 3–9 years of software engineering with substantial experience building distributed systems, infra, or ML platforms.
• Deep practical experience integrating and deploying LLMs in production (RAG, retrieval, embeddings pipelines).
• Hands‑on experience with agent orchestration frameworks (LangGraph / LangChain or custom agent runtimes) and stateful workflow design.
• Strong systems knowledge: Kubernetes, container orchestration, service meshes, and performance tuning.
• Proven track record building observability, cost controls, and policy enforcement for production services.
Preferred Skills:
• Experience building or contributing to open-source LLM orchestration tools (LangGraph, LangChain, or similar).
• Familiarity with enterprise constraints: on‑prem/cloud hybrid deployments, data residency, compliance requirements.
• Background in security, privacy, or model governance for LLMs.
• Demonstrated leadership in cross‑functional projects and direct customer engagement.
We are an Equal Opportunity Employer. All employment decisions at our organization are based on merit, qualifications, and business needs. We do not discriminate on the basis of race, religion, color, gender, sexual orientation, age, marital status, disability, or any other status protected by law.
Location:
San Francisco, CA
Position Requirements
10+ years of work experience