
MLOps Engineer, LLMOps – San Francisco

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: TRM Labs
Full Time position
Listed on 2026-03-15
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning/ ML Engineer
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD per year
Job Description & How to Apply Below
Position: Staff MLOps Engineer, LLMOps – San Francisco Only

Build a Safer World.

TRM Labs provides blockchain analytics and AI solutions to help law enforcement and national security agencies, financial institutions, and cryptocurrency businesses detect, investigate, and disrupt crypto-related fraud and financial crime. TRM’s blockchain intelligence and AI platforms include solutions to trace the source and destination of funds, identify illicit activity, build cases, and construct an operating picture of threats. TRM is trusted by leading agencies and businesses worldwide who rely on TRM to enable a safer, more secure world for all.

The AI Engineering Team is chartered with enabling next-generation AI applications, with a special focus on Large Language Models (LLMs) and agentic systems. Our mission is to build robust pipelines, high-performance infrastructure, and operational tooling that allow AI systems to be deployed with speed, safety, and scale.

We manage petabyte-scale pipelines, serve models with millisecond-level latency, and provide the observability and governance needed to make AI production-ready. We’re also deeply involved in evaluating and integrating cutting-edge tools in the LLM and agent space — including open-source stacks, vector databases, evaluation frameworks, and orchestration tools that unlock TRM’s ability to innovate faster than the market.

As a Staff MLOps Engineer with a focus on LLMOps, you'll be at the core of building and scaling the technical infrastructure for AI/ML systems. You will:

  • Build reusable CI/CD workflows for model training, evaluation, and deployment — integrating Langfuse, GitHub Actions, experiment tracking, and related tooling.
  • Automate model versioning, approval workflows, and compliance checks across environments.
  • Build out a modular and scalable AI infrastructure stack — including vector databases, feature stores, model registries, and observability tooling.
  • Partner with engineering and data science to embed AI models and agents into real-time applications and workflows.
  • Continuously evaluate and integrate state-of-the-art AI tools (e.g. LangChain, LlamaIndex, vLLM, MLflow, BentoML).
  • Drive AI reliability and governance, enabling experimentation while ensuring compliance, security, and uptime.
  • Build and enhance AI/ML model performance.
  • Ensure data accuracy, consistency, and reliability, leading to better model training and inference.
  • Deploy infrastructure to support offline and online evaluation of LLMs and agents — including regression testing, cost monitoring, and human-in-the-loop workflows.
  • Enable researchers to iterate quickly by providing sandboxes, dashboards, and reproducible environments.
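Several of the responsibilities above (approval workflows, automated evaluation, deployment gates) come down to encoding promotion criteria in code. Below is a minimal, hypothetical sketch of such a CI gate; the metric names and thresholds are illustrative assumptions, not TRM's actual criteria, and a real pipeline would pull these values from an experiment tracker such as MLflow or Langfuse.

```python
# Hypothetical CI promotion gate: approve a candidate model only if it
# improves accuracy without blowing the latency budget. Metric names and
# thresholds are illustrative, not TRM-specific.

def should_promote(candidate: dict, production: dict,
                   min_gain: float = 0.01,
                   max_latency_regression: float = 1.10) -> bool:
    """Return True if the candidate beats production accuracy by at least
    `min_gain` while keeping p95 latency within a 10% regression budget."""
    accuracy_ok = candidate["accuracy"] >= production["accuracy"] + min_gain
    latency_ok = (candidate["p95_latency_ms"]
                  <= production["p95_latency_ms"] * max_latency_regression)
    return accuracy_ok and latency_ok

if __name__ == "__main__":
    prod = {"accuracy": 0.91, "p95_latency_ms": 120.0}
    good = {"accuracy": 0.93, "p95_latency_ms": 125.0}
    slow = {"accuracy": 0.95, "p95_latency_ms": 200.0}
    print(should_promote(good, prod))  # True: accuracy up, latency within budget
    print(should_promote(slow, prod))  # False: latency regressed past 10%
```

A gate like this would typically run as one step in a CI/CD workflow, blocking the deployment job unless it returns True.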
What We’re Looking For
  • Write high-quality, maintainable software — primarily in Python, but we value engineering ability over language familiarity.
  • Have a strong background in scalable infrastructure, including:
    • Containerization and orchestration (e.g. Docker, Kubernetes)
    • Infrastructure-as-code and deployment (e.g. Terraform, CI/CD pipelines)
    • Monitoring and logging frameworks (e.g. Datadog, Prometheus, OpenTelemetry)
  • Understand and implement MLOps best practices, including:
    • Model versioning and rollback strategies
    • Automated evaluation and drift detection
    • Scalable model and agent serving infrastructure (e.g. vLLM, Triton, BentoML)
  • Deploy and maintain LLM and agentic workflows in production, including:
    • Monitoring cost, latency, and performance
    • Capturing traces for analysis and debugging
    • Optimizing prompt/response flows with real-time data access
  • Demonstrate strong ownership and pragmatism, balancing infrastructure elegance with iterative delivery and measurable impact.
  • Learn about TRM Speed in this position:
    • Rapid Issue Resolution. TRM Engineers identify and resolve critical onsite issues in minutes to hours, not weeks. We create virtual war rooms, implement fixes, and share lessons with both customer stakeholders and internal teams within 48 hours.
    • Navigating Bureaucracy. We anticipate and address procedural hurdles, build trust with key stakeholders, and find alternative pathways to approvals. This keeps projects moving even in complex environments.
    • Efficient Knowledge Transfer. Engineers document and share updates in real time, ensuring the…
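The "automated evaluation and drift detection" requirement above often starts with a simple statistical monitor. As an illustration, here is a self-contained Population Stability Index (PSI) check over a numeric feature; the bin count and the 0.2 alert threshold are common industry conventions, not anything specific to TRM's stack.

```python
# Hedged sketch of automated drift detection via the Population Stability
# Index (PSI). PSI > 0.2 is a widely used "significant drift" cutoff.
import math

def psi(expected: list, actual: list, bins: int = 10) -> float:
    """PSI between a baseline sample and a live sample of one feature."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a degenerate range

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # Smooth empty buckets so the log term stays finite.
        return [(c + 0.5) / (len(sample) + 0.5 * bins) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]       # uniform on [0, 1)
    stable = [i / 100 for i in range(100)]         # identical distribution
    shifted = [0.5 + i / 200 for i in range(100)]  # mass pushed to upper half
    print(psi(baseline, stable) < 0.1)    # True: no drift detected
    print(psi(baseline, shifted) > 0.2)   # True: drift flagged for alerting
```

In a production monitor, a check like this would run on a schedule against live feature distributions and page on-call or trigger retraining when the threshold is crossed.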