
Machine Learning Infrastructure Engineer

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: TRM Labs
Full Time position
Listed on 2026-03-16
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning/ML Engineer, Data Scientist
Salary/Wage Range or Industry Benchmark: USD 125,000 - 150,000 per year
Job Description & How to Apply Below

Build a Safer World.

TRM Labs provides blockchain analytics and AI solutions to help law enforcement and national security agencies, financial institutions, and cryptocurrency businesses detect, investigate, and disrupt crypto-related fraud and financial crime. TRM’s blockchain intelligence and AI platforms include solutions to trace the source and destination of funds, identify illicit activity, build cases, and construct an operating picture of threats. Leading agencies and businesses worldwide rely on TRM to enable a safer, more secure world for all.

At TRM, we’re on a mission to build a safer financial system for billions of people around the world. Our next-generation platform, which combines threat intelligence with machine learning, enables financial institutions and governments to detect cryptocurrency fraud and financial crime at an unprecedented scale.

As a Senior Software Engineer, ML Infrastructure at TRM Labs, you will collaborate with data scientists, engineers, and product managers to design and operate scalable GPU-backed infrastructure that powers TRM’s AI systems. You will work at the intersection of distributed systems, cloud infrastructure, GPU performance engineering, and applied machine learning — building the foundation that enables high-throughput, production-grade ML workloads.

The impact you’ll have here:
  • Design and operate GPU cluster infrastructure.

    Build and manage GPU-backed environments in cloud settings, including orchestration, autoscaling, resource isolation, and workload management across multiple concurrent models and users.

  • Optimize high-throughput inference.

    Implement and tune serving systems that maximize token throughput, batching efficiency, GPU occupancy, and cost effectiveness across interactive and batch workloads (see the vLLM sketch after this list).

  • Enable distributed inference strategies.

    Support and operationalize model parallelism, tensor parallelism, and other distributed serving patterns for large-scale models.

  • Implement model optimization and compilation workflows.

    Integrate and optimize acceleration stacks such as TensorRT, ONNX Runtime, vLLM, FlashAttention, and related tooling to improve performance and reduce inference cost (see the ONNX Runtime sketch after this list).

  • Schedule heterogeneous workloads.

    Design systems that manage multiple models, multiple users, and mixed workload types across heterogeneous accelerators (e.g., NVIDIA GPUs, Inferentia), ensuring predictable performance under varying demand.

  • Build observability into ML infrastructure.

    Instrument systems to measure GPU load, memory utilization, batching efficiency, queue depth, and token throughput, and use that data to continuously improve performance and reliability (see the metrics-exporter sketch after this list).

  • Partner across engineering teams.

    Work closely with infrastructure, ML, and product teams to ensure models transition smoothly from experimentation to production-grade, highly available services.
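
To make the throughput and batching work above concrete, here is a minimal offline-inference sketch using vLLM’s Python API. The model name, prompt set, and tuning values are illustrative assumptions, not TRM configuration; the knobs shown (gpu_memory_utilization, max_num_seqs, tensor_parallel_size) are the ones typically tuned for GPU occupancy and distributed serving.

    import time
    from vllm import LLM, SamplingParams

    # Placeholder model and tuning values -- illustrative assumptions only.
    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",
        gpu_memory_utilization=0.90,  # fraction of GPU memory vLLM may reserve
        max_num_seqs=256,             # upper bound on concurrently batched sequences
        tensor_parallel_size=1,       # >1 shards the model across multiple GPUs
    )
    params = SamplingParams(temperature=0.0, max_tokens=128)

    prompts = [f"Summarize input record {i}." for i in range(512)]  # synthetic batch

    start = time.perf_counter()
    outputs = llm.generate(prompts, params)  # continuous batching happens internally
    elapsed = time.perf_counter() - start

    tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"{tokens} generated tokens in {elapsed:.1f}s ({tokens / elapsed:.0f} tok/s)")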
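
For the optimization and compilation bullet, the sketch below shows the general shape of an ONNX Runtime integration with a GPU execution provider and CPU fallback. The model path, input name, and input shape are placeholders.

    import numpy as np
    import onnxruntime as ort

    # "model.onnx" and the input shape are placeholder assumptions. Listing
    # "TensorrtExecutionProvider" first would route supported ops through TensorRT.
    sess = ort.InferenceSession(
        "model.onnx",
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # GPU first, CPU fallback
    )

    input_name = sess.get_inputs()[0].name         # discover the graph's input name
    x = np.random.rand(8, 128).astype(np.float32)  # assumed batch of 8 feature vectors
    outputs = sess.run(None, {input_name: x})      # None = return all graph outputs
    print(outputs[0].shape)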
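
Finally, the observability bullet can be illustrated with a small exporter that polls NVML counters and publishes them as Prometheus gauges. Metric names, the port, and the poll interval are assumptions for this sketch; batching efficiency, queue depth, and token throughput would come from the serving layer rather than NVML.

    import time
    import pynvml
    from prometheus_client import Gauge, start_http_server

    # Metric names and poll interval are illustrative assumptions.
    GPU_UTIL = Gauge("gpu_utilization_percent", "GPU compute utilization", ["gpu"])
    GPU_MEM = Gauge("gpu_memory_used_bytes", "GPU memory in use", ["gpu"])

    def main() -> None:
        pynvml.nvmlInit()
        start_http_server(9400)  # Prometheus scrape endpoint (assumed port)
        try:
            count = pynvml.nvmlDeviceGetCount()
            while True:
                for i in range(count):
                    h = pynvml.nvmlDeviceGetHandleByIndex(i)
                    util = pynvml.nvmlDeviceGetUtilizationRates(h)
                    mem = pynvml.nvmlDeviceGetMemoryInfo(h)
                    GPU_UTIL.labels(gpu=str(i)).set(util.gpu)
                    GPU_MEM.labels(gpu=str(i)).set(mem.used)
                time.sleep(5)
        finally:
            pynvml.nvmlShutdown()

    if __name__ == "__main__":
        main()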

What we’re looking for:
  • Bachelor’s degree (or equivalent) in Computer Science or related field.

  • 5+ years of experience building and operating distributed systems or infrastructure in production environments.

  • Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP).

  • Deep understanding of high-throughput inference systems, including batching strategies, token throughput optimization, and the trade-offs between latency, throughput, and cost.

  • Experience with one or more ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or Hugging Face Optimum.

  • Experience optimizing GPU load, memory efficiency, and performance bottlenecks in production systems.

  • Familiarity with…
