
Machine Learning Engineer - Distributed ML Systems

Job in Fresno, Fresno County, California, 93650, USA
Listing for: Pluralis Research
Full Time position
Listed on 2026-05-30
Job specializations:
  • Software Development
    Software Engineer
Salary/Wage Range or Industry Benchmark: USD 125,000 - 150,000 per year
Job Description & How to Apply Below

Overview

Pluralis Research carries out foundational research on Protocol Learning: multi-participant training of foundation models where no single participant has, or can ever obtain, a full copy of the model. The purpose of Protocol Learning is to facilitate the creation of community-trained and community-owned frontier models with self-sustaining economics.

We’re looking for Senior/Staff engineers with 5+ years of experience in distributed systems and large‑scale ML training. You’ll implement a novel substrate for distributed training of ML models over consumer‑grade internet connections.

Responsibilities
Distributed Training Architecture & Optimization
  • Design and implement large‑scale distributed training systems optimized for heterogeneous hardware operating under low‑bandwidth, high‑latency conditions.

  • Develop and optimize parallel training strategies (data, tensor, and pipeline parallelism) with custom sharding techniques that minimize communication overhead.

  • Optimize GPU utilization, memory efficiency, and compute performance across distributed nodes.

  • Implement robust checkpointing, state synchronization, and recovery mechanisms for long‑running, fault‑prone training jobs.

  • Build monitoring and metrics systems to track training progress, model quality, and system bottlenecks.

Decentralized Networking & Resilience
  • Architect resilient training systems where nodes can fail, networks can partition, and participants can dynamically join or leave.

  • Design and optimize peer‑to‑peer topologies for decentralized coordination across non‑co‑located nodes.

  • Implement NAT traversal, peer discovery, dynamic routing, and connection lifecycle management.

  • Profile and optimize communication patterns to reduce latency and bandwidth overhead in multi‑participant environments.

What You’ll Bring
  • Strong experience building and operating distributed systems in production.

  • Hands‑on expertise with distributed training frameworks (FSDP, DeepSpeed, Megatron, or similar).

  • Deep understanding of parallel training strategies (data, tensor, and pipeline parallelism).

  • Expert‑level Python with production experience (concurrency, error handling, retry logic, clean architecture).

  • Strong networking fundamentals: P2P systems, gRPC, routing, NAT traversal, distributed coordination.

  • Experience optimizing GPU workloads, memory management, and large‑scale compute efficiency.

What We Offer
  • Equity‑heavy compensation with meaningful ownership in a mission‑driven company

  • Competitive base salary for senior engineering roles in Australia

  • Visa sponsorship available for exceptional candidates

  • Remote‑first with optional access to our Melbourne hub

  • World‑class team: teammates were previously at Google, Amazon, Microsoft, and leading startups

Backed by Union Square Ventures and other tier‑1 investors, we’re a world‑class, deeply technical team of ML researchers and engineers. Pluralis is unapologetically ideological. We believe the world will be a better place if we succeed in what we are attempting, and we see Protocol Learning as the only plausible way to prevent a handful of massive corporations from monopolising model development, access, and release, and from achieving massive economic capture.

If this resonates, please apply.
