LLM Pre-training & Distributed Engineer; AI Infrastructure

Job in Oregon, Dane County, Wisconsin, 53575, USA
Listing for: Hyphen Connect
Apprenticeship/Internship position
Listed on 2026-04-28
Job specializations:
  • Engineering
    AI Engineer
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly
Position: LLM Pre-training & Distributed Engineer (AI Infrastructure)

We are seeking a highly skilled LLM Pre-training & Distributed Systems Engineer. This role is essential for orchestrating large-scale machine learning training runs and optimizing distributed infrastructure. The ideal candidate will have a deep understanding of GPU clusters and extensive systems engineering experience to keep training runs efficient and reliable.

Responsibilities:
  • Orchestrate distributed training runs across 1,000+ GPUs using PyTorch, DeepSpeed, or Megatron-LM.
  • Optimize networking (InfiniBand/RDMA) and memory management to prevent out-of-memory errors.
  • Automate checkpointing and failure recovery during month-long training runs.
Required Skills:
  • Deep expertise in 3D parallelism (Data, Tensor, Pipeline).
  • Strong systems engineering background (C++, CUDA, Python).