Lead DevOps/MLOps Engineer

Job in Reston, Fairfax County, Virginia, 22090, USA

Listing for: RAZOR

Full Time position
Listed on 2026-06-03

Job specializations:

IT/Tech
Systems Engineer, SRE/Site Reliability

Salary/Wage Range or Industry Benchmark: 100000 - 125000 USD Yearly

We’re looking for a strong DevOps engineer who can help scale and operationalize our infrastructure as the platform grows. This is not a pure platform-architecture role; the focus is CI/CD, infrastructure automation, deployment reliability, observability, and GPU-oriented workload scaling.

What You’ll Own

Improve CI/CD pipelines, deployment workflows, and release reliability
Standardize infrastructure and deployment patterns across environments
Improve observability through logging, metrics, tracing, dashboards, and rollout monitoring
Partner closely with backend engineering on:
- deployment strategies
- infrastructure automation
- environment consistency
- migration workflows
- possible Kubernetes migration efforts
Support ML-oriented infrastructure as a secondary responsibility:
- SageMaker workloads
- Ray clusters
- GPU scaling patterns
- distributed batch execution
- autoscaling behavior
- runtime/image management
- artifact delivery/versioning

The Kind of Problems You’ll Work On

Deployment safety and rollback strategies
Infrastructure consistency across environments
Release automation and environment promotion flows
Autoscaling and runtime stability
GPU workload orchestration and scaling efficiency
Operational tooling that reduces friction for engineering teams

Stack

AWS
Terraform
Docker
Kubernetes
CI/CD systems
SageMaker
Ray
GPU compute infrastructure

You’ll Probably Do Well Here If

You’ve operated production infrastructure at meaningful scale
You’re strong in practical DevOps execution and operational reliability
You care about automation, observability, and deployment safety
You’re comfortable improving developer workflows and infrastructure tooling
You’ve worked with distributed systems or GPU-oriented workloads before
