Lead DevOps/MLOps Engineer

Job in Reston, Fairfax County, Virginia, 22090, USA
Listing for: RAZOR
Full Time position
Listed on 2026-06-03
Job specializations:
  • IT/Tech
    Systems Engineer, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD yearly
Job Description & How to Apply Below

We’re looking for a strong DevOps engineer who can help scale and operationalize our infrastructure as the platform grows. This is not a pure platform-architecture role; the focus is on CI/CD, infrastructure automation, deployment reliability, observability, and GPU-oriented workload scaling.

What You’ll Own
  • Improve CI/CD pipelines, deployment workflows, and release reliability
  • Standardize infrastructure and deployment patterns across environments
  • Improve observability through logging, metrics, tracing, dashboards, and rollout monitoring
  • Partner closely with backend engineering on:
    • deployment strategies
    • infrastructure automation
    • environment consistency
    • migration workflows
    • possible Kubernetes migration efforts
  • Support ML-oriented infrastructure as a secondary responsibility:
    • SageMaker workloads
    • Ray clusters
    • GPU scaling patterns
    • distributed batch execution
    • autoscaling behavior
    • runtime/image management
    • artifact delivery/versioning
The Kind of Problems You’ll Work On
  • Deployment safety and rollback strategies
  • Infrastructure consistency across environments
  • Release automation and environment promotion flows
  • Autoscaling and runtime stability
  • GPU workload orchestration and scaling efficiency
  • Operational tooling that reduces friction for engineering teams
Stack
  • AWS
  • Terraform
  • Docker
  • Kubernetes
  • CI/CD systems
  • SageMaker
  • Ray
  • GPU compute infrastructure
You’ll Probably Do Well Here If
  • You’ve operated production infrastructure at meaningful scale
  • You’re strong in practical DevOps execution and operational reliability
  • You care about automation, observability, and deployment safety
  • You’re comfortable improving developer workflows and infrastructure tooling
  • You’ve worked with distributed systems or GPU-oriented workloads before