
Lead Site Reliability Engineer

Remote / Online - Candidates ideally in
Fairfax, Fairfax County, Virginia, 22032, USA
Listing for: One Dynamic
Remote/Work from Home position
Listed on 2025-12-27
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, SRE/Site Reliability, IT Support
Salary/Wage Range or Industry Benchmark: 70 - 75 USD Hourly
Job Description & How to Apply Below

Quick Details

  • Location: Fully Remote (US)
  • Experience: 8+ Years
  • Rate: $70-75/hour
  • Duration: 6 months+
About One Dynamic

One Dynamic is a Service-Disabled Veteran-Owned Small Business (SDVOSB) headquartered in Fairfax, VA. We specialize in digital transformation, cloud infrastructure, quality assurance, and enterprise architecture for federal and healthcare organizations. We are currently seeking a Lead Site Reliability Engineer to support our client ARC, a rapidly growing device management company revolutionizing how frontline workers interact with enterprise mobile devices.

About the Role

The Lead Site Reliability Engineer is a senior technical leadership role responsible for the reliability, availability, and operational excellence of the cloud infrastructure and kiosks platform. This role owns uptime, SLAs, and incident response while driving long‑term improvements to system resilience, observability, and operational maturity. The Lead SRE serves as both a hands‑on technical leader and a force multiplier across platform, QA, and development teams.

This role is well‑suited for an experienced engineer who thrives in high‑ownership environments and can balance real‑time operational demands with strategic reliability initiatives. Strong communication, sound technical judgment, and a bias toward preventative engineering are critical to success.

Key Responsibilities
  • Own uptime, SLAs, and overall reliability of the cloud infrastructure and kiosks platform
  • Lead incident response, root‑cause analysis, and drive actionable postmortems
  • Automate infrastructure, deployments, and operational tasks using modern IaC and scripting in collaboration with the Platform Engineering team
  • Maintain and improve monitoring, alerting, and observability (e.g., Grafana, Prometheus, New Relic)
  • Execute and continuously improve disaster recovery and business continuity plans
  • Partner with platform engineering, QA, and development teams to ensure operational readiness
  • Establish and maintain runbooks, operational standards, and reliability best practices
  • Provide leadership, mentorship, and clear communication during both normal operations and incidents
  • Optimize cloud and Kubernetes environments for reliability, performance, and scalability
Required Qualifications
  • 8+ years in SRE, DevOps, or Platform Engineering roles; 2+ years in a senior or lead capacity
  • Strong experience supporting production environments with strict SLAs and high uptime requirements
  • Deep knowledge of Kubernetes, containers, and cloud‑native infrastructure
  • Proficiency in automation and scripting using Bash, Python, or Go
  • Hands‑on experience with CI/CD pipelines and release engineering in modern environments
  • Expert‑level familiarity with IaC tools (Terraform preferred)
  • Strong understanding of monitoring, alerting, logging, and observability tooling
  • Experience implementing and managing GitOps workflows (ArgoCD or similar)
  • Demonstrated ability to lead incidents and communicate effectively with technical and non‑technical stakeholders
  • Solid understanding of disaster recovery planning, resilience practices, and system hardening
  • Must be authorized to work in the United States (US‑based candidates only)
The Ideal Candidate

You think several steps ahead. You are relentless, strategic, and a long‑term thinker. You believe the details are essential, and so you get them right. You are a fast learner. You take feedback well and implement it. You care about achieving the best outcome and do not focus on being right or wrong.

About the Client

ARC is a device management solution integrated with smart lockers, designed to store, secure, and charge company‑owned handheld devices (e.g., Zebra, Honeywell) used by frontline workers to perform core job functions. Launched in late 2021, ARC was spun off from ChargeItSpot, a consumer‑facing phone‑charging technology company established in 2012.

ARC's Mission: Minimize Device Waste. Maximize Worker Productivity. Make Life Easier.

How to Apply

If you have the unique combination of skills and qualities we are seeking, please submit your resume via One Dynamic's careers portal. We look forward to hearing from you!

One Dynamic is an Equal Opportunity employer. Personnel are chosen based on ability without regard to race, color, religion, sex, national origin, disability, marital status, or sexual orientation, in accordance with federal and state law.
