Sr Site Reliability Engineer

Job in Suffolk, Virginia, 23432, USA
Listing for: DOMA Technologies
Full Time position
Listed on 2026-05-30
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD yearly
Job Description & How to Apply Below

About Commence

At Commence, we’re the start of a new age of data‑centric transformation, elevating health outcomes and powering more efficient processes for patients and programs. We combine quality, data‑driven solutions that fuel answers, technology that advances performance, and clinical expertise that builds trust to create a more efficient path to quality care.

With human‑centered, healthcare‑relevant, and value‑based solutions, we create new possibilities with data. We provide proof beyond the concept and performance beyond the scope with a focus on efficiencies that transform the lives of those we serve. With a culture driven by purpose, straightforward communication and clinical domain expertise, Commence cuts straight to better care.

Responsibilities
  • Design, implement, and own observability infrastructure including metrics, logging, tracing, and alerting across distributed systems.
  • Define and enforce SLOs, SLIs, and error budgets in partnership with product and engineering teams.
  • Lead incident response: triage, coordinate remediation, conduct blameless post‑mortems, and drive systemic fixes.
  • Build and maintain CI/CD pipelines that support rapid, safe delivery of changes to production.
  • Collaborate with engineering teams on infrastructure changes; read, modify, and contribute to existing infrastructure‑as‑code (Terraform or CloudFormation).
  • Design and operate highly available, fault‑tolerant systems—including auto‑scaling, failover, and disaster recovery strategies.
  • Reduce operational toil through automation; eliminate manual processes before they become habits.
  • Collaborate with software engineers to establish reliability‑first design patterns and review architectures for operational risk.
  • Manage Kubernetes or container orchestration environments at scale.
  • Ensure systems meet compliance and security requirements, particularly those applicable to healthcare data (HIPAA, SOC 2).
  • Provide technical mentorship and guidance to engineers across the organization on reliability practices.
  • Participate in on‑call rotation with a commitment to continuously reducing the need for it.
Qualifications
  • 7+ years of experience in SRE, platform engineering, or DevOps roles.
  • Exceptional problem‑solving under pressure—demonstrated track record of diagnosing complex, high‑stakes system failures and building durable solutions.
  • Deep hands‑on experience with AWS services including EC2, EKS/ECS, Lambda, RDS, S3, CloudWatch, and related tooling.
  • Familiarity with infrastructure‑as‑code (Terraform or CloudFormation), with the ability to contribute to existing configurations.
  • Experience designing and operating distributed systems with strict availability and latency requirements.
  • Proficiency in at least one scripting or systems language (Python, Go, Bash, or similar) for automation and tooling.
  • Experience with container orchestration (Kubernetes, ECS) in production environments.
  • Expertise in observability tooling (OpenSearch, Prometheus/Grafana, or equivalent).
  • Hands‑on experience with CI/CD platforms (GitHub Actions, Jenkins, CircleCI, or similar).
  • Proven ability to define and operationalize SLOs and error budgets.
  • Experience with relational and NoSQL databases, including performance tuning, replication, and backup strategies.
  • Strong working knowledge of networking fundamentals: DNS, load balancing, VPCs, TLS.
  • Excellent communication skills—able to translate technical risk into business impact for non‑engineering stakeholders.
Additional Requirements
  • AWS Certifications (Solutions Architect, DevOps Engineer, or SysOps Administrator).
  • Experience in healthcare technology or other regulated industries (HIPAA, SOC 2, FedRAMP).
  • Familiarity with chaos engineering practices and tooling.
  • Experience with data pipeline reliability (ETL/ELT workflows, streaming systems).
  • Exposure to AI/ML infrastructure and the reliability challenges unique to model serving.
  • Familiarity with additional cloud platforms (Azure, Google Cloud).
  • Contributions to open‑source reliability or infrastructure tooling.
Work Environment / Physical Demands

The work environment and physical demands described here are representative of those that must be met by an employee to successfully perform the…
