Senior Java Site Reliability Engineer

Job in McLean, Fairfax County, Virginia, USA
Listing for: Ekfrazo Technologies Private Limited
Full Time position
Listed on 2026-06-05
Job specializations:
  • IT/Tech
    SRE/Site Reliability, Cloud Computing
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly
Job Description & How to Apply Below

Role: Senior Java Site Reliability Engineer

Experience: 16-20 Years

Job Type: Contract

Project: Hybrid

Location: McLean, VA

Key Responsibilities

  • Support and maintain highly available production platforms across cloud and distributed environments. Drive incident management, root cause analysis, problem management, and platform stability initiatives.
  • Monitor and maintain uptime of Java applications and microservices.
  • Proactively identify and resolve application performance bottlenecks.
  • Conduct root cause analysis (RCA) for application outages and incidents.
  • Implement resiliency patterns including circuit breakers, retries, and failover mechanisms.
  • Lead reliability engineering efforts focused on system availability, performance optimization, and operational excellence. Implement and enhance observability solutions including monitoring, logging, alerting, and incident response automation.
  • Collaborate with development, infrastructure, and cloud engineering teams to improve deployment reliability and operational efficiency. Support infrastructure modernization, cloud transformation, and platform automation initiatives.
  • Coordinate disaster recovery testing, resiliency validation, capacity planning, and production readiness reviews. Provide technical leadership and mentor offshore/onshore engineering teams.

Required Experience

  • 16–20 years of experience in Site Reliability Engineering (SRE), Production Engineering, Platform Engineering, or Application Support.
  • Strong experience supporting large-scale enterprise production environments. Proven background in incident management, problem management, and operational support.
  • Experience working within banking, financial services, fintech, or other highly regulated industries. Hands‑on experience supporting mission‑critical applications with stringent availability and performance requirements.

Required Skills

  • Java
  • Kubernetes and Container Platforms
  • Docker
  • Cloud Platforms (AWS, Azure, or GCP)
  • CI/CD Tools (Jenkins, GitHub Actions, GitLab CI/CD, Argo CD)
  • Infrastructure as Code (Terraform, Ansible)
  • Monitoring & Observability Tools (Splunk, Datadog, Grafana, Prometheus, Moogsoft)
  • ServiceNow, Jira, Confluence
  • Python, Bash, or Shell Scripting
  • SQL and Database Troubleshooting
  • Application Performance Monitoring (APM)
  • Production Release Management
  • Disaster Recovery and High Availability Architectures

Education

  • Bachelor's degree in Computer Science, Information Systems, Engineering, or a related technical discipline.
Position Requirements
10+ Years work experience