Senior Java Site Reliability Engineer
Job in McLean, Fairfax County, Virginia, USA
Listed on 2026-06-05
Listing for: Ekfrazo Technologies Private Limited
Full Time position
Job specializations:
- IT/Tech: SRE/Site Reliability, Cloud Computing
Job Description
Role: Senior Java Site Reliability Engineer
Experience: 16–20 years
Job Type: Contract
Project: Hybrid
Location: McLean, VA
Key Responsibilities
- Support and maintain highly available production platforms across cloud and distributed environments.
- Drive incident management, root cause analysis, problem management, and platform stability initiatives.
- Monitor and maintain uptime of Java applications and microservices.
- Proactively identify and resolve application performance bottlenecks.
- Conduct root cause analysis (RCA) for application outages and incidents.
- Implement resiliency patterns including circuit breakers, retries, and failover mechanisms.
- Lead reliability engineering efforts focused on system availability, performance optimization, and operational excellence.
- Implement and enhance observability solutions, including monitoring, logging, alerting, and incident response automation.
- Collaborate with development, infrastructure, and cloud engineering teams to improve deployment reliability and operational efficiency.
- Support infrastructure modernization, cloud transformation, and platform automation initiatives.
- Coordinate disaster recovery testing, resiliency validation, capacity planning, and production readiness reviews.
- Provide technical leadership and mentor offshore/onshore engineering teams.
Required Experience
- 16–20 years of experience in Site Reliability Engineering (SRE), Production Engineering, Platform Engineering, or Application Support.
- Strong experience supporting large-scale enterprise production environments.
- Proven background in incident management, problem management, and operational support.
- Experience working within banking, financial services, fintech, or other highly regulated industries.
- Hands-on experience supporting mission-critical applications with stringent availability and performance requirements.
Required Skills
- Java
- Kubernetes and Container Platforms
- Docker
- Cloud Platforms (AWS, Azure, or GCP)
- CI/CD Tools (Jenkins, GitHub Actions, GitLab CI/CD, ArgoCD)
- Infrastructure as Code (Terraform, Ansible)
- Monitoring & Observability Tools (Splunk, Datadog, Grafana, Prometheus, Moogsoft)
- ServiceNow, JIRA, Confluence
- Python, Bash, or Shell Scripting
- SQL and Database Troubleshooting
- Application Performance Monitoring (APM)
- Production Release Management
- Disaster Recovery and High Availability Architectures
- Bachelor's degree in Computer Science, Information Systems, Engineering, or a related technical discipline.
Position Requirements
10+ years of work experience