EOP - Site Reliability Engineer - TS/SCI
Job in Washington, District of Columbia, 20022, USA
Listed on 2026-03-01
Listing for: cFocus Software Incorporated
Full Time position
Job specializations:
- IT/Tech: Cloud Computing, IT Support, Systems Engineer, SRE/Site Reliability
Job Description & How to Apply Below
Overview
cFocus Software seeks a Site Reliability Engineer to join our program supporting the United States Secret Service (USSS). This position is remote and requires the ability to obtain a TS/SCI clearance.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or related technical field (or equivalent experience).
- Minimum of 2 years of experience in systems engineering, DevOps, or Site Reliability Engineering roles.
- Strong proficiency with Linux/Unix operating systems.
- Experience with scripting and automation using Python, Bash, or similar languages.
- Experience with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or equivalent.
- Experience supporting CI/CD tools such as GitLab, Jenkins, or ArgoCD.
- Experience with containerization and orchestration platforms including Docker and Kubernetes.
- Understanding of SRE principles including SLIs, SLOs, and error budgets.
- Strong troubleshooting, problem-solving, and documentation skills.
Responsibilities
- Monitor system health, availability, and performance using centralized monitoring and logging tools.
- Respond to, troubleshoot, and resolve incidents in production environments and provide root cause analysis.
- Conduct after-action reporting and post-incident reviews to improve system resilience.
- Automate repetitive operational tasks including deployments, monitoring, and incident response.
- Administer user accounts, access controls, and authentication mechanisms.
- Maintain and configure workflow templates, user fields, and application configurations.
- Maintain test environments that mirror production and support pre-deployment testing.
- Design and maintain backup, high availability (HA), and disaster recovery (DR) solutions.
- Develop and maintain incident response and disaster recovery plans for supported applications.
- Configure and support integrations with complementary enterprise systems.
- Architect, build, and maintain on-premises and cloud infrastructure supporting applications.
- Administer production, staging, and development environments.
- Manage system logs and monitor for security and operational events.
- Maintain and improve CI/CD pipelines and DevSecOps processes.
- Apply configuration management disciplines including patching, hardening, and documentation.
- Create and maintain dashboards, SLIs, SLOs, and service health metrics.
- Support operational readiness boards and weekly service reviews.
- Provide on-call support for outages, upgrades, and emergency maintenance as required.
- Support surge activities, including Presidential Transition-related data analysis if required.