Site Reliability Engineer (SRE), Washington, DC
Listed on 2026-02-19
-
IT/Tech
SRE/Site Reliability, Cloud Computing
🔎 Job Title:
Site Reliability Engineer (SRE) | Cognitive Minds | Washington, DC
🏢 Recruiting Company:
Cognitive Minds
🌍 Job Location:
Washington, DC
💼 Job Type: Contract Position
📧 Application Method:
The Site Reliability Engineer (SRE) will be responsible for designing, operating, and scaling highly reliable, resilient, and secure cloud-native systems. This role is crucial in ensuring performance, observability, automation, and continuous improvement across critical infrastructure and application services.
📜 Detailed Job Description
As an SRE, you will support hybrid and cloud environments (AWS/Azure), implement infrastructure automation, and enforce reliability through CI/CD, monitoring, SLOs/SLIs, and incident response best practices. You will work closely with DevOps, Cloud Engineering, and Development teams to enhance system uptime, optimize performance, and build self-healing infrastructure. This role requires strong experience with containers, orchestration, IaC, observability platforms, and troubleshooting distributed systems.
✅ Key Responsibilities
- Build and maintain reliable, scalable cloud infrastructure across AWS and Azure
- Implement infrastructure as code using Terraform, CloudFormation, or similar tools
- Manage Kubernetes clusters and containerized workloads (EKS, AKS, Docker)
- Develop CI/CD pipelines using GitHub Actions, Jenkins, or equivalent
- Implement observability solutions using Dynatrace, Prometheus, Grafana, CloudWatch, or similar tools
- Automate deployments, provisioning, monitoring, and incident responses
- Troubleshoot production issues related to performance, networking, and application failures
- Improve resilience through chaos engineering, failover testing, and SRE best practices
- Work with cross-functional teams to drive operational excellence and reduce toil
- Manage incident response, root cause analysis, and post-mortems
Qualifications
- Proven experience as an SRE, DevOps Engineer, or Cloud Infrastructure Engineer
- Hands‑on expertise with AWS and/or Azure cloud platforms
- Strong skills in Kubernetes, Docker, and container orchestration
- Proficiency with IaC tools: Terraform, CloudFormation, Ansible
- Deep understanding of Linux systems, networking, and databases
- Experience with CI/CD tools (GitHub Actions, Jenkins, GitLab CI, etc.)
- Strong scripting ability (Python, Bash)
- Familiarity with observability tools (Dynatrace, Prometheus, Grafana, ELK, etc.)
- Experience in performance engineering, scalability, and distributed systems troubleshooting
- Knowledge of ITIL, incident management, and ServiceNow workflows
- Certifications in AWS/Azure/CKA
- Experience with serverless computing or service mesh
- Hands‑on with Kafka, Redis, or other distributed systems
- Exposure to zero‑downtime deployments and reliability automation patterns
Showcase real‑world SRE achievements—especially performance improvements, reduced MTTR, automation wins, or Kubernetes/IaC projects. Quantified reliability impact is one of the strongest differentiators for SRE contract roles.