Site Reliability Engineer; SRE Washington, DC Job Washington area, District of Columbia USA, IT/Tech

Position: Site Reliability Engineer (SRE) | Cognitive Minds | Washington, DC

🔎 Job Title: Site Reliability Engineer (SRE) | Cognitive Minds | Washington, DC

🏢 Recruiting Company: Cognitive Minds
🌍 Job Location: Washington, DC
💼 Job Type: Contract Position
📧 Application Method:

💡 Position Summary

The Site Reliability Engineer (SRE) will be responsible for designing, operating, and scaling highly reliable, resilient, and secure cloud-native systems. This role is crucial in ensuring performance, observability, automation, and continuous improvement across critical infrastructure and application services.

📜 Detailed Job Description

As an SRE, you will support hybrid and cloud environments (AWS/Azure), implement infrastructure automation, and enforce reliability through CI/CD, monitoring, SLOs/SLIs, and incident response best practices. You will work closely with DevOps, Cloud Engineering, and Development teams to enhance system uptime, optimize performance, and build self-healing infrastructure. This role requires strong experience with containers, orchestration, IaC, observability platforms, and troubleshooting distributed systems.

✅ Key Responsibilities

Build and maintain reliable, scalable cloud infrastructure across AWS and Azure
Implement infrastructure as code using Terraform, CloudFormation, or similar tools
Manage Kubernetes clusters and containerized workloads (EKS, AKS, Docker)
Develop CI/CD pipelines using GitHub Actions, Jenkins, or equivalent
Implement observability solutions using Dynatrace, Prometheus, Grafana, CloudWatch, or similar tools
Automate deployments, provisioning, monitoring, and incident responses
Troubleshoot production issues related to performance, networking, and application failures
Improve resilience through chaos engineering, failover testing, and SRE best practices
Work with cross-functional teams to drive operational excellence and reduce toil
Manage incident response, root cause analysis, and post-mortems

🎓 Required Qualifications & Skills

Proven experience as an SRE, DevOps Engineer, or Cloud Infrastructure Engineer
Hands‑on expertise with AWS and/or Azure cloud platforms
Strong skills in Kubernetes, Docker, and container orchestration
Proficiency with IaC tools: Terraform, CloudFormation, Ansible
Deep understanding of Linux systems, networking, and databases
Experience with CI/CD tools (GitHub Actions, Jenkins, GitLab CI, etc.)
Strong scripting ability (Python, Bash)
Familiarity with observability tools (Dynatrace, Prometheus, Grafana, ELK, etc.)
Experience in performance engineering, scalability, and distributed systems troubleshooting
Knowledge of ITIL, incident management, and ServiceNow workflows

✨ Nice‑to‑Have Skills

AWS, Azure, or CKA (Certified Kubernetes Administrator) certifications
Experience with serverless computing or service mesh
Hands‑on experience with Kafka, Redis, or other distributed systems
Exposure to zero‑downtime deployments and reliability automation patterns

💡 Recruitment Pro Tip

Showcase real‑world SRE achievements, especially performance improvements, reduced MTTR, automation wins, or Kubernetes/IaC projects. Quantified reliability impact is one of the strongest differentiators for SRE contract roles.
