Disaster Recovery/Cloud Reliability Engineer Job, Memphis area, Tennessee, USA, IT/Tech

Position: Disaster Recovery / Cloud Reliability Engineer

Vaco is actively seeking an AWS Disaster Recovery / Cloud Reliability Engineer for a direct-hire role supporting a nationally respected, technically focused, and growing organization in Memphis.

Position Summary

The AWS Cloud Reliability Engineer is responsible for the hands-on technical work of strengthening and modernizing recovery capabilities within an enterprise AWS ecosystem.

This individual will collaborate with infrastructure, security, compliance, and application teams to improve operational resilience, streamline recovery execution, and support cloud governance initiatives related to performance, availability, and utilization optimization.

The role requires hands‑on AWS engineering coupled with operational readiness, automation, and disaster recovery planning. The ideal candidate brings strong AWS expertise, practical infrastructure‑as‑code experience, and the ability to build and deploy repeatable recovery processes across complex distributed environments.

Primary Responsibilities

Design, implement, and support AWS-based resiliency and recovery solutions across enterprise applications and services, including AWS Elastic Disaster Recovery (AWS DRS).
Develop, maintain, and continuously improve disaster recovery and cutover runbooks including detailed recovery procedures, system dependencies, stakeholder communications, validation checkpoints, escalation paths, and rollback processes to support recovery and failover events.
Coordinate and execute disaster recovery exercises, failover testing, and restoration activities while documenting outcomes and driving corrective improvements.
Build reusable IaC components and operational standards using Terraform, CloudFormation, and related automation technologies.
Create automated deployment, provisioning, and support workflows through scripting and orchestration tools.
Enhance operational visibility by implementing monitoring, alerting, and reporting related to backup integrity, replication status, and recovery readiness.
Partner with governance, security, and risk stakeholders to ensure resiliency solutions align with internal controls and compliance expectations.

Qualifications

Minimum of 5 years of experience in cloud engineering, infrastructure operations, DevOps, SRE, or related technical environments, including hands‑on support of AWS production platforms and resiliency initiatives.
Strong understanding of disaster recovery operations including resiliency planning, failover testing, recovery validation, operational response, dependency management, and continuous improvement practices.
Experience supporting cloud infrastructure technologies including networking, storage, compute, backup, replication, monitoring, and identity management services within AWS environments.
Advanced experience with Infrastructure-as-Code and automation technologies, including Terraform, CloudFormation, scripting, and workflow orchestration using Python, PowerShell, Bash, or similar tools.
Ability to build scalable, repeatable operational processes and translate technical resiliency strategies into measurable business continuity and risk reduction outcomes.
Experience supporting cloud governance, utilization analysis, tagging strategy, reporting, and cost optimization initiatives aligned with FinOps methodologies and operational efficiency goals.
Familiarity with monitoring, observability, and logging platforms such as Amazon CloudWatch, Splunk, Datadog, or related technologies.
Working knowledge of Linux and Windows administration, Git-based source control, and CI/CD tooling.
Experience operating within highly regulated environments where compliance, audit readiness, and operational controls are critical. Familiarity with governance and resiliency frameworks such as NIST, ISO, ITIL, or similar standards is preferred.
Exposure to containerization and orchestration technologies, including Docker, ECS, EKS, and Kubernetes, is considered a plus.
Strong written and verbal communication skills with the ability to collaborate effectively across distributed technical, operational, and business teams, including during high‑pressure recovery events or incident response situations.
Bachelor’s…