Lead Cloud Engineer Job, Atlanta area, Georgia, USA, IT/Tech

## Lead Cloud Engineer
locations: US, GA - Atlanta
time type: Full time
posted on: Posted Today
job requisition: R28401

At Bose Corporation, we believe sound is the most powerful force on earth — and for over 60 years, we have been a company built on innovation, excellence, and independence. Privately owned, fiercely customer-focused, and driven by our values, we continue to lead industries and transform lives through sound.
Today, Bose Corporation is entering an exciting new era. Across multiple global Business Units and Global Functions, we are shaping the future of audio technology, automotive, luxury, and premium experiences. We invite you to join us in this transformation.
# **Job Description**
**Cloud Engineering Ops Lead (AWS + Application Support)**
**Mission**
Keep our AWS platforms and customer-facing apps available, observable, recoverable, secure, and cost‑sensible. Make the runbook path the easiest path, so on-call personnel feel calm and releases feel straightforward, in a good way.
**Scope of the role**
* AWS operations: EC2, EKS, RDS, ALB/CloudFront, IAM/OIDC, VPC/TGW/SGs, patching, and hygiene.
* Application support: release readiness, runbooks, post-deploy smoke checks, performance baselines, and clean rollback paths.
* Visibility: dashboards, logs, metrics, traces, synthetics, error budgets, and alert health.
* Backup & DR: policies, schedules, retention, cross-region copies, restore testing, and DR runbooks (RPO/RTO owned and measured).
* Incident leadership: run Sev‑1/2 bridges, keep comms clear, and land post‑mortems with actions that actually close.
* Cost hygiene: tagging, right-sizing, SP/RI coverage, lifecycle cleanups (EBS/EIP/AMIs).
* Team enablement: guardrails, golden runbooks, and small automations that remove toil.
**Day‑to‑day (what this looks like)**
* Triage overnight alerts and hot issues, set priorities, and make sure owners are clear.
* Keep dashboards honest; fix flapping or missing alerts before they wake people up.
* Check backups and recent restore points; open tickets for any gaps and track to done.
* Unblock releases; verify smoke checks; keep environments tidy and predictable.
* Lead or delegate break/fix; no lingering “mystery” incidents.
* Write down what we learned in the runbook so the next person can fix it faster.
**Weekly rhythm**
* Ops review: incidents, alerts, deploys, costs, capacity, and backup status in one short readout.
* Observability tune‑up: delete noise, add the missing signal, and test a synthetic from the edge.
* Backup/DR: run a small restore test and record RPO/RTO evidence.
* Patch and change review: what shipped, what rolled back, why.
**Monthly outcomes**
* Share availability/SLOs, MTTR, change failure rate, observability coverage, backup compliance, and costs in plain English.
* Close the top recurring issues (noisy alerts, flaky deploys).
* Refresh the most‑used runbooks; validate DR for one critical workload (tabletop or live restore).
**Core responsibilities**
* Own production readiness and stability for assigned AWS accounts and apps.
* Lead incidents and land post‑mortems; make the fixes stick.
* Keep monitoring/logging/tracing standards real; enforce SLOs and error budgets.
* Own backup strategy end-to-end, including monthly restore tests and DR docs.
* Keep access least‑privileged and auditable; rotate secrets and certs on time.
* Drive cost posture and mentor the team; make on-call humane.
**What “good” looks like**
* Visibility: one clear dashboard per service, clean alert routing, low false positives.
* Backups: 100% of jobs green (or retried), documented RPO/RTO, and monthly restore tests that pass.
* Reliability: MTTR trending down; most issues solved by the first responder with a runbook.
* Change: predictable releases with smoke and rollback; fewer failed changes month over month.
* Cost: flat or down against growth; tagging at or above 95%.
**Minimum Experience Required**
* 8–10+ years in cloud/app operations with strong AWS hands-on experience.
* Comfortable leading incidents, shaping dashboards and alerts, and automating the boring bits (Terraform, Ansible, Python).
* Experience running backups/DR in AWS and…

