Customer Data Platform - Senior Reliability Engineer - Onsite in Atlanta, GA
Listed on 2026-06-13
IT/Tech
Cybersecurity, Cloud Computing: Infrastructure & Operations, SRE/Site Reliability
Job Title:
Senior Reliability Engineer — Customer Data Platform
Location:
Atlanta, GA - Onsite
CDP MISSION:
Our mission is to be the authoritative source of truth for customer data — delivering timely, high-quality data at scale to power the contextual experiences that drive the growth of this company. Every customer profile must be accurate, trusted, and available when it matters, across every touchpoint, for the entire US adult population.
We are seeking a Senior Reliability Engineer to own production excellence for our Customer Data Platform (CDP). A platform is only authoritative if it is available, secure, and timely, and this role ensures exactly that: high availability, operational resilience, and compliance for the critical data systems that power customer experiences across every touchpoint.
You will lead 24x7 production support, incident management, platform governance, and security compliance — ensuring CDP remains the trusted foundation the business depends on. You will act as the bridge between engineering, platform, security, and compliance teams, driving the operational discipline that keeps CDP resilient, secure, and audit-ready at all times.
Responsibilities:
- Lead keep-the-lights-on (KTLO) operations, including 24x7 monitoring, incident management, and on-call processes, recognizing that CDP downtime directly impacts customer experiences and business decisions
- Oversee production support for data pipelines, APIs, and platform services across Azure and Databricks ecosystems
- Manage job orchestration and monitoring (e.g., Control‑M), ensuring SLA adherence and timely resolution — because timeliness is a core promise of the authoritative source of truth
- Establish and enforce runbooks, SOPs, and escalation procedures tailored to CDP's criticality
- Drive root cause analysis (RCA) and implement preventive measures to reduce recurring issues and protect data trust
- Improve system reliability through automation, observability, and proactive monitoring, working toward near-real-time availability targets
- Define and track SLAs, SLIs, and SLOs for critical CDP systems — with metrics aligned to data freshness, accuracy, and availability commitments
- Partner with engineering teams to implement resiliency patterns, failover strategies, and capacity planning for population‑scale data processing
- Identify and eliminate operational bottlenecks and manual processes that threaten CDP's reliability and timeliness
- Lead execution of compliance mandates, audits, and regulatory requirements impacting CDP systems — ensuring the platform that holds data for the entire US adult population meets the highest security standards
- Manage and remediate security violations, vulnerabilities, and policy breaches with urgency
- Oversee access controls, audit readiness, and governance processes in collaboration with security teams — protecting the trust that makes CDP authoritative
- Ensure adherence to data protection and privacy standards across all customer data systems
- Manage patching, upgrades, and vulnerability remediation across CDP platforms
- Lead password and credential rotation processes across systems and integrations
- Ensure operational readiness for infrastructure and platform changes with zero‑downtime deployment practices
- Coordinate with vendors and platform teams for issue resolution and maintenance activities
- Lead and coordinate onshore/offshore support teams, ensuring effective coverage and handoffs for 24x7 operations
- Partner with Data Engineering, AI/ML, and Platform teams to ensure operability and supportability of all CDP systems
- Provide operational readiness reviews for new deployments and features before they enter production
- Mentor team members and drive a culture of accountability, ownership, and continuous improvement
Qualifications:
- Bachelor's degree…