Platform Operations Lead
Listed on 2026-06-03
IT/Tech
Systems Engineer, Cloud Computing
At HDR, our employee‑owners are fully engaged in creating a welcoming environment where each of us is valued and respected, a place where everyone is empowered to bring their authentic selves and novel ideas to work every day. We foster a culture of inclusion throughout our company and within our communities, constantly asking:
What is our impact on the world?
Responsibilities
- Define and lead the operational strategy for observability, monitoring, incident management, and reliability engineering across the VCF platform.
- Establish enterprise‑level standards for service health modeling, operational telemetry, dashboard architecture, alert governance, and SLO adoption.
- Own the major incident operational model for platform services, including escalation design, command structure, stakeholder communication, and recovery accountability.
- Drive long‑term reliability improvement initiatives by identifying systemic issues, recurring failure points, and architectural opportunities.
- Lead design decisions related to integration of VCF Operations with enterprise observability, ITSM, CMDB, automation, and reporting ecosystems.
- Oversee operational readiness for additional VCF platform tools and shared services, ensuring consistent support models and telemetry coverage.
- Direct capacity and performance strategies for platform growth, resiliency, and service sustainability.
- Embed cloud security operations, monitoring controls, policy adherence, and compliance reporting into platform operations practices.
- Partner with architecture, cloud, security, compliance, and service management teams to align platform operations with enterprise standards and risk controls.
- Establish operational governance for alert quality, incident trends, post‑incident action closure, and service performance reporting.
- Provide leadership, coaching, and technical direction to Platform Operations Engineers across all levels.
- Influence roadmap priorities for automation, resilience engineering, self‑healing capabilities, and platform operational maturity.
- Schedule & Presence: This on‑site role supports 24/7 operations through real‑time collaboration within a 6:00 AM – 6:00 PM window, Monday through Friday. The position requires scheduled on‑call flexibility and the ability to remain reachable during off‑hours for critical business continuity.
Qualifications
- Deep hands‑on experience with VMware Cloud Foundation Operations, Aria Operations, Aria Operations for Logs, and adjacent VCF platform tools.
- Experience supporting hybrid cloud environments and integrating public cloud operations with on‑premises platforms.
- Strong familiarity with Dynatrace, ServiceNow, CMDB/ITOM, and enterprise event management ecosystems.
- Experience establishing operational controls aligned to internal audit, security policy, and compliance frameworks.
- Experience with infrastructure resilience, service restoration planning, and operational risk reduction initiatives.
- Relevant advanced certifications in VMware, Azure, security, ITIL, or observability disciplines.
- Bachelor’s degree in Information Technology, Computer Science, Engineering, or related field, or equivalent experience.
- Minimum 7 years of experience in platform operations, infrastructure engineering, observability, site reliability, or related enterprise operations roles.
- Deep experience leading monitoring, incident management, and reliability programs for complex enterprise infrastructure.
- Expert knowledge of VMware vSphere and strong operational knowledge of VMware Cloud Foundation platforms and dependencies.
- Demonstrated experience designing service reliability frameworks, operational governance models, and metrics‑based improvement programs.
- Strong experience with enterprise observability architecture, integration strategy, and automation design.
- Experience leading cross‑functional technical initiatives involving operations, security, cloud, and compliance stakeholders.
- Strong understanding of cloud security practices, operational controls, and regulatory/compliance considerations.
Full‑time
Locations
United States – Arizona, Phoenix;
Minnesota, St. Paul;
Texas, Austin;
Missouri, Kansas City;
South Carolina, Columbia;
Colorado, Denver;
Virginia, Vienna;
Minnesota, Saint Louis Park;
California, Folsom;
Oregon, Portland;
New Jersey, Pennington;
New Jersey, Woodcliff Lake;
Florida, Orlando;
Pennsylvania, Harrisburg;
Oregon, Salem;
Texas, Fort Worth;
Pennsylvania, Pittsburgh;
Texas, San Antonio;
Texas, Dallas;
New Mexico, Albuquerque;
Washington, Olympia;
Colorado, Englewood.
At HDR, we are committed to the principles of employment equity. We are an affirmative action and equal opportunity employer. We consider all qualified applicants, regardless of criminal history or arrest and conviction records.