Senior Manager, Product Software Arch and Eng
Listed on 2026-02-16
IT/Tech
Systems Engineer, Cloud Computing
Who are we?
Equinix is the world's digital infrastructure company, shortening the path to connectivity to enable the innovations that enrich our work, life and planet.
A place where tech thinkers and future builders turn bold ideas into breakthrough experiences, we welcome your unique perspective. Help us challenge assumptions, uncover bias, and remove barriers—because progress starts with fresh ideas. You'll find belonging, purpose, and a team that welcomes you—because when you feel valued, you're empowered to do your best work.
Job Summary
We are seeking an experienced Manager to build, lead, and scale a Site Reliability Engineering (SRE) team dedicated to Cloud and Server operations. This leader will establish the vision, strategy, processes, and tooling required to ensure our global server infrastructure is reliable, scalable, secure, and efficient. The role will combine deep technical expertise with strategic leadership, fostering a high-performance culture that blends software engineering and systems operations.
Key Responsibilities
- Define and execute the SRE strategy for server operations, aligned with organizational objectives and SLAs.
- Build and mentor a high-performing team of SREs from the ground up, fostering a culture of accountability, innovation, and continuous improvement.
- Develop the team's roadmap for automation, monitoring, reliability engineering, and operational maturity.
- Partner with engineering, infrastructure, and security teams to ensure seamless delivery and operations.
- Collaborate with cross-functional teams including procurement, supply chain, IBX, and compliance to support operational excellence.
- Manage CAPEX/OPEX expenditures to optimize resource allocation and drive cost-efficiency across infrastructure initiatives.
- Oversee the health, performance, and capacity of server infrastructure, ensuring optimal uptime and efficiency.
- Lead implementation of monitoring, observability, and alerting systems to detect and resolve issues proactively.
- Establish incident response and postmortem practices, ensuring root cause analysis and prevention.
- Develop and enforce operational standards, playbooks, and change management processes.
- Drive automation for provisioning, configuration, patching, scaling, and recovery.
- Identify and implement tools that reduce toil, improve reliability, and increase operational speed.
- Oversee patching, vulnerability mitigation, and compliance for server hardware and OS layers.
- Ensure adherence to security and data protection requirements.
- Manage vendor relationships for hardware, monitoring tools, and infrastructure services.
- Develop and manage the SRE/server operations budget.
- 10+ years in infrastructure, systems engineering, or SRE roles, with 3+ years in a leadership position.
- Strong technical background in server infrastructure (hardware, OS, virtualization, cloud/hybrid environments).
- Expertise in automation tools, scripting languages, monitoring platforms, and incident management.
- Deep understanding of capacity planning, disaster recovery, and high-availability architectures.
- Excellent leadership, communication, and cross-functional collaboration skills.
- Familiarity with ITIL/ITSM processes.
- Experience with both on-premises and cloud infrastructure.
- Ansible, Puppet, Chef, or similar.
- Terraform, Pulumi, or similar.
- Linux Administration.
- Bare-metal hypervisors (Xen, KVM, or similar).
- Kubernetes (K8s), with a focus on cluster administration.
- Familiarity with DevOps teams.
- Familiarity with the Service Teams concept.
United States - DA1 Dallas : 177, USD / Annual
United States - Redwood City Office GHQ : 213, USD / Annual
Our pay ranges reflect the minimum and maximum target for new hire pay for the full-time position determined by role, level, and location. The pay range shown is based on our compensation structure in place at the time of posting and may be updated periodically based on business needs. Individual pay is based on additional factors including job-related skills, experience, and relevant education and/or training.
The targeted pay range listed reflects the base pay…