Systems and Infrastructure Engineer - Enterprise Storage
Listed on 2026-06-22
IT/Tech
SRE/Site Reliability, Cloud Computing: Infrastructure & Operations, IT Infrastructure, Systems Engineer
Position Summary
The Enterprise Storage Platform team, part of Walmart Global Tech’s Enterprise and Cloud organization, designs, engineers, and operates the next‑generation storage infrastructure that powers Walmart’s mission‑critical applications, data platforms, cloud services, and Kubernetes workloads at multi‑petabyte scale.
We are hiring a Staff System & Infrastructure Engineer to help define and evolve Walmart’s enterprise storage strategy across block, file, object, and cloud‑native storage platforms. In this role, you will serve as a senior technical leader responsible for designing resilient storage architectures, driving platform modernization, improving reliability and performance, and enabling self‑service storage capabilities for engineering teams across Walmart.
This role is ideal for an engineer who combines deep enterprise storage expertise with modern platform engineering practices. You will work across on‑premises, cloud, and Kubernetes environments, using automation, Infrastructure as Code, observability, and AI/AIOps‑driven operational capabilities to simplify storage consumption, reduce operational toil, and improve infrastructure efficiency at massive scale.
You will partner closely with cloud, compute, networking, database, security, and application engineering teams to support Walmart’s largest business‑critical workloads while continuously advancing scalability, resiliency, utilization, performance, and cost optimization across the enterprise storage ecosystem.
What you’ll do
Enterprise Storage Architecture and Engineering
- Lead the design, architecture, and evolution of enterprise‑scale storage platforms across block, file, object, and cloud‑native storage services.
- Define storage platform strategy, reference architectures, technology roadmaps, and multi‑year modernization initiatives aligned to business growth and platform needs.
- Design highly available, resilient, scalable, and secure storage solutions supporting mission‑critical workloads across on‑premises, cloud, and Kubernetes environments.
- Establish data‑driven models to balance performance, resiliency, scalability, utilization, and cost across enterprise storage platforms.
- Develop AI‑powered remediation workflows and operational copilots that improve incident diagnosis, root cause analysis, and infrastructure recovery.
- Lead root cause analysis for critical production incidents and implement long‑term architectural improvements to prevent recurrence.
- Evaluate emerging storage technologies and provide technical leadership for adoption, migration, and platform transformation initiatives.
- Partner with application, database, cloud, infrastructure, security, and platform engineering teams to design optimized storage solutions for modern workloads.
- Define disaster recovery, business continuity, backup, replication, cyber‑resiliency, and data protection architectures for enterprise storage platforms.
- Drive storage efficiency through automation, intelligent tiering, deduplication, compression, lifecycle policies, and capacity optimization.
- Architect and develop self‑service storage platforms that allow application and infrastructure teams to provision, manage, and consume storage through APIs and automated workflows.
- Build scalable automation frameworks for provisioning, lifecycle management, replication, snapshots, backup orchestration, compliance validation, and infrastructure remediation.
- Design and implement Infrastructure as Code solutions using tools and practices such as Terraform, Ansible, Argo CD, Helm, GitOps, and CI/CD pipelines.
- Develop Python‑based automation services, SDK integrations, REST API solutions, and workflow orchestration capabilities that reduce manual work and improve platform reliability.
- Champion engineering practices that reduce operational toil and advance autonomous platform capabilities.
- Drive platform observability by designing telemetry, monitoring, alerting, logging, dashboarding, and operational analytics solutions.
- Provide technical leadership and mentorship to engineers across storage, cloud, infrastructure, and platform engineering…