Cloud Reliability Engineer II
Listed on 2026-02-16
IT/Tech
Systems Engineer, Cloud Computing
Category / Area of Expertise: IT & Technology
Job Requisition: 481002
Address: USA-NC-Salisbury-2110 Executive Drive
Store Code: Greenville Data Center - IT (5118616)
Ahold Delhaize USA, a division of global food retailer Ahold Delhaize, is part of the U.S. family of brands, which includes five leading omnichannel grocery brands – Food Lion, Giant Food, The GIANT Company, Hannaford and Stop & Shop. Our associates support the brands with a wide range of services, including Finance, Legal, Sustainability, Commercial, Digital and E-commerce, Technology and more.
Primary Purpose
The Cloud Reliability Engineer will help ensure service availability, identify and automate manual processes, and bridge the gaps between product development teams and operations. Operational improvements in availability, latency, performance, efficiency, change management, monitoring, incident response, patch management, and capacity planning are all within scope for this role. Whether through code, the introduction of modern tools, or better processes, continuous improvement and efficiency are the goal.
You’ll provide operational excellence through strong troubleshooting skills and ownership in supporting various Azure services.
Our flexible/hybrid work schedule includes 3 in-person days at one of our core locations and 2 remote days. Our core office locations are Salisbury, NC & Quincy, MA.
Applicants must be currently authorized to work in the United States on a full-time basis.
Duties and Responsibilities
- Build, manage, and operate Azure Core Services with automation and infrastructure as code
- Manage and operate the continuous delivery framework and tools, manage and automate the lifecycle of the different platform components, and help support product teams
- Leverage cloud architecture, applying site reliability principles and full-stack troubleshooting skills across network, application, security, identity, OS, containers, on-prem, and distributed services layers.
- Reason about system and application architecture, and be comfortable reviewing code and offering feedback on how it can be improved to increase reliability.
- Identify opportunities for automation and drive its implementation to improve patch management, service health, manageability, reliability, and telemetry.
- Own, triage, investigate and resolve service issues with an emphasis on broad communications, learning & teaching throughout the process
- Design process or technology solutions that monitor, identify, and resolve platform, system, deployment, and environmental issues both before and after production releases, and ensure measurable improvements against service KPIs.
- Drive security and compliance for services in accordance with Azure compliance requirements.
- Engage in service capacity planning, demand forecasting and work towards Azure cost optimizations.
- Create and document runbooks, operational procedures, and standards in Confluence
Qualifications
- Bachelor's Degree in Computer Science, Information Technology, Engineering, or related field (or equivalent work experience)
- 3+ years of IT experience focused on infrastructure, including server, storage, network, and security
- 3+ years of experience building, maintaining, and automating Azure environments in enterprise settings
- 2+ years of experience with IaC tools (ARM, Terraform, JSON, PowerShell, …)
- Hands-on experience deploying the Azure enterprise-scale reference architecture and its components
- Experience in full-stack cloud infrastructure engineering and operations, with application knowledge
- Ability to work in an Extreme Programming environment and in a pair programming/engineering model
- Able to facilitate diverse teams, multi-task, and work under pressure to meet aggressive schedule targets
- Hands-on experience with IaC tools like ADO, ARM, Terraform, Ansible, PowerShell, Python, Azure CLI, and GitHub
- Design, configuration, and maintenance of Kubernetes environments using AKS
- Technical expertise in Windows/Linux/VMware/Hyper-V/AKS, SQL and NoSQL databases, IaaS, PaaS, FaaS, Data, BCDR, Security, Management, Storage, Networking, Monitoring, Identity, and Connectivity
- Experience managing and maintaining…