Azure Platform Reliability Engineer IV Job, Quincy area, Massachusetts, USA, IT/Tech

Introduction

Ahold Delhaize USA, a division of global food retailer Ahold Delhaize, is part of the U.S. family of brands, which also includes five leading omnichannel grocery brands – Food Lion, Giant Food, The GIANT Company, Hannaford and Stop & Shop. Ahold Delhaize USA associates support the brands with a wide range of services, including Finance, Legal, Sustainability, Commercial, Digital and E-commerce, Technology and more.

Overview

The Platform Reliability Engineer will help ensure service availability, identify and automate manual processes, and bridge the gaps between product development teams and operations. Operational improvements in availability, latency, performance, efficiency, change management, monitoring, incident response, patch management and capacity planning are all within scope for this role. Whether through code, the introduction of modern tools, or better processes, the goal is continuous improvement and efficiency.

You’ll bring strong troubleshooting skills and a sense of ownership to supporting various Azure services. The role also involves managing Linux servers in the cloud, so a strong Linux background is required.

Our flexible/hybrid work schedule includes 3 in-person days at one of our core locations and 2 remote days. Our core office locations are Salisbury, NC and Quincy, MA.

Applicants must be currently authorized to work in the United States on a full-time basis.

Responsibilities

Builds, manages, and operates Azure Core Services with automation and infrastructure as code
Manages and operates the continuous delivery framework and tools; manages and automates the lifecycle of platform components and supports product teams
Leverages cloud architecture, applying site reliability principles and full‑stack troubleshooting skills across network, application, security, identity, OS (including Linux), containers, on‑prem, and distributed services
Provides reasoning about system & application architecture; reviews code to improve reliability
Identifies automation opportunities to improve patching, service health, manageability, reliability, and telemetry
Owns, triages, investigates, and resolves service issues with a focus on communication, learning & teaching
Designs process or technology solutions that monitor, identify, and resolve system and deployment issues pre‑ and post‑production, ensuring measurable KPI improvements
Drives security and compliance for services in accordance with Azure compliance requirements
Engages in service capacity planning, forecasting, and cost optimization
Creates and documents runbooks, operational procedures, and standards in Confluence
Communicates at a deep technical level with engineering, PM, and product teams
Works within agile/scrum project teams in a support role
Remains current on new technologies, methods, coding practices, TDD, CI/CD, and operational excellence
Manages, supports, and troubleshoots Linux servers and Linux-based workloads running in cloud environments
Implements and automates Linux system operations, patching, performance tuning, and hardening
