Lead DevOps Engineer
Listed on 2026-02-16
IT/Tech
Systems Engineer, Cloud Computing
At David Zwirner, we strive to be an industry leader in our field, with our artists at the center of what we do. Our global exhibition program extends beyond our locations in New York, London, Los Angeles, Hong Kong, and Paris, and we represent seventy artists and estates. The gallery is home to innovative, singular, and pioneering exhibitions across a variety of media and genres. Active in both the primary and secondary markets, David Zwirner has helped foster the careers of some of the most influential artists today.
About the Opportunity
David Zwirner seeks an experienced and strategic Lead DevOps Engineer to guide the maturation of our infrastructure and provide coverage across European time zones (UTC to UTC+3). This role emphasizes technical leadership, strategic planning, and team mentorship. You will be responsible for enhancing the security, reliability, and resilience of our cloud footprint. That footprint is primarily AWS, with additional environments including Azure, GCP, Alibaba Cloud, Databricks, and Vercel.
As the leader of the DevOps function, you will define and execute the strategic roadmap while remaining hands‑on with your team. We’re seeking a pragmatic leader who will take full ownership of our infrastructure, systematically address technical debt, and champion a culture of operational excellence and security. Candidates must be diligent, extremely organized, and possess excellent prioritization and communication skills.
The core engineering team operates on New York hours, so this position will play a key leadership and coverage role for EMEA.
Key Responsibilities
- Leadership: Lead and mentor the DevOps team; set technical direction for infrastructure and security; foster a culture of ownership, reliability, and continuous improvement.
- Roadmap Ownership & Strategy: Define, own, and drive the Infrastructure & Security Roadmap, prioritizing infrastructure ownership, in-depth monitoring, disaster recovery, developer experience, and security hardening.
- Infrastructure as Code (IaC): Inventory unmanaged resources and bring them under Terraform (and CDK/SST where required); create reusable modules and guardrails; institute code reviews and change management.
- Platform Operations (AWS‑first): Design and operate services built on ECS (Fargate), ECR, RDS, ElastiCache, S3, ALB/CloudFront, WAF, Lambda, EventBridge, and CloudWatch; improve networking, IAM, and resilience.
- Resilience & Reliability: Modernize critical workloads; design and run disaster recovery drills; automate backup and restore; codify RPO/RTO targets and runbooks; lead incident response and postmortems.
- Observability & On‑Call: Standardize monitoring and alerting with Datadog, CloudWatch, Sentry, and PagerDuty; implement SLOs and noise‑reduction baselines; maintain a humane on‑call rotation.
- Security Hardening: Mature the configuration and rollout of tools such as Jit and CrowdStrike; improve firewall/WAF rules; enforce secrets management and least‑privilege access; champion threat modeling and automated scanning.
- Collaboration & Governance: Serve as a key technical voice on the Architecture Review Board; partner with Product and Engineering to align solutions with operational standards and business goals.
Qualifications
- Legal authorization to work in the UK.
- A track record in a senior or lead DevOps, SRE, or Platform role, including mentorship of engineers.
- Expert‑level Terraform (including importing existing resources and taming legacy estates).
- Deep, hands‑on experience with AWS (ECS, RDS, ElastiCache, Lambda, ALB, WAF, S3, CloudFront, EventBridge, CloudWatch) and production networking/IAM.
- Proven design and maintenance of CI/CD pipelines (GitHub Actions) and container workflows (Docker, ECS Fargate, or Kubernetes).
- Proficiency with modern observability and monitoring (Datadog, CloudWatch, Sentry, PagerDuty), incident response, and incident retrospectives.
- Strong background in cloud security principles and practical hardening.
- Ability to define and execute a technical roadmap and communicate with both technical and non‑technical stakeholders.
- Experience with GCP, Azure, Alibaba Cloud, and managed platforms (Databricks, Vercel).
- Familiarity with SST/CDK, Next.js/Vercel delivery flows, and performance considerations for web platforms.
- Experience with VPN/zero‑trust networking (e.g., Tailscale), perimeter hardening, and WAF tuning.
Please submit a resume and cover letter, and be prepared to provide three (3) professional references upon request.