Senior Site Reliability Engineer, New York, New York, USA (IT/Tech)

Location: New York

Senior Site Reliability Engineer with deep expertise in optimizing system reliability, performance, and scalability across cloud environments (Azure, Kubernetes, Service Mesh).
Proficient in defining, measuring, and improving Service Level Objectives (SLOs), managing error budgets, and automating toil to drive operational excellence in a blameless culture.
Remote-first opportunity for US-based employees with the option to work in-person out of our Manhattan office.

Start your adventure with Zip

Join Zip’s Engineering function and put your name to solving fascinating challenges at scale in an agile, test-driven development environment. If you value good domain-driven design and enjoy delivering quality work at pace, you’ll be a great fit with the squads responsible for building cloud-native software applications that serve millions of customers and process billions of dollars in payments.

We are seeking a seasoned leader with extensive senior leadership experience to spearhead our Site Reliability Engineering (SRE) initiatives and mentor our engineering team. This role requires a deep understanding of operational excellence, managing production risk, and the ability to lead reliability initiatives from inception to completion. Collaboration is key in our environment, so we need someone who excels in a team-oriented setting.

As we aim to double our footprint this year, you will encounter complex challenges that demand innovative solutions and strategic insight to maintain and improve system reliability. If you are passionate about driving infrastructure excellence and nurturing talent within a dynamic SRE team, we would love to hear from you.

Interesting problems you’ll get to solve

Work within an infrastructure that is capable of handling billions of dollars in transactions quickly and securely
Collaborate with engineering teams to design and deploy highly reliable and scalable integrated solutions for Fortune 100 companies.
Develop automated upgrade systems for a constantly evolving Azure architecture
Maintain a complex event sourcing environment using CQRS principles
Develop self-service tooling and automation (e.g., using Terraform, Atlantis, ArgoCD) to empower development teams to operate services within established reliability standards and reduce toil.
Monitor service health and build automatic recovery using metrics-based canaries to ensure reliable code deployment

What you’ll bring to the team

10+ years of experience in a Site Reliability Engineering, Production Engineering, or equivalent role.
5+ years of experience working with Kubernetes or similar microservice architecture.
5+ years of experience working in an Azure environment
Proven experience defining and implementing Service Level Indicators (SLIs) and Service Level Objectives (SLOs) and managing error budgets.
Experience working in an agile environment and knowledge of agile practices
Jira experience with project management and story creation is a plus
Experience with CI/CD systems, preferably using Azure DevOps or GitHub Actions
Strong understanding of networking and routing protocols, especially those involved in Service Mesh architectures
Experience incorporating AI tools such as ChatGPT, Cursor, Codex, or GitHub Copilot into your day-to-day work.
Must be able to work in an on-call rotation with a focus on sustainable incident response and post-mortem analysis (blameless culture).

What you’ll get in return

Zip is a place where you’ll get out what you put in. The newness of our sector means we need to move at pace and embrace change, and our promise to you when you join the team is that you’ll feel empowered and trusted to make big things happen quickly.

We want you to feel welcome and as though you have the support to be yourself, and care for yourself, because it’s important to us that you make the most of the opportunities you’ll get to grow your skills and your career, and be surrounded by smart, friendly people and leaders that have your back.

We think these are just some of the best things about being a Zipster. We will also offer you:

Flexible working culture
Incentive programs
20 days PTO every year
Generous paid parental leave
Leading family…

