Site Reliability Engineer
Job in New York, New York County, New York, 10261, USA
Listed on 2026-02-15
Listing for: Berkley Hunt
Full Time position
Job specializations:
- IT/Tech: Cloud Computing, Systems Engineer, SRE/Site Reliability
Job Description & How to Apply Below
Berkley Hunt has partnered with a high-growth fintech company to hire a Site Reliability Engineer to help build, operate, and scale a globally distributed, highly available cloud platform. This role focuses on reliability, automation, and operational excellence, working closely with engineering teams to ensure systems are resilient, scalable, and production-ready from day one.
Hybrid in Manhattan
Who You Are:
- You think in systems, not silos; you naturally connect infrastructure decisions to customer experience and business impact.
- You have strong experience running production environments at scale and understand what “good” looks like in terms of uptime, latency, and reliability.
- You’re confident operating Kubernetes in real-world production settings, not just deploying to it.
- You have a solid background in cloud architecture across AWS and GCP, and understand the trade-offs of distributed systems.
- You are proactive about identifying risk and eliminating single points of failure before they become incidents.
- You are comfortable working in fast-paced environments where priorities evolve and ownership is shared.
- You believe infrastructure should be repeatable, observable, and continuously improving.
What You'll Do:
- Architect and evolve cloud infrastructure to support a secure, highly available, and globally distributed fintech platform.
- Embed reliability best practices into the development lifecycle, influencing design decisions before code reaches production.
- Drive improvements in deployment workflows through GitOps and Infrastructure-as-Code methodologies.
- Enhance system visibility by building robust monitoring, logging, and alerting frameworks.
- Lead incident response efforts, conduct post-incident reviews, and implement preventative measures to strengthen platform resilience.
- Continuously refine Kubernetes environments to improve performance, scalability, and operational efficiency.
- Partner cross-functionally with engineering and product teams to balance speed of delivery with operational stability.
- Reduce operational toil by identifying automation opportunities and improving internal tooling.