Senior Engineer IT Reliability
Job in New York, New York County, New York, 10261, USA
Listed on 2026-06-02
Listing for: JetBlue
Full Time position
Job specializations: IT/Tech (Systems Engineer, SRE/Site Reliability, Network Engineer, Cloud Computing)
Job Description
Position Summary
The Senior Reliability Engineer (Infrastructure) is responsible for ensuring the reliability, availability, and recoverability of JetBlue's critical infrastructure platforms. This role applies engineering discipline to operational challenges, leads the response to complex incidents, and drives improvements that reduce operational risk over time. The Senior Reliability Engineer works closely with cloud, platform, network, and application teams to ensure that infrastructure systems are observable, resilient, and safe to operate in production, while exhibiting the JetBlue values of Safety, Caring, Integrity, Passion, and Fun.
Essential Responsibilities
- Own reliability outcomes for critical infrastructure platforms supporting JetBlue production systems.
- Define and manage Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for infrastructure capabilities.
- Lead response, diagnosis, and resolution of complex infrastructure incidents as Incident Commander or senior technical authority.
- Participate in a 24x7 on‑call rotation and help improve incident response practices.
- Diagnose and mitigate failures across Linux systems, Kubernetes platforms, Azure cloud infrastructure, and networking layers.
- Review and approve high‑risk infrastructure changes with consideration for blast radius, rollback readiness, and dependency impact.
- Identify and mitigate capacity, scaling, and saturation risks across infrastructure systems.
- Improve monitoring, alerting, and dashboards to reflect real system health and customer impact.
- Reduce operational toil through automation, tooling, and reliability‑focused engineering improvements.
- Develop and maintain operational documentation, runbooks, and recovery procedures.
- Lead blameless post‑incident reviews and drive corrective actions to prevent repeat incidents.
- Mentor engineers on operational excellence, reliability practices, and incident response.
- Collaborate with cloud, platform, network, and security teams to ensure reliable and secure infrastructure operations.
- Ensure infrastructure platforms meet regulatory, compliance, and security requirements as applicable.
- Other duties as assigned.
Minimum Qualifications
- Bachelor's Degree in Computer Science or a related discipline; OR demonstrated capability to perform job responsibilities with a combination of a High School Diploma/GED and at least four (4) years of relevant experience.
- Five (5) or more years of experience in Site Reliability Engineering, infrastructure operations, DevOps, or production engineering roles.
- Demonstrated experience operating and supporting large‑scale production infrastructure.
- Strong Linux troubleshooting skills across CPU, memory, disk, and process behavior.
- Strong understanding of networking fundamentals including TCP/IP, DNS, load balancing, and failure modes.
- Hands‑on experience operating Kubernetes clusters, including troubleshooting, scaling, and failure recovery.
- Experience operating infrastructure in a public cloud environment (Azure preferred).
- Experience with observability tools including metrics, logs, tracing, and alerting.
- Proficiency in at least one programming or scripting language (such as Python, Go, Java, or similar) used to automate operations and improve reliability.
- Experience using infrastructure‑as‑code and automation to reduce operational toil.
- Ability to make sound decisions under pressure and communicate clearly during incidents.
- Able to work flexible hours and participate in on‑call rotations.
- Available for occasional overnight travel (10%).
- Must pass a pre‑employment drug test.
- Must be legally eligible to work in the country in which the position is located.
- Authorization to work in the US is required. This position is not eligible for visa sponsorship.
Preferred Qualifications
- Seven (7) or more years of experience in Site Reliability Engineering, infrastructure operations, DevOps, or production engineering roles.
- Experience defining and operationalizing SLOs and using error budgets to guide reliability decisions.
- Experience with capacity planning and demand forecasting.
- Experience operating highly available, distributed systems.
- Experience mentoring…
Position Requirements
10+ years work experience