Lead Engineer Cloud Automation
Listed on 2026-06-04
-
Engineering
Systems Engineer -
IT/Tech
Systems Engineer
Long Island City, NY, US, 11101 Washington, DC, US, 20005 Orlando, FL, US, 32827 Salt Lake City, UT, US, 84121
Position Summary
The Lead Cloud Automation Engineer serves as the primary point of contact and technical leader of the Cloud Automation team. The Lead Engineer plans, prioritizes, delegates, and assigns tasks to ensure business and customer deliverables are met quickly and efficiently. The Lead Engineer coordinates with leadership to determine goals and resolve issues, proactively reporting status, challenges, issues, and constraints. The Lead Engineer monitors and adjusts priorities to balance workload and maintain team cooperation and output.
The Lead Engineer reviews, critiques, and edits team work products and documentation for readability, correctness, and quality. The Lead Engineer serves as a technical thought leader, following developments in the field and exploring products and technologies to improve infrastructure quality, compliance, time to market, and cost. The Lead Engineer is responsible for designing, documenting, provisioning, and maintaining cloud infrastructure via code.
The Lead Engineer prepares performance- and cost-optimal cloud services to deliver infrastructure aligned with reference architecture standards, frameworks, and patterns. The Lead Engineer maintains the infrastructure and collaborates with operational teams for change management and incident response. The Lead Engineer excels when working on complex projects, is motivated to deliver results, maintains operational excellence, and models the JetBlue values of Safety, Caring, Integrity, Passion, and Fun.
- Write and execute Infrastructure as Code (IaC) pipelines capable of deploying standard cloud services including virtual networks, firewalls, load balancers, storage accounts, Application Programming Interface (API) management gateways, Kubernetes clusters, messaging bus services, managed databases, and virtual machines
- Write and execute code capable of managing cloud governance policies, security, and cost management constructs
- Implement and use configuration management and GitOps to maintain infrastructure with Ansible, Salt, and Terraform
- Plan and build cost-effective Platform as a Service (PaaS) solutions including provisions for high availability and disaster recovery
- Code and deploy infrastructure leveraging Availability Zones/Availability Sets
- Create and maintain infrastructure documentation and operational procedures using tools such as Confluence and Lucidchart
- Collaborate and provide knowledge transfer to operational support teams and colleagues
- Create monitoring alerts and remediation workflows
- Build auto-remediation capabilities using automation frameworks and serverless functions
- Implement autoscaling features, container management policies, and specify virtual hardware to optimize for low cost and high application performance
- Monitor, operate, maintain, and improve the cloud environment based on operational metrics, Service Level Agreements (SLAs), and best practices
- Other duties as assigned
- Bachelor’s Degree in Computer Science or related IT discipline; OR demonstrated capability to perform job responsibilities with a High School Diploma/GED and at least four (4) years of previous relevant work experience
- Three (3) years of experience in a production operations environment
- Three (3) years of experience in an Amazon Web Services (AWS) cloud production environment
- Three (3) years of experience coding and scripting in one of the following: Python, Bash, PowerShell
- Three (3) years of experience deploying and operating containerized applications
- Experience deploying, scaling, and administering production Kubernetes clusters; Azure Kubernetes Service (AKS) preferred
- Excellent knowledge of Continuous Integration (CI)/Continuous Delivery (CD) and Infrastructure as Code (IaC) automation tools
- Proficient in IT Infrastructure, networking concepts, IT security, server engineering, virtualization, data center tools, processes, and modern event-driven application architecture
- Strong understanding of Disaster Recovery (DR) and High Availability (HA) solutions and their use
- Available for overnight travel (up to 10%)
- In…