Senior DevOps Engineer - AI Healthcare Leader (New York, New York, USA; IT/Tech)

Location: New York

Overview

Staff DevOps Engineer: Cloud Infrastructure, Kubernetes & AI Platform Operations

This opportunity is with a client of Andiamo, an innovative healthcare technology organization that builds AI-driven digital platforms supporting patients, providers, and enterprise healthcare systems. The organization is seeking a highly experienced Staff DevOps Engineer to help lead the evolution of a modern cloud infrastructure environment that powers mission-critical healthcare and AI applications. This senior-level engineering role is designed for someone who thrives in complex distributed systems, enjoys solving large-scale operational challenges, and wants meaningful ownership of platform reliability, scalability, and infrastructure strategy.

You will play a central role in designing and operating cloud-native infrastructure across both internal platforms and enterprise partner environments. The ideal candidate combines deep Kubernetes expertise with strong cloud engineering capabilities, infrastructure-as-code experience, and a passion for building resilient, secure, and highly automated systems. This role also offers the opportunity to work at the intersection of DevOps, AI infrastructure, platform reliability, and healthcare technology in a highly collaborative and fast-moving environment.

Responsibilities

Cloud Infrastructure & Platform Engineering:
Lead the design, implementation, and ongoing optimization of Kubernetes-based infrastructure environments supporting large-scale production applications and enterprise integrations.
Architect and maintain cloud-native systems across multi-cloud environments, ensuring scalability, reliability, security, and operational efficiency.
Develop and enhance reusable infrastructure-as-code modules using Terraform across cloud providers and supporting services.
Drive improvements to deployment pipelines, automation frameworks, and platform tooling that enable engineering teams to ship software efficiently and safely.
CI/CD, Automation & Developer Enablement:
Design and maintain enterprise-grade CI/CD workflows and reusable pipeline frameworks that support secure and scalable software delivery.
Support GitOps-based deployment strategies and operational workflows across engineering teams.
Own and maintain critical infrastructure services running within Kubernetes environments, including deployment automation, ingress systems, observability tooling, and operational support platforms.
Continuously improve developer productivity, deployment reliability, and operational visibility through automation and platform enhancements.
Security, Compliance & Reliability:
Implement and support infrastructure security controls, secrets management strategies, container security scanning, and software supply chain protections.
Partner with internal teams to support compliance initiatives aligned to regulated environments including healthcare and security-focused operational standards.
Lead disaster recovery readiness initiatives including failover testing, operational runbooks, resiliency planning, and recovery validation exercises.
Monitor, troubleshoot, and improve production reliability while participating in operational incident response and daytime on-call rotations.
AI Infrastructure & Operational Innovation:
Contribute to the development of next-generation AI-powered operational tooling and intelligent infrastructure automation.
Help evaluate and implement emerging technologies that improve observability, operational scalability, and platform intelligence.
Support environments involving AI workloads, high-performance infrastructure, and advanced cloud orchestration patterns.
Leadership & Cross-Functional Collaboration:
Mentor engineers across the DevOps and infrastructure organization while helping establish operational standards and engineering best practices.
Partner closely with software engineering, security, product, and platform teams to drive infrastructure initiatives and long-term technical strategy.
Provide technical leadership on complex platform projects spanning cloud architecture, reliability engineering, automation, and enterprise integrations.

Qualifications

Required Qualifications

5+ years of experience in DevOps, Platform Engineering, or Site Reliability Engineering
Deep expertise with Kubernetes and cloud-native operational tooling
Strong hands-on experience with Helm, ArgoCD, Helmfile, cert-manager, Kyverno, NGINX Ingress, and related Kubernetes ecosystem technologies
Extensive experience designing and operating infrastructure on Google Cloud Platform including GKE, IAM, Cloud SQL, storage services, and identity management
Advanced Terraform experience including modular infrastructure design, multi-environment deployments, and infrastructure testing practices
Strong experience with GitLab CI/CD pipelines, GitOps methodologies, and deployment automation
Programming proficiency in Python and/or Go
Experience supporting infrastructure security, secrets management, and compliance-focused operational…