Principal Site Reliability Engineer – AI

Job in New York, New York County, New York, 10261, USA
Listing for: Motion Recruitment Partners, LLC
Full Time position
Listed on 2026-01-01
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD yearly
Location: New York

About Our Client

Our client is an AI-driven health-tech start-up on a mission to transform patient care through intelligent, secure, and highly reliable clinical automation tools. Their platform powers real-time insights for clinicians, improving patient outcomes and enabling healthcare systems to operate with unprecedented efficiency. They are entering a high-growth phase and are seeking a Principal Site Reliability Engineer to help scale their infrastructure and ensure world-class reliability.

Role Overview

Our client is hiring a Principal Site Reliability Engineer to serve as the technical authority for the reliability, scalability, and performance of their cloud-native infrastructure. This individual will design and implement systems that support rapid product development while meeting the resilience requirements of clinical-grade AI applications. The role blends hands‑on engineering with architectural leadership and cross‑functional collaboration across product, ML, infrastructure, and security teams.

What You’ll Do
  • Architect, build, and optimize scalable, secure, and highly available cloud infrastructure (AWS/Google Cloud Platform/Azure).
  • Lead incident response, root‑cause analysis, and production reliability improvements across the platform.
  • Implement observability frameworks (metrics, tracing, logging) that provide deep visibility into system performance.
  • Partner with ML and data engineering teams to operationalize AI/ML pipelines, ensuring reliability from data ingestion through model deployment.
  • Develop automated CI/CD pipelines, infrastructure‑as‑code, and guardrails for safer, faster deployments.
  • Define SLOs/SLIs and establish long‑term reliability roadmaps aligned with clinical‑grade requirements.
  • Mentor SREs and software engineers, promoting DevOps and reliability best practices across engineering.
  • Lead capacity planning, performance testing, and system hardening initiatives.
  • Collaborate with security teams to ensure compliance with HIPAA, SOC 2, and relevant privacy and security standards.
  • Evaluate new technologies and drive adoption of tools that improve operational excellence.
What They’re Looking For
  • 8+ years in SRE, DevOps, Infrastructure Engineering, or related fields.
  • Deep expertise with Kubernetes, container orchestration, and microservices architecture.
  • Strong experience with cloud platforms (AWS/Google Cloud Platform/Azure) and infrastructure‑as‑code tools such as Terraform, Pulumi, or CloudFormation.
  • Advanced proficiency in automation/scripting languages such as Python, Go, or Bash.
  • Strong knowledge of distributed systems, reliability engineering patterns, and modern observability stacks (Prometheus, Grafana, OpenTelemetry, Datadog, etc.).
  • Experience supporting highly regulated or mission‑critical environments (healthcare, fintech, SaaS).
  • Hands‑on experience with ML infrastructure, model lifecycle management, or data pipelines is a plus.
  • Excellent communication skills and a proactive, ownership‑oriented mindset.
Why Candidates Love This Role
  • Mission‑driven work that directly influences patient care and health outcomes.
  • Ownership of foundational infrastructure in a rapidly scaling AI start‑up.
  • Competitive compensation, equity, and benefits.
  • A modern, cloud‑native tech stack with the ability to shape future architecture.
  • A collaborative and innovative engineering culture.