MLOps and Cloud Ops Engineer
Job in New York, New York County, New York, 10261, USA
Listed on 2026-05-16
Listing for: Selby Jennings
Full Time position
Job specializations:
- IT/Tech: Cloud Computing, SRE/Site Reliability, Data Engineering, AI Engineer (Applied/Software)
Job Description & How to Apply Below
We are seeking a Senior MLOps & Cloud Operations Engineer to help design, build, and operate the cloud and machine‑learning infrastructure that supports production AI systems. This individual will own core platform components and ensure that machine learning workflows move seamlessly from development into stable, monitored, and compliant production environments.
This is a hands‑on, high‑ownership role with close collaboration across data science, engineering, and platform teams.
Key Responsibilities
- Design and maintain secure, highly available cloud environments that support AI and data workloads
- Operate and monitor core cloud and DevOps resources to ensure system reliability and performance
- Oversee enterprise‑scale data platforms and analytics environments, including storage strategy, lifecycle management, and job reliability
- Build and manage scalable infrastructure across AWS, Azure, and other cloud platforms as needed
- Develop and maintain CI/CD pipelines for machine learning workflows, covering training, testing, and deployment
- Implement standards for model versioning, artifact tracking, and experiment reproducibility
- Deploy and operate machine learning models using containerized and orchestrated solutions (e.g., Docker and Kubernetes)
- Monitor live ML systems for performance issues, anomalies, and model drift, and support iterative improvement
- Partner closely with data scientists, data engineers, and application teams to enable smooth ML operations
- Apply infrastructure‑as‑code and DevSecOps best practices, ensuring documentation and operational readiness
- Contribute to governance, security, compliance, and data‑privacy frameworks for AI systems
Qualifications
- 5+ years of experience in cloud infrastructure, DevOps, or MLOps roles
- Strong hands‑on experience with major cloud platforms such as AWS and/or Azure
- Experience supporting or operating Databricks environments
- Proficiency with infrastructure‑as‑code tools (e.g., Terraform, Pulumi, or similar)
- Solid experience with containerization and orchestration technologies (Docker, Kubernetes)
- Experience building CI/CD pipelines for machine learning or data systems
- Strong scripting and automation skills (Python, Bash, or equivalent)
- Familiarity with ML lifecycle tools and observability platforms for monitoring and alerting
- Experience deploying and operating machine learning models in production environments
- Exposure to model monitoring, alerting, and retraining workflows
- Background working in regulated environments such as healthcare, finance, or life sciences
- Familiarity with security, compliance, and data‑privacy requirements (e.g., HIPAA, SOC 2)