Staff Engineer, Machine Learning Operations
Listed on 2026-02-21
IT/Tech
AI Engineer, Machine Learning/ ML Engineer, Data Engineer, Cloud Computing
Job Category: Information Technology
Requisition Number: STAFF005886
Posted: February 18, 2026 - Full-Time - Remote
TN - Brentwood Physical
Corporate Headquarters
5410 Maryland Way
Ste 301
Brentwood, TN 37027, USA
Staff Engineer, Machine Learning Operations
The Staff Engineer, Machine Learning Operations will architect, own, and scale the machine learning infrastructure and deployment pipelines that power Monogram's operational and clinical initiatives. Operating with full autonomy, you'll establish MLOps excellence, mentor engineering teams, and drive strategic technical decisions that directly impact patient outcomes. This role requires a seasoned engineer who can balance innovation with production reliability while building systems that handle healthcare's most sensitive and complex data.
- ML Platform Ownership: Architect and maintain enterprise-grade ML infrastructure, including model versioning, automated testing frameworks, containerization strategies, CI/CD pipelines, and comprehensive monitoring systems for model performance, data quality, and drift detection.
- Technical Leadership: Drive MLOps strategy and standards across the organization. Mentor data scientists and engineers on production best practices, system design, and scalable architecture patterns.
- End-to-End Model Lifecycle: Own the complete journey from model development through production deployment, including real-time and batch inference systems, A/B testing frameworks, and automated retraining pipelines.
- Cross-Functional Partnership: Collaborate with clinical leaders, product teams, and data scientists to translate complex healthcare requirements into robust, scalable ML solutions. Present technical strategies to executive stakeholders.
- Production Excellence: Build fault-tolerant, compliant systems that meet healthcare security and privacy standards. Establish SLAs, incident response protocols, and disaster recovery procedures for mission-critical ML services.
- Innovation & Scale: Evaluate and integrate cutting-edge MLOps tools and practices. Design systems that scale with Monogram's growth while reducing operational overhead and improving model iteration velocity.
- Experience:
- 10+ years in software engineering with 5+ years focused on ML infrastructure, MLOps, or production ML systems
- 5+ years of Python development with strong software engineering fundamentals
- 3+ years architecting and deploying production ML systems on cloud platforms (Azure preferred)
- Proven track record building and scaling ML platforms from the ground up
- Healthcare or regulated industry experience strongly preferred
- Technical Skills:
- Expert-level proficiency with MLOps tooling (MLflow, Kubeflow, SageMaker, Azure ML, etc.)
- Deep experience with containerization (Docker, Kubernetes), orchestration tools (Airflow, Prefect), and infrastructure-as-code (Terraform, ARM templates)
- Advanced knowledge of CI/CD systems, automated testing strategies, and GitOps workflows
- Expertise in model monitoring, observability, feature stores, and experiment tracking at scale
- Production experience with both batch and real-time inference architectures
- Understanding of healthcare data standards (FHIR, HL7, claims data) is a plus
- Leadership & Communication:
- Demonstrated ability to influence technical direction and mentor senior engineers
- Exceptional communication skills with ability to distill complex technical concepts for diverse audiences
- Track record of driving consensus on architectural decisions across multiple stakeholders
- Educational Background:
- Bachelor's degree in Computer Science, Engineering, or related field required
- Master's degree or equivalent practical experience preferred
- Additional Skills:
- Systems thinking with focus on reliability, scalability, and maintainability
- Deep understanding of security, compliance, and privacy requirements in healthcare (HIPAA)
- Bias toward action with pragmatic approach to technical debt and iterative improvement
- Comprehensive Benefits: Medical, dental, and vision insurance,…