Machine Learning Operations Engineer
Listed on 2025-12-08
IT/Tech
AI Engineer, Machine Learning/ML Engineer
Description
Propio is on a mission to make communication accessible to everyone. As a leader in real-time interpretation and multilingual language services, we connect people with the information they need across language, culture, and modality. We’re committed to building AI-powered tools to enhance interpreter workflows, automate multilingual insights, and scale communication quality across industries.
The Machine Learning Operations Engineer will design, build, and maintain the production infrastructure required to deploy, scale, monitor, and govern Propio’s ML and agentic AI systems. This role ensures that translation, speech, interpretation, and conversational AI models run reliably, securely, and cost-effectively in real-time environments. The MLOps Engineer bridges ML engineering, DevOps, and platform engineering. The role owns the end-to-end operational lifecycle, from training pipelines through automated deployment to observability, in alignment with HIPAA, SOC 2, and HITRUST standards.
Key Responsibilities
Model Deployment, Serving & Infrastructure
- Build and maintain scalable model serving infrastructure for real-time inference (translation, ASR/TTS, agentic AI workflows)
- Implement automated CI/CD pipelines for ML models and LLM agents, including versioning, rollback strategies, and multi-environment promotion (dev → staging → prod)
- Develop GPU/compute orchestration strategies for cost-efficient workloads on AWS (SageMaker, ECS/EKS, EC2) or Databricks
- Implement reproducible ML workflows with strong dependency management, data lineage, and feature versioning
- Integrate observability platforms (Datadog, MLflow, LangSmith) for end-to-end tracing of agentic workflows and multi-step tool execution
- Build alerting systems and dashboards for both business-level metrics (quality, throughput) and engineering metrics (GPU load, memory, queue depth)
- Ensure ML systems meet HIPAA, SOC 2, and HITRUST standards, including encryption, audit logging, access controls, and secure handling of PHI
- Implement data validation, schema enforcement, and drift detection to guarantee data quality for both training and inference
- Manage the model registry, feature store, and lineage tracking across all AI services
- Work closely with Machine Learning Engineers to productionize models and agentic systems, ensuring a seamless handoff from experimentation to deployment
- Collaborate with Data Engineering to operationalize data pipelines feeding ML/LLM workflows
- Partner with DevOps, Security Engineering, and Platform Engineering to integrate ML systems into Propio’s cloud stack
- Optimize model serving architectures for latency, concurrency, and cost
- Implement autoscaling, caching, routing, and load-balancing solutions for high-volume LLM and speech-based systems
- Evaluate and implement new technologies (vector databases, real-time streaming infrastructure, model compression, quantization)
Qualifications
- 3+ years of experience in MLOps, DevOps, ML platform engineering, or similar infrastructure-focused ML roles
- Strong experience with AWS (SageMaker, EKS/ECS, Lambda, Step Functions, S3, IAM), Databricks, or equivalent cloud ecosystems
- Strong proficiency with ML lifecycle tools: MLflow, Kubeflow, SageMaker Pipelines, Airflow, Prefect, or equivalent
- Strong foundations in CI/CD, containerization (Docker), orchestration (Kubernetes), and infrastructure-as-code (Terraform, CloudFormation)
- Experience implementing monitoring and observability for ML systems (Datadog, Prometheus/Grafana, LangSmith, MLflow)
- Familiarity with securing ML pipelines and handling regulated data under HIPAA and SOC 2
- Proficiency in Python and experience supporting ML engineers in productionizing ML/LLM workflows
- Bachelor’s or Master’s in Computer Science, Software Engineering, ML/AI, or related field