Senior MLOps Engineer
Listed on 2026-05-16
IT/Tech
AI Engineer (Applied/Software), Machine Learning/ML Engineer
Location: India | Job Type: Full-time, Remote | Experience Level: Senior
We are looking for a Senior MLOps Engineer to design, build, and operate production‑grade infrastructure and pipelines for Machine Learning (ML), Deep Learning, and Generative AI (GenAI) solutions. Your primary focus is ensuring these systems are reliable, scalable, observable, and secure across the full lifecycle—training, deployment, monitoring, and retraining.
This is a senior hands‑on role at the intersection of ML engineering and cloud/platform reliability. You will take ownership of architectural decisions, drive engineering standards, and mentor more junior engineers. While MLOps is the top priority, there is room to contribute to AI solution implementation (model integration/experimentation) where bandwidth allows. You will work closely with AI engineers, DevOps, and the Data & AI Architect to standardize and scale repeatable production patterns.
Key Responsibilities
- Design, implement, and maintain end‑to‑end ML pipelines covering training, validation, deployment, monitoring, and retraining, with a focus on production reliability and long‑term maintainability.
- Own and operate production ML infrastructure using Infrastructure as Code (IaC), making architectural tradeoffs and enforcing best practices.
- Lead CI/CD practices for ML, including artifact/model versioning, promotion, rollout/rollback, and dev/test/prod parity.
- Deploy and run ML/GenAI workloads on Azure using Azure App Service and Azure Container Apps, with monitoring via Application Insights.
- Implement robust model observability: performance monitoring, data quality checks, drift detection, alerting, and dashboards.
- Drive compute and cost optimization for training and inference (scaling policies, capacity planning, cost/performance tradeoffs).
- Support GenAI operational needs, including LLM inference patterns, embeddings, and retrieval pipelines; enable hooks for evals/guardrails where required.
- Ensure ML systems meet security and governance requirements (RBAC/least privilege, secrets management, audit logging, encryption, secure access patterns).
- Partner with the Data & AI Architect to translate architecture standards into reusable pipeline templates and operational controls.
- Collaborate with and mentor AI engineers; contribute to model development/experimentation as capacity allows.
Requirements
- 5+ years of experience in MLOps, ML engineering, platform engineering, or a closely related role, with at least 2 years in a senior or lead capacity.
- Strong proficiency in Python for ML workflows, automation, and pipeline development.
- Hands‑on experience building and operating ML systems on Azure (OCI exposure is a plus).
- Proven track record of owning production‑grade MLOps pipelines end‑to‑end (training, deployment, monitoring, retraining) with measurable reliability or efficiency outcomes.
- Strong experience with Infrastructure as Code (Terraform or equivalent).
- Experience with MLOps tooling such as MLflow (or equivalent experiment tracking) and CI/CD pipelines.
- Experience containerizing services using Docker in production environments.
- Hands‑on experience deploying and monitoring services on Azure using Azure App Service, Azure Container Apps, and Application Insights.
- Solid understanding of GenAI/LLM‑based systems (inference workflows, embeddings, retrieval/RAG components) and their operational considerations.
- Strong communication and collaboration skills; comfortable working across functions and influencing technical decisions without direct authority.
Preferred Qualifications
- Experience with orchestration tools such as Apache Airflow or Azure‑native alternatives (Azure Data Factory / Azure ML Pipelines).
- Experience with feature stores and/or real‑time inference patterns.
- Exposure to multi‑cloud architectures.
- Prior experience mentoring engineers or leading technical initiatives across teams.
Benefits
- Opportunity to grow your career with a rapidly growing organisation.
- Exposure to the latest technologies at a Microsoft Gold Partner organisation.
- People‑first organisation culture.
- Company‑Paid Group Mediclaim Insurance for employee, spouse and…