Senior Machine Learning Operations Engineer

Job in 453115, Agra, Madhya Pradesh, India
Listing for: Brightly Software
Full Time position
Listed on 2026-06-25
Job specializations:
  • IT/Tech
    Machine Learning/ ML Engineer, AWS, AI Engineer (Applied/Software)
Job Description & How to Apply Below
Location: India
Employment Type: Full‑time

About Brightly Software
Brightly Software is a leader in intelligent asset management and operational optimization, empowering organizations with data‑driven insights. As we expand our AI and ML capabilities, we are seeking a Senior MLOps Engineer to build and scale the infrastructure that powers our next generation of predictive and autonomous solutions.

Role Overview
As a Senior MLOps Engineer, you will architect, develop, and operate end‑to‑end machine learning infrastructure on AWS. You will work at the intersection of ML engineering, cloud infrastructure, and developer productivity, enabling Brightly's data science teams to move seamlessly from experimentation to reliable, secure, and cost‑efficient production systems.
Your work will ensure that ML models and data pipelines are scalable, observable, and compliant with best‑in‑class MLOps practices.

Key Responsibilities
ML Platform & Infrastructure (AWS‑focused)
Design, build, and operate ML/AI development platforms on AWS, leveraging services such as Amazon SageMaker (Studio, Training, Real‑Time & Async Inference, Pipelines, Feature Store), S3, Glue, Lambda, ECS/EKS, and related cloud infrastructure.
Implement infrastructure‑as‑code using Terraform or equivalent, and manage workflow orchestration using AWS Step Functions or Airflow.
Data & Model Pipelines
Build automated data ingestion and transformation pipelines using S3, Glue, EMR/Spark, and Redshift, incorporating data quality and lineage tooling (e.g., Great Expectations, Deequ).

CI/CD for Machine Learning
Develop CI/CD pipelines for ML with CodeBuild, CodePipeline, or GitHub Actions, integrating unit tests, data contract checks, model validation, canary/shadow deployments, and automated rollback strategies.
Model Deployment & Operations
Deploy real‑time inference endpoints (SageMaker endpoints or FastAPI‑based services on Lambda/ECS/EKS) and scalable batch processing jobs.
Define SLOs, implement autoscaling, and drive cost/performance optimizations across ML workloads.
Monitoring, Observability & Governance
Implement production monitoring for drift, bias, and performance using SageMaker Model Monitor and service telemetry tools like CloudWatch, Prometheus, and Grafana.
Enforce security and governance best practices, including least‑privilege IAM, VPC‑isolated architectures, encryption, and secret management.
Cross‑Functional Collaboration
Partner closely with data scientists, ML engineers, and backend engineers to productionize ML models and streamline development workflows.
Contribute to the integration of emerging GenAI workloads, including Amazon Bedrock, vector databases (e.g., OpenSearch), and RAG pipelines.

Required Qualifications
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
8+ years of professional experience in ML engineering, DevOps, cloud engineering, or MLOps roles, with at least 3 years in a senior or lead capacity.
3+ years of proven experience designing and architecting robust, scalable ML systems and infrastructure in cloud environments, particularly on AWS.
5+ years of deep experience building on the AWS ML ecosystem, including SageMaker, S3, Lambda, ECR, EKS/ECS, Step Functions, IAM, VPC networking, and CI/CD tooling.
3+ years of hands-on experience deploying, maintaining, and scaling ML models in production environments.
3+ years of strong Python development experience and familiarity with Docker‑based workflows.
5+ years of experience working with ML life cycles, model evaluation, and monitoring patterns.
5+ years of extensive experience with infrastructure‑as‑code (Terraform, CloudFormation).
5+ years of expertise in designing system architecture for ML platforms, including microservices, container orchestration, and cloud networking.
3+ years of familiarity with MLOps best practices as defined by AWS and industry standards.
2+ years of experience with data quality frameworks (Great Expectations, Deequ).
2+ years of experience optimizing distributed training workflows on AWS.
3+ years of knowledge of security and compliance requirements for ML in enterprise settings, such as IAM, encryption, and secret management.
2+ years of experience with monitoring tools (CloudWatch, Prometheus, Grafana) and implementing model observability solutions.
5+ years of effective cross-functional collaboration, working closely with data scientists, ML engineers, and software engineers to deliver production-grade ML solutions.
7+ years of demonstrated problem-solving and communication skills, with a focus on delivering scalable, reliable, and cost-effective ML platforms.
Position Requirements
10+ years of work experience