MLOps Engineer

Job in Indianapolis, Hamilton County, Indiana, 46262, USA
Listing for: Insight Global
Full Time position
Listed on 2026-04-29
Job specializations:
  • IT/Tech
    Cloud Computing, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 80000 - 100000 USD Yearly
Job Description & How to Apply Below
Location: Indianapolis

Job Description

Insight Global is seeking a Machine Learning Reliability Engineer for a large enterprise client modernizing and scaling its ML/AI platform. This role focuses on ensuring ML systems are reliable, observable, and cost‑efficient. The engineer will define SLOs, build robust Datadog monitoring, standardize incident response, and partner closely with FinOps and governance teams. This is a highly visible role critical to production ML success, ideal for an SRE who understands ML workloads and wants to own reliability, observability, and operational excellence across enterprise AI systems.

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances.

If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to  To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy:

Skills and Requirements
  • Strong background in Site Reliability Engineering (SRE) principles
  • Hands‑on Datadog experience (dashboards, metrics, logs, traces, alerting)
  • Experience supporting ML/AI systems in production
  • Ability to define and enforce SLOs / SLIs for distributed systems
  • Monitoring of availability, latency, accuracy, drift, and pipeline health
  • Experience operating in cloud environments (Azure strongly preferred)
  • Proven skills in performance tuning and cost optimization
  • Incident response ownership (alerts, runbooks, escalation paths)
  • ML‑specific observability (model performance, drift, LLM monitoring)
  • AI / LLM observability experience
  • Snowflake and modern data platform monitoring
  • FinOps partnership experience
  • ServiceNow integration (incident & change management)
  • Enterprise audit, governance, and compliance exposure