Machine Learning Operations Engineer

Job in Dallas, Dallas County, Texas, 75215, USA
Listing for: 4MINDS
Full Time position
Listed on 2026-06-06
Job specializations:
  • IT/Tech
    AI Engineer, Cloud Computing, Systems Engineer, Data Engineer
Salary/Wage Range or Industry Benchmark: 125,000 - 150,000 USD yearly
Job Description & How to Apply Below

4Minds is an enterprise AI fine‑tuning platform that transforms how organizations build and operate private, domain‑specific AI. Unlike static systems, 4Minds’s AI platform learns continuously from live data in real time and can be deployed on‑prem or in your own cloud. Our patented technologies scale existing engineering teams and empower new AI teams, enabling rapid AI deployment, adaptation, and ROI. Through 4Minds’s automated data pipeline and proprietary knowledge graph, enterprises can connect all their data sources, including Microsoft, Databricks, AWS, and Google, creating adaptive AI that surpasses the capabilities of conventional RAG‑based systems.

Role Overview

As the Machine Learning Ops Engineer at 4Minds, you will own the infrastructure that makes our AI platform perform, scale, and ship across the most demanding deployment environments in the enterprise market: GCP, AWS, Azure, CoreWeave, and on‑premise. This isn't a role where you maintain what others built. You'll actively research, evaluate, and drive improvements across every layer of the stack, from inference pipeline reliability to GPU performance optimization across hardware architectures.

Working in close partnership with the CTO, you'll take on initiatives that sit at the frontier of what's possible with modern AI infrastructure. Our platform's ability to deploy privately, on‑premise or in any cloud, is a core product promise, and you're the engineer who makes that promise real. This is a senior, hands‑on role on a focused engineering and research team.

You'll bring production discipline to a system that demands it, while continuously pushing the boundaries of how we scale, optimize, and extend our infrastructure as the platform grows.

Key Responsibilities
  • Design, build, and continuously improve CI/CD pipelines that move AI models reliably from development through production, including testing, validation, and deployment automation
  • Own inference pipeline reliability and performance across GCP, AWS, Azure, CoreWeave, and on‑premise environments, proactively identifying and implementing improvements
  • Research and evaluate GPU scaling approaches across hardware architectures to inform infrastructure decisions and extend platform capabilities
  • Implement and manage NVIDIA Triton Inference Server and use NVIDIA Fleet Command to streamline model inference workflows
  • Manage GPU clusters and deploy models using Kubernetes and Docker to ensure scalable, efficient model serving across all deployment environments
  • Automate model retraining and redeployment processes in response to data updates and performance changes
  • Monitor system health, performance, and reliability using AI observability tools, with a focus on continuous improvement rather than maintenance alone
  • Partner closely with the CTO on infrastructure research initiatives, translating emerging hardware and deployment capabilities into production‑ready systems
  • Support early on‑premise customer installations and contribute to knowledge transfer as Solutions Engineering takes ownership of that function
Required Qualifications
  • 5+ years of hands‑on experience in production ML infrastructure engineering, with a track record of deploying and operating AI models at scale
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • Deep proficiency with Kubernetes and Docker for deploying and managing AI workloads across diverse environments
  • Hands‑on experience with CI/CD pipelines designed for AI and ML model lifecycle management
  • Experience designing and managing infrastructure across multiple cloud platforms, including at least two of: GCP, AWS, Azure, CoreWeave
  • Solid understanding of GPU cluster management and the performance tradeoffs across hardware configurations
  • Experience with on‑premise AI deployment and the infrastructure complexity it introduces
  • Strong grasp of MLOps principles and AI model lifecycle management from experimentation through production
  • Ability to work autonomously, make infrastructure decisions with limited oversight, and communicate technical tradeoffs clearly to senior leadership
Preferred Qualifications
  • 7+ years of ML…