DevOps Engineer
Listed on 2026-02-16
IT/Tech
Cloud Computing, Systems Engineer, Data Engineer, SRE/Site Reliability
Overview
4Minds is an enterprise AI fine-tuning platform that transforms how organizations build and operate private, domain-specific AI. Our AI learns continuously from live data in real time and can be deployed on-prem or with your cloud provider. Our patented technologies scale existing engineering teams and empower new AI teams, enabling rapid AI deployment, adaptation, and ROI. Through 4Minds’s automated data pipeline and proprietary knowledge graph, enterprises can connect all their data sources, including Microsoft, Databricks, AWS, and Google, creating adaptive AI that surpasses the capabilities of conventional RAG-based systems.
We’re seeking a DevOps Engineer to build and maintain the infrastructure that powers our enterprise AI platform across cloud and on-premises environments. You’ll design scalable deployment pipelines, ensure system reliability, and enable our engineering teams to ship faster while maintaining enterprise-grade security and compliance standards.
You’ll take on the infrastructure lifecycle, from provisioning through monitoring, for the frontend and backend of our platform, and support our AI teams in optimizing how we build, deploy, and run AI workloads. Our hybrid deployment model, supporting both cloud and on-prem installations, creates unique challenges that require creative solutions.
Reporting to our CTO, you’ll have significant autonomy to collaborate on and establish DevOps practices, select tooling, and shape how 4Minds delivers reliable, secure AI infrastructure to enterprise customers.
Responsibilities
- Design, implement, and maintain CI/CD pipelines for automated building, testing, and deployment of AI platform components
- Manage infrastructure-as-code across AWS, GCP, Azure, and on-premises environments using Terraform, Pulumi, or similar tools
- Build and maintain Kubernetes clusters optimized for AI/ML workloads, including GPU scheduling and resource management
- Implement monitoring, logging, and alerting systems to ensure platform reliability and rapid incident response
- Develop and enforce security best practices, including secrets management, access controls, and compliance automation
- Collaborate with engineering teams to containerize applications and optimize deployment workflows
- Create and maintain documentation for infrastructure, deployment procedures, and runbooks
- Automate operational tasks to reduce toil and improve team velocity
- Support enterprise customer deployments, including on-premises installations with unique infrastructure requirements
- Optimize infrastructure costs while maintaining performance and reliability standards
Required Qualifications
- BS in Computer Science, Engineering, or related technical field
- 5+ years of experience in DevOps, SRE, or infrastructure engineering roles
- Strong proficiency with cloud platforms (AWS, GCP, or Azure), including compute, networking, and security services
- Hands-on experience with Kubernetes in production environments, including deployment, scaling, and troubleshooting
- Expertise with infrastructure-as-code tools (Terraform, Pulumi, CloudFormation, or similar)
- Experience building and maintaining CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, or similar)
- Strong scripting skills in Python, Bash, or Go for automation
- Solid understanding of networking fundamentals, including DNS, load balancing, and firewalls
- Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or similar)
- Ability to work autonomously and drive technical decisions in a fast-paced environment
- Clear technical communication with both technical and non-technical stakeholders
- Deep ownership mindset: you care about outcomes, not job titles
Preferred Qualifications
- MS in Computer Science, Engineering, or related technical field
- 7+ years of experience in DevOps, SRE, or infrastructure engineering roles
- Experience supporting AI/ML infrastructure, including GPU clusters and model serving
- Background with on-premises or hybrid cloud deployments for enterprise customers
- Experience with data pipeline infrastructure (Kafka, Airflow, or similar)
- Familiarity with security compliance frameworks (SOC 2, HIPAA, FedRAMP)
- Track record of…