DevOps Engineer/Machine Learning Operations
Listed on 2026-01-12
IT/Tech
AI Engineer, Machine Learning/ ML Engineer
Summary
Our client is seeking a skilled Kubernetes & MLOps Engineer to join their fast-paced, collaborative DevOps and Data Science team. This role focuses on deploying and operating scalable AI/ML services on AWS EKS, managing vector databases, and integrating cutting-edge AI services like Azure OpenAI APIs.
Responsibilities
- Deploy and manage microservices, including APIs, Docker-based services, and vector stores on Amazon EKS
- Ensure 24/7 cluster uptime and seamless service connectivity
- Validate stability and performance of deployed configurations and services
- Manage PostgreSQL with pgvector for embedding storage
- Securely expose and integrate vector databases within Kubernetes environments
- Monitor and troubleshoot database performance issues
- Implement and maintain CI/CD pipelines using GitLab
- Use Terraform for Infrastructure as Code (CloudFormation experience a plus)
- Automate build, test, and deployment workflows
- Set up monitoring dashboards and alerting via Datadog (Splunk or alternatives welcomed)
- Ensure full visibility into system health, latency, and uptime metrics
- Collaborate with data science teams to operationalize ML workloads
- Support Retrieval-Augmented Generation (RAG) architectures and vector-based search
- Integrate securely with Azure OpenAI APIs, including implementing internal guardrails
- Proven expertise managing scalable production Kubernetes workloads on EKS (or equivalent)
- Hands‑on experience with PostgreSQL pgvector; knowledge of other vector stores like Pinecone, Weaviate, or Milvus is a plus
- Solid conceptual understanding of LLMs, embeddings, retrievers, and RAG systems
- Familiarity with OpenAI services and API-based LLM workflows
- Highly collaborative DevOps/Data Science team with a strong emphasis on secure and ethical AI usage
- Fast‑paced environment dedicated to pushing the boundaries of enterprise AI capabilities
- Experience deploying AI/ML workloads on AWS, Azure, or Google Cloud Platform
- Exposure to cloud‑native AI services such as SageMaker, Azure ML, or Vertex AI
- Understanding of compliance and security guardrails related to LLM deployments
- Knowledge of service meshes or API gateways for secure model exposure
The pay range is the lowest to highest compensation we reasonably in good faith believe we would pay at posting for this role. We may ultimately pay more or less than this range. Employee pay is based on factors like relevant education, qualifications, certifications, experience, skills, seniority, location, performance, union contract and business needs. This range may be modified in the future.
Bonuses & Incentives
This job is not eligible for bonuses, incentives or commissions.
Benefits
We offer comprehensive benefits including medical/dental/vision insurance, HSA, FSA, 401(k), and life, disability & AD&D insurance to eligible employees. Salaried personnel receive paid time off. Hourly employees are not eligible for paid time off unless required by law. Hourly employees on a Service Contract Act project are eligible for paid sick leave.
EEO Statement
Kforce is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, pregnancy, sexual orientation, gender identity, national origin, age, protected veteran status, or disability status.