AI DevOps Engineer
Listed on 2026-06-18
IT/Tech
SRE/Site Reliability, Cloud Computing: Infrastructure & Operations, AI Engineer (Applied/Software), IT Infrastructure
New York, United States | Posted on 05/01/2026
Would you love a career where you Experience, Grow & Contribute at the same time, while earning at least 10% above the market? If so, we are excited to have bumped into you.
Learn how we are redefining the meaning of work, and be a part of the team raved about by clients, job seekers, and employees.
If you are an AI DevOps Engineer looking for excitement, challenge, and stability in your work, then you will be glad you came across this page.
We are an IT Solutions Integrator/Consulting Firm helping our clients hire the right professional for an exciting long-term project. Here are a few details.
See whether you are ready to maximize your earning and growth potential by leveraging our Disruptive Talent Solution.
Location: New York, NY
Type: Contract
Our client is seeking a highly skilled AI DevOps Engineer to design, build, and operate scalable, secure, production-grade infrastructure supporting modern AI platforms and LLM-powered applications.
This role sits at the intersection of DevOps, Platform Engineering, Site Reliability Engineering (SRE), and AI Infrastructure, enabling high-performance AI systems, agent-based workflows, and enterprise AI platforms within a regulated financial services environment.
The ideal candidate will have strong expertise in Kubernetes, Terraform, cloud infrastructure, automation, and AI platform operations, along with experience supporting modern AI/LLM workloads in production environments.
Responsibilities:
- Design, deploy, and manage scalable infrastructure for AI and LLM-based applications in production environments.
- Build and maintain Infrastructure-as-Code (IaC) using tools such as Terraform for secure, repeatable, and auditable deployments.
- Deploy, manage, and scale containerized environments using Kubernetes with a focus on high availability and reliability.
- Implement DevOps, Platform Engineering, and SRE best practices to improve system reliability, scalability, and operational efficiency.
- Support AI platform services for model serving, inference, experimentation, and evaluation workflows.
- Deploy and maintain infrastructure supporting AI agents, orchestration frameworks, and LLM runtime dependencies.
- Design and manage vector database infrastructure, such as Pinecone, Weaviate, or PostgreSQL with pgvector, for RAG and semantic search use cases.
- Enable AI developer platforms and tooling for engineering teams building AI-powered applications.
- Implement monitoring, alerting, logging, and incident response processes for mission-critical AI systems.
- Collaborate with security, compliance, and governance teams to ensure adherence to regulatory and enterprise security standards.
- Continuously improve automation, developer experience, and operational processes for AI infrastructure environments.
Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
- Proven experience as a DevOps Engineer, Platform Engineer, or Site Reliability Engineer (SRE).
- Strong hands‑on experience managing large‑scale production infrastructure.
- Expertise with Terraform and Infrastructure-as-Code (IaC) methodologies.
- Strong experience deploying and operating Kubernetes-based environments.
- Experience supporting infrastructure for AI platforms or LLM-based applications.
- Strong understanding of automation, scalability, reliability, and cloud-native architectures.
- Experience supporting production‑grade LLM applications and AI agent workloads.
- Hands‑on experience with vector databases such as Pinecone, Weaviate, or pgvector.
- Experience building or supporting AI tooling and internal AI developer platforms.
- Knowledge of observability, monitoring, capacity planning, and reliability engineering for AI/ML systems.
- Experience working within financial services or other highly regulated industries.
- Strong communication and cross‑functional collaboration skills.