AI Infrastructure Engineer San Jose, CA
Listed on 2026-06-18
IT/Tech
AI Engineer (Applied/Software), Systems Engineer, Cloud Computing: Infrastructure & Operations, IT Infrastructure
Duration: 6+ months
Must-have skills: AI, Kubernetes, orchestration, and DevOps (all four are mandatory)
Role Description:
Architect and build custom artificial intelligence (AI) infrastructure solutions on the Nutanix Kubernetes Platform and Nutanix AI. You will design high‑performance computational stacks that integrate Nutanix AI, high‑speed software‑defined storage, and GPU‑accelerated nodes. Your mission is to make AI infrastructure invisible by optimizing for performance, power efficiency, and seamless hybrid‑multicloud scalability across on‑prem and public cloud environments.
Minimum Experience: 10 years
Educational Qualification: 12 years of full‑time education
As an AI Infrastructure Engineer, you will design tailored AI solutions that bridge the gap between private data centers and the public cloud. Your day‑to‑day work will involve optimizing the Nutanix computational stack for large language model (LLM) and generative AI workloads. You will serve as the subject‑matter expert (SME) for Nutanix AI, ensuring that compute, storage (Nutanix Objects/Files), and networking (Flow) are tuned for AI model training and inference.
Nutanix‑Specific Responsibilities
- Hybrid Multicloud Architecture: Design seamless AI workflows that pair on‑prem AHV clusters with NC2 (Nutanix Cloud Clusters), enabling rapid bursting of AI workloads to the public cloud.
- Data Services for AI: Architect high‑performance storage backends using Nutanix Objects (S3‑compatible) to handle the massive datasets required for AI/ML.
- Kubernetes & Orchestration: Deploy and manage AI workloads on the Nutanix Kubernetes Platform (NKP) so that containerized AI models are scalable and resilient.
- Infrastructure‑as‑Code: Implement IaC with Nutanix Calm or Terraform to automate the lifecycle of GPU‑enabled nodes.
- Observability: Design monitoring, logging, and alerting frameworks for proactive issue detection, drawing on hands‑on experience with Prometheus, Grafana, ELK, and OpenTelemetry. Ensure high availability, disaster recovery, and fault tolerance across all systems.
- Networking & Security: Familiarity with Zero‑Trust architectures, enterprise networking, storage, and virtualization.
- Invisible Infrastructure: Modernize legacy 3‑tier AI silos into a unified, web‑scale Nutanix environment.
- Nutanix Core: Deep proficiency in AOS (Acropolis Operating System) and AHV (the native Nutanix hypervisor).
- AI Performance: Experience with GPU passthrough and vGPU configurations on Nutanix to optimize AI training performance.
- Security: Experience applying Nutanix Flow micro‑segmentation to secure sensitive AI training data.
- Cost Management: Experience using Nutanix Cloud Manager (NCM) Cost Governance to monitor and optimize spend across hybrid environments.
- SME Leadership: Act as the primary technical authority for Nutanix AI integrations within the San Jose office.
- Collaboration: Work across teams to dismantle data silos, moving the organization toward a "One Platform" philosophy.
- Strategic Vision: Stay ahead of Nutanix product roadmaps to inform long‑term AI infrastructure strategy.