AI Infrastructure Engineer – San Jose, CA

Duration: 6+ months

Must‑have skills: AI, Kubernetes, Orchestration, and DevOps (all four skills are mandatory)

Role Description

Architect and build custom Artificial Intelligence (AI) infrastructure solutions leveraging the Nutanix Kubernetes Platform and Nutanix AI. You will be responsible for designing high‑performance computational stacks that integrate Nutanix AI, high‑speed software‑defined storage, and GPU‑accelerated nodes. Your mission is to make AI infrastructure invisible by optimizing for performance, power consumption, and seamless hybrid‑multicloud scalability across on‑prem and public cloud environments.

Minimum Experience: 10 years.

Educational Qualification: 12 years full‑time education.

Summary

As an AI Infrastructure Engineer, you will design tailored AI solutions that bridge the gap between private data centers and public cloud. Your day‑to‑day will involve optimizing the Nutanix computational stack for large language models (LLMs) and generative AI workloads. You will serve as the SME for Nutanix AI, ensuring that compute, storage (Nutanix Objects/Files), and networking (Flow) are perfectly tuned for AI model training and inference.

Nutanix‑Specific Responsibilities

Hybrid Multicloud Architecture:
Design seamless AI workflows using NC2 (Nutanix Cloud Clusters), allowing for rapid bursting of AI workloads from on‑prem AHV clusters to the public cloud.
Data Services for AI:
Architect high‑performance storage backends using Nutanix Objects (S3‑compatible) to handle the massive datasets required for AI/ML.
Kubernetes & Orchestration:
Deploy and manage AI workloads using Nutanix Kubernetes Platform (NKP) to ensure containerized AI models are scalable and resilient.
Infrastructure‑as‑Code:
Implement IaC using Nutanix Calm or Terraform to automate the lifecycle of GPU‑enabled nodes.
Observability:
Design observability frameworks (monitoring, logging, alerting) for proactive issue detection. Hands‑on experience with Prometheus, Grafana, ELK, and OpenTelemetry. Ensure high availability, disaster recovery, and fault tolerance across all systems.
Networking & Security:
Familiarity with Zero‑Trust architectures, enterprise networking, storage, and virtualization.
Invisible Infrastructure:
Modernize legacy 3‑tier AI silos into a unified, web‑scale Nutanix environment.

Professional & Technical Skills

Nutanix Core:
Deep proficiency in AOS (Acropolis Operating System) and AHV (Native Hypervisor).
AI Performance:
Experience with GPU Passthrough and vGPU configurations on Nutanix to optimize AI training performance.
Security:
Applying Nutanix Flow for micro‑segmentation to secure sensitive AI training data.
Cost Management:
Using Nutanix Cloud Manager (NCM) Cost Governance to monitor and optimize spend across hybrid environments.

Expectations

SME Leadership:
Act as the primary technical authority for Nutanix AI integrations within the San Jose office.
Collaboration:
Work across teams to dismantle data silos, moving the organization toward a "One Platform" philosophy.
Strategic Vision:
Stay ahead of Nutanix product roadmaps to inform long‑term AI infrastructure strategy.
