Senior AI Platform Engineer - Frisco

Role Overview

At McAfee, you’ll create solutions in a fun, challenging environment where innovation is encouraged—and excellence is recognized. You’ll use your awesome skills to help engineering

This role is responsible for designing, building, and scaling enterprise-grade Generative AI platforms and developer ecosystems. The focus is on enabling secure, scalable, reliable, and production-ready GenAI capabilities across the organization, leveraging LLMs, AI gateways, Kubernetes, and cloud-native infrastructure.

The role combines deep expertise in platform engineering, AI infrastructure, and generative AI at enterprise scale. It operates with a platform-as-a-product mindset, enabling self-service AI capabilities through developer portals (e.g., Backstage templates and plugins) to accelerate adoption and standardization.

The engineer will partner closely with Security and Governance teams to embed responsible AI practices, enforce policy-driven controls, and provide token-level usage and cost visibility. This role also drives consistency in model access patterns, observability, and lifecycle management of AI services across environments.

This is a hybrid position located in Frisco, TX. You will be required to be onsite on an as-needed basis; when not working onsite, you will work from your home office. We are only considering candidates within a commutable distance to the Frisco office and are not offering relocation assistance at this time.

About the Role

Design, build, and scale enterprise-grade Generative AI platforms supporting LLM applications, AI agents, RAG architectures, and multi-model routing.
Architect and implement secure, scalable AI infrastructure leveraging cloud-native technologies (AWS, GCP, Kubernetes, GKE/EKS).
Enable self-service AI capabilities for engineering teams through standardized platform services, APIs, and Backstage templates/plugins.
Build and operate Retrieval-Augmented Generation (RAG) infrastructure, including embedding pipelines and vector stores (OpenSearch, Aurora pgvector).
Develop and manage enterprise AI gateway capabilities, including model routing, rate limiting, token tracking, and policy enforcement.
Integrate GenAI services into CI/CD pipelines and platform workflows to enable seamless deployment and lifecycle management.
Build observability platforms for GenAI systems, tracking token usage, latency, response quality, failure rates, throughput, and cost.
Own lifecycle management of Kubernetes-based AI platforms, including upgrades, patching, and scaling.
Define SLIs/SLOs and reliability benchmarks for AI platform services.
Implement AI security guardrails including PII redaction, prompt injection defenses, and policy-driven controls.
Integrate DevSecOps and AI security scanning into deployment pipelines to enforce secure-by-design practices.
Design AI release validation, risk analysis, and governance frameworks for production readiness.
Build reusable infrastructure modules and platform automation frameworks using Infrastructure as Code (Terraform or equivalent).
Develop upgrade and patching strategies for AI platforms with minimal downtime and operational risk.
Ensure platform security posture, compliance, and lifecycle governance across environments.
Drive multi-cloud AI platform strategy and lead modernization initiatives across AWS and GCP.
Partner with Security and Governance teams to enforce responsible AI practices and enterprise standards.
Drive measurable improvements in developer productivity, platform adoption, and AI cost efficiency through standardized platform capabilities.

About You

10+ years of experience in platform engineering, with hands‑on AI/ML or GenAI platform experience.
Hands‑on experience with at least one LLM ecosystem (AWS Bedrock, OpenAI, Anthropic).
Strong Kubernetes experience (EKS/GKE), including GPU scheduling, autoscaling, and multi‑tenant isolation.
Strong programming expertise in Python and Go; experience building services using FastAPI and gRPC.
Deep expertise in AWS (IAM, VPC, KMS) and Infrastructure as…