Architect - Platform Engineer Job, Lubbock area, Texas, USA, IT/Tech

This position is listed on behalf of a partner company that manages all applications and next steps. Our partner is looking for an Architect - Platform Engineer based in the United States.

This is a senior-level architecture role focused on designing and scaling next-generation infrastructure for GenAI and large language model (LLM) workloads in enterprise and production environments. You will define the platform foundations that power distributed training, GPU-accelerated computing, and AI model deployment. The role blends deep systems engineering expertise with modern cloud-native architecture and requires strong fluency across Kubernetes, high-performance computing, and AI infrastructure stacks.

You will collaborate with data scientists, ML engineers, and software architects to deliver robust, scalable GenAI platforms. The environment is highly innovative, fast-paced, and centered on cutting‑edge AI transformation across industries. This role is ideal for a hands‑on architect who thrives at the intersection of infrastructure, performance engineering, and applied AI systems.

Accountabilities

Design, build, and optimize scalable infrastructure for GenAI and LLM workloads across multi‑GPU and distributed computing environments.
Architect and manage high‑performance compute platforms using Slurm clusters and container orchestration systems such as Kubernetes and OpenShift.
Lead GPU performance profiling, benchmarking, and optimization for distributed training and inference workloads.
Enable and maintain NVIDIA GPU ecosystem components including CUDA, cuDNN, NCCL, Triton, and related tooling.
Develop and operationalize GenAI pipelines supporting fine‑tuning, RAG architectures, multimodal systems, and LLMOps workflows.
Build reusable infrastructure‑as‑code templates using tools such as Terraform and Helm to support scalable deployments.
Collaborate with cross‑functional engineering teams to deploy AI solutions into both research and production environments.
Drive automation, CI/CD practices, and platform reliability through modern DevOps and cloud engineering principles.
Lead technical architecture discussions with internal and client‑facing stakeholders, providing scalable and production‑ready solutions.

Requirements

10+ years of experience in platform engineering, infrastructure architecture, or high‑performance computing environments.
Strong hands‑on expertise with Kubernetes and/or Red Hat OpenShift in production‑scale deployments.
Deep knowledge of GPU computing ecosystems including CUDA, cuDNN, NCCL, Nsight, and TensorRT/Triton.
Proven experience with Slurm‑based distributed training systems and multi‑GPU optimization.
Strong Linux systems expertise with performance tuning and infrastructure scaling experience.
Experience building and deploying GenAI workloads such as LLM fine‑tuning, RAG pipelines, or multimodal AI systems.
Solid understanding of infrastructure‑as‑code tools including Terraform and Ansible.
Experience working with cloud GPU environments (AWS, Azure, GCP, OCI) or on‑prem GPU clusters.
Strong communication and leadership skills with experience mentoring teams and driving architecture decisions.
Ability to work in client‑facing environments and translate technical complexity into scalable solutions.

Benefits

Competitive compensation aligned with senior‑level platform engineering roles
Remote‑first flexibility across the United States and Canada
Opportunity to work on cutting‑edge GenAI and LLM infrastructure at enterprise scale
Exposure to leading cloud and AI ecosystems including major hyperscalers and GPU platforms
Career growth within a fast‑scaling AI‑first engineering organization
Hands‑on work with advanced technologies such as distributed training, GPU clusters, and LLM systems
Collaborative, innovation‑driven environment with strong emphasis on learning and technical excellence
Opportunity to work on high‑impact AI transformation projects across multiple industries