Architect - Platform Engineer

Job in Lubbock, Lubbock County, Texas, 79401, USA
Listing for: Jobgether
Full Time position
Listed on 2026-06-21
Job specializations:
  • IT/Tech
    AI Engineer (Applied/Software), SRE/Site Reliability, IT Infrastructure
Salary/Wage Range or Industry Benchmark: 150000 - 200000 USD Yearly
Job Description & How to Apply Below

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for an Architect - Platform Engineer based in the United States.

This is a senior-level architecture role focused on designing and scaling next-generation infrastructure for GenAI and large language model (LLM) workloads in enterprise and production environments. You will define the platform foundations that power distributed training, GPU-accelerated computing, and AI model deployment. This role blends deep systems engineering expertise with modern cloud-native architecture, requiring strong fluency across Kubernetes, high-performance computing, and AI infrastructure stacks.

You will collaborate with data scientists, ML engineers, and software architects to deliver robust, scalable GenAI platforms. The environment is highly innovative, fast-paced, and centered on cutting‑edge AI transformation across industries. This role is ideal for a hands‑on architect who thrives at the intersection of infrastructure, performance engineering, and applied AI systems.

Accountabilities
  • Design, build, and optimize scalable infrastructure for GenAI and LLM workloads across multi‑GPU and distributed computing environments.
  • Architect and manage high‑performance compute platforms using Slurm clusters and container orchestration systems such as Kubernetes and OpenShift.
  • Lead GPU performance profiling, benchmarking, and optimization for distributed training and inference workloads.
  • Enable and maintain NVIDIA GPU ecosystem components including CUDA, cuDNN, NCCL, Triton, and related tooling.
  • Develop and operationalize GenAI pipelines supporting fine‑tuning, RAG architectures, multi‑modal systems, and LLMOps workflows.
  • Build reusable infrastructure‑as‑code templates using tools such as Terraform and Helm to support scalable deployments.
  • Collaborate with cross‑functional engineering teams to deploy AI solutions into both research and production environments.
  • Drive automation, CI/CD practices, and platform reliability through modern DevOps and cloud engineering principles.
  • Lead technical architecture discussions with internal and client‑facing stakeholders, providing scalable and production‑ready solutions.
Requirements
  • 10+ years of experience in platform engineering, infrastructure architecture, or high‑performance computing environments.
  • Strong hands‑on expertise with Kubernetes and/or Red Hat OpenShift in production‑scale deployments.
  • Deep knowledge of GPU computing ecosystems including CUDA, cuDNN, NCCL, Nsight, and TensorRT/Triton.
  • Proven experience with Slurm‑based distributed training systems and multi‑GPU optimization.
  • Strong Linux systems expertise with performance tuning and infrastructure scaling experience.
  • Experience building and deploying GenAI workloads such as LLM fine‑tuning, RAG pipelines, or multimodal AI systems.
  • Solid understanding of infrastructure‑as‑code tools including Terraform and Ansible.
  • Experience working with cloud GPU environments (AWS, Azure, GCP, OCI) or on‑prem GPU clusters.
  • Strong communication and leadership skills with experience mentoring teams and driving architecture decisions.
  • Ability to work in client‑facing environments and translate technical complexity into scalable solutions.
Benefits
  • Competitive compensation aligned with senior‑level platform engineering roles
  • Remote‑first flexibility across the United States and Canada
  • Opportunity to work on cutting‑edge GenAI and LLM infrastructure at enterprise scale
  • Exposure to leading cloud and AI ecosystems including major hyperscalers and GPU platforms
  • Career growth within a fast‑scaling AI‑first engineering organization
  • Hands‑on work with advanced technologies such as distributed training, GPU clusters, and LLM systems
  • Collaborative, innovation‑driven environment with strong emphasis on learning and technical excellence
  • Opportunity to work on high‑impact AI transformation projects across multiple industries