
Platform Engineer

Job in Austin, Travis County, Texas, 78716, USA
Listing for: Green Key Resources
Full Time position
Listed on 2026-05-16
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Support
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly
Job Description & How to Apply Below


* Must be eligible for a Top Secret security clearance

Position Overview

We are seeking a Platform Engineer to lead the operation and reliability of our GPU-based bare-metal Kubernetes infrastructure. In this role, you will own CI/CD systems, maintain high-availability compute environments, and support deployments across both lab and field operations.

Responsibilities
  • Deploy, manage, and scale bare-metal Kubernetes clusters supporting NVIDIA GPUs, with hybrid cloud bursting to AWS for elastic compute and storage workloads.
  • Operate and optimize NVIDIA GPU infrastructure for machine learning training and inference workloads.
  • Own the end-to-end CI/CD lifecycle, including build automation, artifact management, signing, version pinning, and repeatable deployments across cloud and edge environments.
  • Design and maintain observability systems, including centralized logging, metrics collection, dashboards, and alerting to ensure real-time visibility into infrastructure and application health.
  • Partner with robotics, computer vision, and software engineering teams to develop streamlined developer tooling and improve engineering velocity for our platform.
  • Implement and maintain infrastructure-as-code standards using tools such as Terraform, Helm, and Ansible across on-premises and cloud deployments.
  • Manage networking, storage, cluster security, and system hardening for production-grade bare-metal environments in accordance with applicable defense and security requirements.

Qualifications
  • Strong Python and Bash scripting skills.
  • 5+ years of experience in Platform Engineering, DevOps, Site Reliability Engineering, or Infrastructure Engineering roles supporting production Kubernetes environments.
  • Deep expertise administering bare-metal Kubernetes clusters, including cluster lifecycle management, CNI networking, storage backends, node operations, and upgrades.
  • Hands‑on experience with NVIDIA GPU infrastructure, including CUDA, Kubernetes GPU scheduling, NVIDIA device plugins, and ML orchestration platforms such as Kubeflow.
  • Strong experience building and maintaining CI/CD systems using tools such as GitLab CI, GitHub Actions, Jenkins, or similar platforms.
  • Experience with observability and monitoring stacks for distributed Linux systems, including centralized logging, metrics, and alerting platforms (e.g., ELK/OpenSearch, Prometheus, Grafana).
  • Experience building and maintaining Linux-based C++ and Python toolchains using CMake, including cross‑compilation for ARM-based platforms such as NVIDIA Jetson.
  • Strong Linux systems administration experience (Debian/Ubuntu preferred), including networking, storage management, kernel tuning, and security hardening in production environments.