Site Reliability Engineer – Automation

Job in Memphis, Shelby County, Tennessee, 37544, USA
Listing for: xAI
Full Time position
Listed on 2025-12-30
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability, Data Engineer
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly

About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands‑on and to contribute directly to the company’s mission.

Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

As a Site Reliability Engineer in Automation, you will focus on automating firmware upgrades, scripting solutions for hardware from key vendors like NVIDIA, Dell, Supermicro, and HP, and proactively identifying issues to implement automated fixes. Leveraging skills in Python, Bash, Linux, and Kubernetes, you will enhance datacenter efficiency, reduce manual interventions, and support scalable AI infrastructure at xAI.

Responsibilities
  • Develop and maintain scripts in Python and Bash for handling firmware packages, performing upgrades, and automating the entire process across Linux and Kubernetes environments.
  • Work with hardware from vendors such as NVIDIA, Dell, Supermicro, and HP to ensure seamless firmware integration, testing, and deployment in the datacenter.
  • Identify operational problems in real time, design automated fixes or workflows to resolve them, and implement scalable solutions to prevent recurrence.
  • Collaborate with Datacenter Operations Technicians to deploy automation tools, troubleshoot firmware‑related issues, and optimize processes for high‑availability systems.
  • Integrate automation scripts into CI/CD pipelines or orchestration tools like Kubernetes for efficient scaling and management.
  • Monitor and refine automated processes, ensuring they align with datacenter reliability goals and minimize downtime.
  • Document automation scripts, firmware upgrade procedures, and problem‑solving approaches to build a reusable knowledge base for the team.
  • Participate in on‑call rotations and incident response, applying automation to accelerate resolutions in the Memphis datacenter.

Required Qualifications
  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • 5+ years of experience in site reliability engineering or automation roles, preferably in datacenter or cloud environments.
  • Proficiency in Python, Bash, Linux, and Kubernetes for scripting, automation, and orchestration.
  • Hands‑on experience with firmware packages, including writing scripts for upgrades and automating deployment processes.
  • Familiarity with hardware from vendors like NVIDIA, Dell, Supermicro, and HP, including integration and troubleshooting in production settings.
  • Strong problem‑solving skills with a proven ability to identify issues and automate fixes to improve system efficiency.
  • Experience in high‑performance computing or AI infrastructure environments.
  • Excellent collaboration skills for working with cross‑functional teams in fast‑paced settings.

Preferred Qualifications
  • Experience automating firmware management in large‑scale datacenters or supercomputing clusters.
  • Knowledge of additional tools such as Ansible, Terraform, and ArgoCD, or other containerization tools, for enhanced automation.
  • Prior work in a startup or tech company like xAI, with contributions to scalable automation systems.

xAI is an equal opportunity employer.

California Consumer Privacy Act (CCPA) Notice