IOC Systems Specialist

Job in Fort Worth, Tarrant County, Texas, 76102, USA
Listing for: Optomi
Full Time position
Listed on 2026-06-12
Job specializations:
  • IT/Tech
    SRE/Site Reliability, Cloud Computing: Infrastructure & Operations, IT Infrastructure, IT Support
Salary/Wage Range or Industry Benchmark: 60,000 - 80,000 USD yearly
Job Description & How to Apply Below

Onsite | M-F 8 hr shifts (rotating on call)

Optomi, in partnership with a leading AI cloud infrastructure organization, is seeking an IOC Systems Specialist to join their growing operations team in Fort Worth, TX. This role will provide Tier 2 operational support for high-performance computing (HPC) cloud environments focused on large-scale AI training and inference workloads. The ideal candidate will have hands‑on experience supporting HPC infrastructure, Kubernetes environments, Slurm workload management, and enterprise storage platforms such as WEKA and VAST.

This individual will play a key role in maintaining system stability, troubleshooting complex incidents, and supporting mission‑critical infrastructure within a 24x7 IOC/NOC environment.

What the Right Candidate Will Enjoy:
  • Working with cutting-edge AI and HPC infrastructure technologies!
  • Exposure to advanced Kubernetes, cloud, and storage technologies!
  • Opportunities to contribute to operational improvements and automation initiatives!
  • Joining a fast‑growing organization focused on sustainable, renewable‑powered AI infrastructure!
  • Collaborative environment with strong technical leadership and growth opportunities!
What Type of Experience the Right Candidate Has:
  • 2–5 years of experience supporting or operating HPC clusters in production environments
  • Strong operational experience with WEKA and VAST storage platforms
  • Hands‑on experience with Kubernetes administration and troubleshooting
  • Experience supporting Slurm workload manager environments
  • Familiarity with HPC monitoring, observability, and alerting platforms
  • Experience performing incident response and root cause analysis in complex systems
  • Understanding of cloud platforms such as AWS, Azure, or GCP
  • Knowledge of HPC networking and storage technologies, including InfiniBand and high‑throughput interconnects
Responsibilities of the Right Candidate:
  • Provide Tier 2 operational support for HPC cloud infrastructure environments
  • Monitor, troubleshoot, and resolve incidents involving Kubernetes, Slurm, storage, networking, and cloud systems
  • Serve as an escalation point for Tier 1 support teams
  • Perform root cause analysis and coordinate with engineering teams on permanent resolutions
  • Execute operational changes, upgrades, patching, and maintenance activities
  • Maintain and improve operational documentation, runbooks, and knowledge base articles
  • Support monitoring and observability tooling to proactively identify system issues
  • Assist with operational readiness and production support for new HPC capabilities
  • Mentor junior operations staff and support continuous service improvement initiatives
  • Participate in on‑call rotations and major incident response activities
Job Must Haves:
  • Must have hands‑on experience with WEKA and VAST storage environments
  • 2–5 years supporting HPC clusters in production or IOC/NOC environments
  • Working knowledge of Kubernetes
  • Operational experience with Slurm workload manager
  • Familiarity with HPC monitoring and observability tooling
  • Experience with incident response and root cause analysis
  • Understanding of AWS, Azure, or GCP cloud platforms
  • Knowledge of HPC networking and storage infrastructure
  • Ability to work onsite in Fort Worth on a rotating 12‑hour shift schedule
Nice to Have Skills:
  • Relevant certifications such as CKA/CKAD, RHCSA, Linux+, ITIL, or Server+
  • Experience with GPU or HPC vendor technologies
  • Experience supporting AI or large‑scale compute environments