IOC Systems Analyst

Job in Fort Worth, Tarrant County, Texas, 76102, USA
Listing for: Optomi
Full Time position
Listed on 2026-06-12
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Support
Salary/Wage Range or Industry Benchmark: 80000 - 100000 USD Yearly
Job Description & How to Apply Below

Optomi, in partnership with a leading AI Cloud Service Provider, is seeking an IOC Systems Specialist to join a fast-paced operations team supporting large-scale HPC and GPU cloud environments.

Position Summary

The IOC Systems Specialist is responsible for providing Tier 2 operational support for high-performance computing (HPC) cloud infrastructure in a 24x7 IOC/NOC environment. This role focuses on monitoring, troubleshooting, and resolving complex incidents across Kubernetes clusters, Slurm-managed workloads, cloud services, and large-scale storage environments. The specialist will help ensure system stability, performance, and uptime while supporting mission-critical AI and GPU computing operations.

What the right candidate will enjoy
  • Working with cutting-edge AI and HPC infrastructure technologies
  • Supporting large-scale GPU cloud environments in a highly technical operations setting
  • Collaborating with engineering and infrastructure teams to solve complex production issues
  • Opportunities to grow within cloud, Kubernetes, HPC, and observability technologies
  • Being part of a sustainability-focused organization powered by renewable energy
What type of experience the right candidate has
  • 2–5 years of experience supporting or operating HPC clusters in a production IOC/NOC environment
  • Hands‑on experience with Kubernetes and Slurm workload manager
  • Experience supporting storage technologies such as WEKA and VAST
  • Background in incident response, troubleshooting, and root cause analysis within complex systems
  • Familiarity with cloud platforms such as AWS, Azure, or GCP
  • Understanding of HPC networking and storage infrastructure, including InfiniBand, Ethernet fabrics, and high‑throughput storage environments
  • Post‑secondary education in Computer Science, Engineering, or related technical discipline, or equivalent hands‑on experience
What the responsibilities of the right candidate are
  • Provide Tier 2 operational support for HPC cloud environments while maintaining system stability and SLA adherence
  • Monitor, troubleshoot, and resolve incidents related to Kubernetes, Slurm, storage systems, and associated cloud infrastructure
  • Act as an escalation point for Tier 1 support teams and coordinate with engineering teams for permanent resolution of issues
  • Perform root cause analysis and contribute to continuous operational improvements
  • Execute operational changes, maintenance activities, patching, and upgrades following change management procedures
  • Support and maintain monitoring, alerting, and observability tools for proactive issue detection
  • Maintain runbooks, operational documentation, incident reports, and knowledge base articles
  • Support operational readiness for new HPC technologies and infrastructure deployments
  • Provide guidance and mentorship to Tier 1 operations staff and data center technicians
  • Participate in a 24x7 rotating shift schedule and major incident response activities