
Senior Site Reliability Engineer

Job in Irving, Dallas County, Texas, 75084, USA
Listing for: Themesoft Inc.
Full Time position
Listed on 2026-06-05
Job specializations:
  • IT/Tech
    Systems Engineer, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: USD 70.00 per hour
Job Description & How to Apply Below

Location

Charlotte, NC;
Irving, TX;
Chandler, AZ

Duration

12+ Months (possible extension, conversion, or direct hire)

Hybrid

Hybrid work schedule

Pay Rate

$70/hr on W2 + Benefits

Overview

Seeking a senior engineer for L2/L3 application and middleware production support with an SRE mindset (shifting from reactive to proactive reliability) across VM and container‑adjacent/OpenShift (OCP) environments. The role owns incident response, problem management, and runbook‑driven operations. It also drives observability, automation/IaC, compliance guardrails, and CI/CD‑integrated operational automation to reduce toil and improve stability and MTTR.

Responsibilities
  • L2/L3 escalation and recovery; reliability signals & alert quality; blameless post‑incident learning.
  • Logs, metrics, traces, dashboards, and actionable alerting.
  • Infrastructure‑as‑code and config‑as‑code.
  • Standardized automation (status, start, stop, restart).
  • Intelligent automation / AI‑assisted ops with guardrails.
  • Drift and compliance checks & remediation.
  • CI/CD integration.
  • Runbooks and operational documentation.
  • Embed SRE practices: define reliability signals, improve alert quality, drive blameless learning, and prioritize systemic fixes and toil reduction.
  • Implement and continuously improve observability across applications and middleware to improve detection, diagnosis, and MTTR.
  • Design, develop, and maintain IaC and config‑as‑code for VM‑based and container‑adjacent workloads, including OpenShift (OCP) enablement.
  • Build and support automation for operational actions across middleware components to enable safer self‑service and reduce dependency bottlenecks.
  • Integrate AI/agent‑based approaches into workflows for triage assistance, predictive signals, and automated remediation guardrails.
  • Monitor configuration drift, support automated compliance checks, and implement remediation patterns aligned with enterprise change management, security, and risk controls.
  • Integrate infrastructure and operational automation with CI/CD pipelines for repeatable, auditable deployments and safer rollouts.
  • Support core platform components that enable applications and container platforms, including ingress patterns, load balancing integration, and shared supporting services.
Qualifications
  • 4+ years of Systems Engineering or Technology Infrastructure/Operations Engineering experience, or equivalent demonstrated through work experience, training, military experience, or education.
  • 4+ years of application and/or middleware production support in complex, high‑availability environments, including incident response and problem management with strong root cause discipline.
  • 4+ years of hands‑on automation and configuration management experience (Ansible preferred, or similar) and strong scripting skills (Python, Bash, PowerShell, or similar).
  • 4+ years of Linux administration (RHEL preferred) and/or Windows Server administration supporting enterprise production workloads.
  • 4+ years of Git‑based version control practices, including pull requests and peer review, focused on repeatability and code quality.
  • Experience with infrastructure‑as‑code concepts, modular design, and environment consistency.
  • Experience supporting hybrid/private cloud platforms and container‑adjacent hosting models; familiarity with OpenShift (OCP) or Kubernetes‑based platforms.
  • Experience implementing SRE operating practices (reliability metrics, reduction of manual toil, continuous improvement via post‑incident learnings).
  • Experience supporting common middleware platforms and shared services; ability to build automation patterns that standardize operational actions and reduce manual intervention.
  • Familiarity with enterprise observability and operational support practices (service health dashboards, alert engineering, actionable telemetry).
  • Exposure to responsible AI usage in operations (security, validation, accuracy, and appropriate guardrails for automation/agents).
  • Strong cross‑functional communication skills and experience operating in regulated environments.
Job Expectations
  • Deliver assigned operational engineering and automation outcomes with a strong focus on stability, resiliency, and measurable toil reduction.
  • Participate in on‑call rotations and operational support coverage as required.
  • Follow enterprise change management, risk, and compliance processes.
  • Continuously improve platform reliability and automation maturity through standardization, documentation, and repeatable delivery.
  • This position offers a hybrid work schedule.
  • This position is not eligible for visa sponsorship.
  • Relocation assistance is not available for this position.
  • Flexibility to work in a 24/7 environment, including weekends and holidays.
  • Flexibility to frequently be on call beyond normal working hours.
Position Requirements
10+ years of work experience