Senior Site Reliability Engineer Job, Irving area, Texas, USA, IT/Tech

Location

Charlotte, NC;
Irving, TX;
Chandler, AZ

Duration

12+ Months (Extension, Conversion, or Direct Hire)

Hybrid

Hybrid work schedule

Pay Rate

$70/hr on W2 + Benefits

Overview

Seeking a senior engineer for L2/L3 application and middleware production support with an SRE mindset (shifting from reactive to proactive reliability) across VM and container‑adjacent/OpenShift (OCP) environments. The role owns incident response, problem management, and runbook‑driven operations, and drives observability, automation/IaC, compliance guardrails, and CI/CD‑integrated operational automation to reduce toil and improve stability and MTTR.

Responsibilities

L2/L3 escalation and recovery; reliability signals & alert quality; blameless post‑incident learning.
Logs, metrics, traces, dashboards, and actionable alerting.
Infrastructure‑as‑code and config‑as‑code.
Standardized automation (status, start, stop, restart).
Intelligent automation / AI‑assisted ops with guardrails.
Drift and compliance checks & remediation.
CI/CD integration.
Runbooks and operational documentation.
Embed SRE practices: define reliability signals, improve alert quality, drive blameless learning, and prioritize systemic fixes and toil reduction.
Implement and continuously improve observability across applications and middleware to improve detection, diagnosis, and MTTR.
Design, develop, and maintain IaC and config‑as‑code for VM‑based and container‑adjacent workloads, including OpenShift (OCP) enablement.
Build and support automation for operational actions across middleware components to enable safer self‑service and reduce dependency bottlenecks.
Integrate AI/agent‑based approaches into workflows for triage assistance, predictive signals, and automated remediation guardrails.
Monitor configuration drift, support automated compliance checks, and implement remediation patterns aligned with enterprise change management, security, and risk controls.
Integrate infrastructure and operational automation with CI/CD pipelines for repeatable, auditable deployments and safer rollouts.
Support core platform components that enable applications and container platforms, including ingress patterns, load balancing integration, and shared supporting services.

Qualifications

4+ years of Systems Engineering or Technology Infrastructure/Operations Engineering experience, or equivalent demonstrated through work experience, training, military experience, or education.
4+ years of application and/or middleware production support in complex, high‑availability environments, including incident response and problem management with strong root cause discipline.
4+ years of hands‑on automation and configuration management experience (Ansible preferred, or similar) and strong scripting skills (Python, Bash, PowerShell, or similar).
4+ years of Linux administration (RHEL preferred) and/or Windows Server administration supporting enterprise production workloads.
4+ years of Git‑based version control practices, including pull requests and peer review, focused on repeatability and code quality.
Experience with infrastructure‑as‑code concepts, modular design, and environment consistency.
Experience supporting hybrid/private cloud platforms and container‑adjacent hosting models; familiarity with OpenShift (OCP) or Kubernetes‑based platforms.
Experience implementing SRE operating practices (reliability metrics, reduction of manual toil, continuous improvement via post‑incident learnings).
Experience supporting common middleware platforms and shared services; ability to build automation patterns that standardize operational actions and reduce manual intervention.
Familiarity with enterprise observability and operational support practices (service health dashboards, alert engineering, actionable telemetry).
Exposure to responsible AI usage in operations (security, validation, accuracy, and appropriate guardrails for automation/agents).
Strong cross‑functional communication skills and experience operating in regulated environments.

Job Expectations

Deliver assigned operational engineering and automation outcomes with a strong focus on stability, resiliency, and measurable toil reduction.
Participate in on‑call rotations and operational support coverage as required.
Follow enterprise change management, risk, and compliance processes.
Continuously improve platform reliability and automation maturity through standardization, documentation, and repeatable delivery.
This position offers a hybrid work schedule.
This position is not eligible for visa sponsorship.
Relocation assistance is not available for this position.
Flexibility to work in a 24/7 environment, including weekends and holidays.
Flexibility to frequently be on call beyond normal working hours.
