Site Reliability Engineer

Job in Irving, Dallas County, Texas, 75084, USA
Listing for: Optomi
Full Time, Seasonal/Temporary position
Listed on 2025-12-12
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Support
Job Description

Optomi, in partnership with our client, is seeking an experienced SRE II for a 6‑month contract‑to‑hire opportunity. The role is hybrid, with 2 days onsite in Irving, TX.

W2 only – no C2C/sponsorship at this time.

We are seeking a highly skilled Site Reliability Engineer II to join our engineering organization. This role focuses on building resilient, scalable, and automated systems, not on traditional production support.
The ideal candidate has hands‑on engineering experience across cloud infrastructure, observability, automation, and reliability‑focused development.

You will work closely with development, cloud engineering, and platform teams to ensure high availability, optimal performance, and operational excellence of critical customer‑facing applications.

Key Responsibilities
  • Contribute directly to the reliability, scalability, performance, and security of critical applications.
  • Build reusable services, automation, and frameworks that improve platform stability and developer velocity.

  • Design and enhance cloud infrastructure using Azure services, including Event Hub and Function Apps.
  • Implement and manage Infrastructure as Code (IaC) using Terraform.
Containerization & Orchestration
  • Build and deploy containerized applications using Docker (2–3+ years).
  • Support Kubernetes workloads via AKS, including scaling, upgrades, and cluster reliability improvements.
Development & DevOps
  • Collaborate with development teams using a working knowledge of .NET.
Monitoring, Observability & Incident Response
  • Implement and optimize monitoring and alerting strategies.
  • Use Splunk Observability Cloud (preferred) or equivalent observability platforms to enhance visibility and reduce MTTR.
  • Drive proactive incident identification, root‑cause analysis, and long‑term fixes.
Performance, Reliability & Scalability Enhancements
  • Design and implement SLOs, SLIs, and error budgets.
  • Develop auto‑scaling policies, failover strategies, and disaster recovery procedures.
  • Optimize application and database performance to ensure reliability across high‑traffic, mission‑critical systems.
Required Qualifications
  • 3–5+ years of hands‑on SRE experience
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
Hands‑on experience with:
  • Terraform
  • Docker
  • Monitoring tools (Splunk Observability Cloud preferred)
  • .NET ecosystem (understanding of development fundamentals)
Preferred Skills
  • Strong troubleshooting and analytical skills
  • Performance tuning across applications, databases, and cloud services
  • Experience improving uptime, latency, throughput, or cost efficiency of production applications
  • Familiarity with SRE principles and modern operational practices

Seniority level: Mid‑Senior level

Employment type: Full-time

Job function: Information Technology

Industries: IT Services and IT Consulting
