Senior Site Reliability Engineer

Job in Plano, Collin County, Texas, 75086, USA
Listing for: Optomi
Full Time position
Listed on 2026-02-07
Job specializations:
  • IT/Tech
    SRE/Site Reliability, Systems Engineer
Job Description & How to Apply Below

Lead/Sr. SRE (Site Reliability Engineer) | AWS | Hybrid, 4x a week on-site | Plano, TX

Optomi, in partnership with a client in the financial services sector, is seeking a senior SRE to ensure the reliability, performance, and availability of the applications within each domain. As a senior SRE for applications, you will work with development engineers, product owners, SRE Infrastructure, production engineers, and Technology Operations Center personnel, with a primary focus on improving observability, automation, overall system health, reliability, and uptime.

Key Responsibilities:
  • Design, code, and maintain automation to streamline operations, reduce manual tasks, and improve system efficiency to enable a robust application environment.
  • Work with observability engineers to enable actionable insights into application and infrastructure health and performance.
  • Foster a collaborative team culture and support professional development.
  • Ensure scalable & repeatable code deployments with CI/CD pipelines using GitHub & Harness, and repeatable infrastructure deployments with infrastructure as code (IaC) using Terraform.
  • Build automation and operational runbooks primarily using Python scripting.
  • Manage container orchestration platforms and related cloud-native services.
  • Drive reliability improvements through Service Level Objectives (SLOs), error budgets, and Service Level Agreements (SLAs) aligned with business goals.
  • Design & implement observability improvements using Dynatrace & CloudWatch.
  • Lead major incident responses, coordinate with stakeholders for resolution, and drive problem management to prevent recurrence.
  • Conduct blameless post-incident reviews and drive continuous improvement.
  • Collaborate cross-functionally to embed SRE principles into application design and operation so that applications meet reliability goals.
  • Participate in architectural reviews, providing input on reliability and scalability.
Key Qualifications:
  • Experience with DevOps tools like GitHub, Harness & Dynatrace.
  • Experience building self-healing systems and automated remediation workflows.
  • Demonstrated experience in problem-solving and in key SRE/DevOps concepts & tools, with a proven track record of achieving high system reliability and performance.
  • Strong experience with Terraform for AWS IaC.
  • Proficient in scripting and automation with Python and familiar with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
  • Deep knowledge of container orchestration (Kubernetes/EKS).
  • Deep understanding of cloud platforms (e.g., AWS, GCP, Azure) and container orchestration technologies (e.g., Kubernetes).
  • Effective communication skills, with the ability to convey complex technical concepts to diverse audiences.
Preferred Qualifications:
  • Familiarity with GitOps, secrets management, and infrastructure monitoring best practices.
Position Requirements
10+ years of work experience